2025-12-04T08:56:27.9303915Z Current runner version: '2.330.0'
2025-12-04T08:56:27.9311045Z Runner name: 'i-035b9d8fd6b020edf'
2025-12-04T08:56:27.9312026Z Runner group name: 'Default'
2025-12-04T08:56:27.9313183Z Machine name: 'ip-10-1-59-14'
2025-12-04T08:56:27.9316219Z ##[group]GITHUB_TOKEN Permissions
2025-12-04T08:56:27.9318672Z Contents: read
2025-12-04T08:56:27.9319400Z Metadata: read
2025-12-04T08:56:27.9319949Z ##[endgroup]
2025-12-04T08:56:27.9322915Z Secret source: Actions
2025-12-04T08:56:27.9323839Z Prepare workflow directory
2025-12-04T08:56:27.9885370Z Prepare all required actions
2025-12-04T08:56:27.9930335Z Getting action download info
2025-12-04T08:56:28.3751928Z Download action repository 'pytorch/test-infra@main' (SHA:39aa74d619174326f4e2fb0e216151c2f29d9ffd)
2025-12-04T08:56:30.7325129Z Download action repository 'pytorch/pytorch@main' (SHA:eabb7ad2128580ef674446027b95bcf4e21e8df3)
2025-12-04T08:56:46.9975104Z Download action repository 'actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065' (SHA:a26af69be951a213d495a4c3e4e4022e16d87065)
2025-12-04T08:56:47.3103079Z Download action repository 'aws-actions/configure-aws-credentials@ececac1a45f3b08a01d2dd070d28d111c5fe6722' (SHA:ececac1a45f3b08a01d2dd070d28d111c5fe6722)
2025-12-04T08:56:47.5363353Z Download action repository 'aws-actions/amazon-ecr-login@062b18b96a7aff071d4dc91bc00c4c1a7945b076' (SHA:062b18b96a7aff071d4dc91bc00c4c1a7945b076)
2025-12-04T08:56:47.7522408Z Download action repository 'seemethere/download-artifact-s3@1da556a7aa0a088e3153970611f6c432d58e80e6' (SHA:1da556a7aa0a088e3153970611f6c432d58e80e6)
2025-12-04T08:56:48.0184644Z Download action repository 'seemethere/upload-artifact-s3@baba72d0712b404f646cebe0730933554ebce96a' (SHA:baba72d0712b404f646cebe0730933554ebce96a)
2025-12-04T08:56:48.3114650Z Getting action download info
2025-12-04T08:56:48.4457855Z Download action repository 'actions/checkout@v4' (SHA:34e114876b0b11c390a56381ad16ebd13914f8d5)
2025-12-04T08:56:48.7399006Z Getting action download info
2025-12-04T08:56:48.8632110Z Download action repository 'nick-fields/retry@v3.0.0' (SHA:7152eba30c6575329ac0576536151aca5a72780e)
2025-12-04T08:56:49.0727503Z Getting action download info
2025-12-04T08:56:49.1864606Z Download action repository 'nick-fields/retry@3e91a01664abd3c5cd539100d10d33b9c5b68482' (SHA:3e91a01664abd3c5cd539100d10d33b9c5b68482)
2025-12-04T08:56:49.3727313Z Getting action download info
2025-12-04T08:56:49.5487425Z Uses: pytorch/pytorch/.github/workflows/_linux-test.yml@refs/heads/main (ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32)
2025-12-04T08:56:49.5491427Z ##[group] Inputs
2025-12-04T08:56:49.5491821Z build-environment: linux-jammy-cuda12.8-py3.10-gcc11
2025-12-04T08:56:49.5503707Z test-matrix: {"include": [{"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}]}
2025-12-04T08:56:49.5515929Z docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T08:56:49.5516826Z sync-tag:
2025-12-04T08:56:49.5517659Z timeout-minutes: 360
2025-12-04T08:56:49.5517920Z use-gha:
2025-12-04T08:56:49.5518159Z dashboard-tag:
2025-12-04T08:56:49.5518428Z s3-bucket: gha-artifacts
2025-12-04T08:56:49.5518709Z aws-role-to-assume:
2025-12-04T08:56:49.5519311Z disable-monitor: false
2025-12-04T08:56:49.5519628Z monitor-log-interval: 5
2025-12-04T08:56:49.5519939Z monitor-data-collect-interval: 1
2025-12-04T08:56:49.5520255Z ##[endgroup]
2025-12-04T08:56:49.5521052Z Complete job name: linux-jammy-cuda12.8-py3.10-gcc11 / test (distributed, 3, 3, lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check)
2025-12-04T08:56:49.6230751Z A job started hook has been configured by the self-hosted runner administrator
2025-12-04T08:56:49.6338801Z ##[group]Run '/home/ec2-user/runner-scripts/before_job.sh'
2025-12-04T08:56:49.6348723Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T08:56:49.6349515Z ##[endgroup]
2025-12-04T08:56:51.2147810Z Runner Type: lf.linux.g4dn.12xlarge.nvidia.gpu
2025-12-04T08:56:51.2148439Z Instance Type: g4dn.12xlarge
2025-12-04T08:56:51.2148856Z AMI Name: unknown
2025-12-04T08:56:51.2178343Z AMI ID: ami-08982f1c5bf93d976
2025-12-04T08:56:56.7030450Z ##[group]Run pytorch/test-infra/.github/actions/setup-ssh@main
2025-12-04T08:56:56.7030967Z with:
2025-12-04T08:56:56.7031625Z github-secret: ***
2025-12-04T08:56:56.7032479Z instructions: All testing is done inside the container, to start an interactive session run: docker exec -it $(docker container ps --format '{{.ID}}') bash
2025-12-04T08:56:56.7033431Z activate-with-label: false
2025-12-04T08:56:56.7033727Z label: with-ssh
2025-12-04T08:56:56.7034041Z remove-existing-keys: true
2025-12-04T08:56:56.7034335Z fail-silently: true
2025-12-04T08:56:56.7034587Z env:
2025-12-04T08:56:56.7034812Z GIT_DEFAULT_BRANCH: main
2025-12-04T08:56:56.7035092Z ##[endgroup]
2025-12-04T08:56:56.8257517Z Please see https://github.com/pytorch/pytorch/wiki/Debugging-using-with-ssh-for-Github-Actions for more info.
2025-12-04T08:56:56.8258911Z Not on pull request and ciflow reference could not be extracted, skipping adding ssh keys
2025-12-04T08:56:56.8428723Z ##[group]Run pytorch/pytorch/.github/actions/checkout-pytorch@main
2025-12-04T08:56:56.8429414Z with:
2025-12-04T08:56:56.8429672Z no-sudo: true
2025-12-04T08:56:56.8429932Z submodules: recursive
2025-12-04T08:56:56.8430244Z fetch-depth: 0
2025-12-04T08:56:56.8430509Z env:
2025-12-04T08:56:56.8430744Z GIT_DEFAULT_BRANCH: main
2025-12-04T08:56:56.8431063Z ##[endgroup]
2025-12-04T08:56:56.8508434Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-12-04T08:56:56.8509697Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-12-04T08:56:56.8518769Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T08:56:56.8519172Z env:
2025-12-04T08:56:56.8519414Z GIT_DEFAULT_BRANCH: main
2025-12-04T08:56:56.8519727Z ##[endgroup]
2025-12-04T08:56:56.8607436Z ##[group]Run # Use all available CPUs for fetching
2025-12-04T08:56:56.8608033Z # Use all available CPUs for fetching
2025-12-04T08:56:56.8608548Z cd "${GITHUB_WORKSPACE}"
2025-12-04T08:56:56.8609006Z git config --global fetch.parallel 0
2025-12-04T08:56:56.8609554Z git config --global submodule.fetchJobs 0
2025-12-04T08:56:56.8610037Z 
2025-12-04T08:56:56.8610515Z # Clean workspace. The default checkout action should also do this, but
2025-12-04T08:56:56.8611203Z # do it here as well just in case
2025-12-04T08:56:56.8611615Z if [[ -d .git ]]; then
2025-12-04T08:56:56.8612020Z   if [ -z "${NO_SUDO}" ]; then
2025-12-04T08:56:56.8612524Z     sudo git clean -ffdx
2025-12-04T08:56:56.8612892Z   else
2025-12-04T08:56:56.8613234Z     git clean -ffdx
2025-12-04T08:56:56.8613683Z   fi
2025-12-04T08:56:56.8614020Z fi
2025-12-04T08:56:56.8620512Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T08:56:56.8621322Z env:
2025-12-04T08:56:56.8621831Z GIT_DEFAULT_BRANCH: main
2025-12-04T08:56:56.8622243Z NO_SUDO: true
2025-12-04T08:56:56.8622627Z ##[endgroup]
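The step above probes for the /.inarc and /.incontainer marker files and records the result as a step output named IN_CONTAINER_RUNNER. A Python rendering of that bash one-liner, for clarity (the workflow itself runs the bash shown in the log):

    import os

    # True when the runner executes inside a container, signalled by the
    # /.inarc or /.incontainer marker files checked in the step above.
    def in_container_runner() -> bool:
        return os.path.exists("/.inarc") or os.path.exists("/.incontainer")

    # The workflow appends this key=value pair to $GITHUB_OUTPUT; here we
    # simply print it in the same form.
    print(f"IN_CONTAINER_RUNNER={str(in_container_runner()).lower()}")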
2025-12-04T08:56:56.8768037Z ##[group]Run actions/checkout@v4
2025-12-04T08:56:56.8768353Z with:
2025-12-04T08:56:56.8768625Z ref: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T08:56:56.8768966Z fetch-depth: 0
2025-12-04T08:56:56.8769221Z submodules: recursive
2025-12-04T08:56:56.8769492Z show-progress: false
2025-12-04T08:56:56.8769757Z repository: pytorch/pytorch
2025-12-04T08:56:56.8770228Z token: ***
2025-12-04T08:56:56.8770464Z ssh-strict: true
2025-12-04T08:56:56.8770710Z ssh-user: git
2025-12-04T08:56:56.8770951Z persist-credentials: true
2025-12-04T08:56:56.8771239Z clean: true
2025-12-04T08:56:56.8771508Z sparse-checkout-cone-mode: true
2025-12-04T08:56:56.8771812Z fetch-tags: false
2025-12-04T08:56:56.8772062Z lfs: false
2025-12-04T08:56:56.8772304Z set-safe-directory: true
2025-12-04T08:56:56.8772574Z env:
2025-12-04T08:56:56.8772801Z GIT_DEFAULT_BRANCH: main
2025-12-04T08:56:56.8773086Z ##[endgroup]
2025-12-04T08:56:56.9930959Z Syncing repository: pytorch/pytorch
2025-12-04T08:56:56.9932474Z ##[group]Getting Git version info
2025-12-04T08:56:56.9933155Z Working directory is '/home/ec2-user/actions-runner/_work/pytorch/pytorch'
2025-12-04T08:56:56.9934017Z [command]/usr/bin/git version
2025-12-04T08:56:56.9934320Z git version 2.50.1
2025-12-04T08:56:56.9941991Z ##[endgroup]
2025-12-04T08:56:56.9953294Z Copying '/home/ec2-user/.gitconfig' to '/home/ec2-user/actions-runner/_work/_temp/ce61325c-9bbf-4083-9d68-528b3fba0d16/.gitconfig'
2025-12-04T08:56:56.9973295Z Temporarily overriding HOME='/home/ec2-user/actions-runner/_work/_temp/ce61325c-9bbf-4083-9d68-528b3fba0d16' before making global git config changes
2025-12-04T08:56:56.9976247Z Adding repository directory to the temporary git global config as a safe directory
2025-12-04T08:56:56.9981781Z [command]/usr/bin/git config --global --add safe.directory /home/ec2-user/actions-runner/_work/pytorch/pytorch
2025-12-04T08:56:57.0015717Z Deleting the contents of '/home/ec2-user/actions-runner/_work/pytorch/pytorch'
2025-12-04T08:56:57.0019456Z ##[group]Initializing the repository
2025-12-04T08:56:57.0023827Z [command]/usr/bin/git init /home/ec2-user/actions-runner/_work/pytorch/pytorch
2025-12-04T08:56:57.0055628Z hint: Using 'master' as the name for the initial branch. This default branch name
2025-12-04T08:56:57.0056538Z hint: is subject to change. To configure the initial branch name to use in all
2025-12-04T08:56:57.0057221Z hint: of your new repositories, which will suppress this warning, call:
2025-12-04T08:56:57.0057913Z hint:
2025-12-04T08:56:57.0058279Z hint: git config --global init.defaultBranch <name>
2025-12-04T08:56:57.0058699Z hint:
2025-12-04T08:56:57.0059111Z hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
2025-12-04T08:56:57.0059793Z hint: 'development'. The just-created branch can be renamed via this command:
2025-12-04T08:56:57.0060322Z hint:
2025-12-04T08:56:57.0060592Z hint: git branch -m <name>
2025-12-04T08:56:57.0060894Z hint:
2025-12-04T08:56:57.0061332Z hint: Disable this message with "git config set advice.defaultBranchName false"
2025-12-04T08:56:57.0062184Z Initialized empty Git repository in /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/
2025-12-04T08:56:57.0065532Z [command]/usr/bin/git remote add origin https://github.com/pytorch/pytorch
2025-12-04T08:56:57.0092722Z ##[endgroup]
2025-12-04T08:56:57.0093239Z ##[group]Disabling automatic garbage collection
2025-12-04T08:56:57.0095102Z [command]/usr/bin/git config --local gc.auto 0
2025-12-04T08:56:57.0122116Z ##[endgroup]
2025-12-04T08:56:57.0122679Z ##[group]Setting up auth
2025-12-04T08:56:57.0128786Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand
2025-12-04T08:56:57.0156755Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :"
2025-12-04T08:56:57.0477777Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader
2025-12-04T08:56:57.0503524Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :"
2025-12-04T08:56:57.0791370Z [command]/usr/bin/git config --local --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T08:56:57.0818917Z [command]/usr/bin/git submodule foreach --recursive git config --local --show-origin --name-only --get-regexp remote.origin.url
2025-12-04T08:56:57.1104955Z [command]/usr/bin/git config --local http.https://github.com/.extraheader AUTHORIZATION: basic ***
2025-12-04T08:56:57.1156415Z ##[endgroup]
2025-12-04T08:56:57.1157067Z ##[group]Fetching the repository
2025-12-04T08:56:57.1163158Z [command]/usr/bin/git -c protocol.version=2 fetch --prune --no-recurse-submodules origin +refs/heads/*:refs/remotes/origin/* +refs/tags/*:refs/tags/*
2025-12-04T08:57:43.5148977Z From https://github.com/pytorch/pytorch
2025-12-04T08:57:43.5149718Z  * [new branch] 2.6.0.dev20241004+ -> origin/2.6.0.dev20241004+
2025-12-04T08:57:43.5150435Z  * [new branch] 2.9.1 -> origin/2.9.1
2025-12-04T08:57:43.5151105Z  * [new branch] AaronWang04_addmmfusion_perftest -> origin/AaronWang04_addmmfusion_perftest
2025-12-04T08:57:43.5151846Z  * [new branch] Flamefire-patch-1 -> origin/Flamefire-patch-1
2025-12-04T08:57:43.5152537Z  * [new branch] HDCharles-2.6.0-release-notes -> origin/HDCharles-2.6.0-release-notes
2025-12-04T08:57:43.5153202Z  * [new branch] HOPrintFunc -> origin/HOPrintFunc
2025-12-04T08:57:43.5154807Z  * [new branch] IvanKobzarev/stack/1 -> origin/IvanKobzarev/stack/1
2025-12-04T08:57:43.5156847Z  * [new branch] NicoshevSVE128 -> origin/NicoshevSVE128
2025-12-04T08:57:43.5157673Z  * [new branch] PR-AOTInductorNoneBug -> origin/PR-AOTInductorNoneBug
2025-12-04T08:57:43.5158896Z  * [new branch] PR-AOTInductorNoneBugFix -> origin/PR-AOTInductorNoneBugFix
2025-12-04T08:57:43.5160070Z  * [new branch] PR-FixConfigsIssue -> origin/PR-FixConfigsIssue
2025-12-04T08:57:43.5161078Z  * [new branch] PR-NoneBugFix-viable -> origin/PR-NoneBugFix-viable
2025-12-04T08:57:43.5162150Z  * [new branch] PR-ResetToZero -> origin/PR-ResetToZero
2025-12-04T08:57:43.5163336Z  * [new branch] Update-Flash-Packaging -> 
origin/Update-Flash-Packaging 2025-12-04T08:57:43.5164368Z * [new branch] VLA_exp -> origin/VLA_exp 2025-12-04T08:57:43.5166848Z * [new branch] activation_bench -> origin/activation_bench 2025-12-04T08:57:43.5167973Z * [new branch] addmm-heuristic -> origin/addmm-heuristic 2025-12-04T08:57:43.5169541Z * [new branch] adi/onednn_aarch64 -> origin/adi/onednn_aarch64 2025-12-04T08:57:43.5170601Z * [new branch] adi/test -> origin/adi/test 2025-12-04T08:57:43.5171725Z * [new branch] adi/test_bgemm -> origin/adi/test_bgemm 2025-12-04T08:57:43.5172837Z * [new branch] adi/test_m8g -> origin/adi/test_m8g 2025-12-04T08:57:43.5173912Z * [new branch] adi/test_onednn -> origin/adi/test_onednn 2025-12-04T08:57:43.5175075Z * [new branch] adi/test_onednn_v3.9 -> origin/adi/test_onednn_v3.9 2025-12-04T08:57:43.5176210Z * [new branch] adi/test_presve_change -> origin/adi/test_presve_change 2025-12-04T08:57:43.5177617Z * [new branch] adi/test_timm -> origin/adi/test_timm 2025-12-04T08:57:43.5179159Z * [new branch] adi/testpresve_change -> origin/adi/testpresve_change 2025-12-04T08:57:43.5181119Z * [new branch] aditew01/test/vec_bf16 -> origin/aditew01/test/vec_bf16 2025-12-04T08:57:43.5182315Z * [new branch] ah-globalfeedback-hook -> origin/ah-globalfeedback-hook 2025-12-04T08:57:43.5183633Z * [new branch] albanD-patch-1 -> origin/albanD-patch-1 2025-12-04T08:57:43.5184701Z * [new branch] also-surround-shimh -> origin/also-surround-shimh 2025-12-04T08:57:43.5186721Z * [new branch] angelayi/aot_compile -> origin/angelayi/aot_compile 2025-12-04T08:57:43.5187931Z * [new branch] angelayi/aoti_additional_files -> origin/angelayi/aoti_additional_files 2025-12-04T08:57:43.5188981Z * [new branch] angelayi/benchmark -> origin/angelayi/benchmark 2025-12-04T08:57:43.5190253Z * [new branch] angelayi/change_pytree_serialization -> origin/angelayi/change_pytree_serialization 2025-12-04T08:57:43.5191260Z * [new branch] angelayi/cpp_loader -> origin/angelayi/cpp_loader 2025-12-04T08:57:43.5192924Z * [new branch] angelayi/inductor_const -> origin/angelayi/inductor_const 2025-12-04T08:57:43.5193850Z * [new branch] angelayi/lstm -> origin/angelayi/lstm 2025-12-04T08:57:43.5195373Z * [new branch] angelayi/no_so_weight -> origin/angelayi/no_so_weight 2025-12-04T08:57:43.5196780Z * [new branch] angelayi/scan_layers -> origin/angelayi/scan_layers 2025-12-04T08:57:43.5197892Z * [new branch] angelayi/side_eff -> origin/angelayi/side_eff 2025-12-04T08:57:43.5199102Z * [new branch] angelayi/state_dict -> origin/angelayi/state_dict 2025-12-04T08:57:43.5200326Z * [new branch] angelayi/symint_input -> origin/angelayi/symint_input 2025-12-04T08:57:43.5201598Z * [new branch] angelayi/symm_mem -> origin/angelayi/symm_mem 2025-12-04T08:57:43.5202730Z * [new branch] angelayi/test_cpp -> origin/angelayi/test_cpp 2025-12-04T08:57:43.5203897Z * [new branch] angelayi/torch_size -> origin/angelayi/torch_size 2025-12-04T08:57:43.5205018Z * [new branch] annotate_assert -> origin/annotate_assert 2025-12-04T08:57:43.5206229Z * [new branch] annotate_fallback_kernel -> origin/annotate_fallback_kernel 2025-12-04T08:57:43.5207345Z * [new branch] annotation_deepcopy -> origin/annotation_deepcopy 2025-12-04T08:57:43.5208427Z * [new branch] annotation_dynamo -> origin/annotation_dynamo 2025-12-04T08:57:43.5209591Z * [new branch] aot_eager_stack_trace -> origin/aot_eager_stack_trace 2025-12-04T08:57:43.5210673Z * [new branch] aoti-cuda-alloc -> origin/aoti-cuda-alloc 2025-12-04T08:57:43.5211836Z * [new branch] aoti_const_device -> origin/aoti_const_device 
2025-12-04T08:57:43.5212917Z * [new branch] aoti_fqn_name_interface -> origin/aoti_fqn_name_interface 2025-12-04T08:57:43.5214061Z * [new branch] aoti_package_weights_binary -> origin/aoti_package_weights_binary 2025-12-04T08:57:43.5215119Z * [new branch] aoti_target_windows -> origin/aoti_target_windows 2025-12-04T08:57:43.5217496Z * [new branch] arsh/feat/inductor_check_profiling -> origin/arsh/feat/inductor_check_profiling 2025-12-04T08:57:43.5218490Z * [new branch] async_tp -> origin/async_tp 2025-12-04T08:57:43.5219810Z * [new branch] atalman-inductor-perf-cu124 -> origin/atalman-inductor-perf-cu124 2025-12-04T08:57:43.5221204Z * [new branch] atalman-inductor-perf-cu124.1 -> origin/atalman-inductor-perf-cu124.1 2025-12-04T08:57:43.5222534Z * [new branch] atalman-patch-2 -> origin/atalman-patch-2 2025-12-04T08:57:43.5223784Z * [new branch] atalman-patch-3 -> origin/atalman-patch-3 2025-12-04T08:57:43.5224942Z * [new branch] atalman-patch-4 -> origin/atalman-patch-4 2025-12-04T08:57:43.5226171Z * [new branch] atalman-patch-5 -> origin/atalman-patch-5 2025-12-04T08:57:43.5227402Z * [new branch] atalman-patch-6 -> origin/atalman-patch-6 2025-12-04T08:57:43.5228594Z * [new branch] atalman-patch-7 -> origin/atalman-patch-7 2025-12-04T08:57:43.5229802Z * [new branch] atalman-patch-8 -> origin/atalman-patch-8 2025-12-04T08:57:43.5231264Z * [new branch] atalman_inductor_2.3.1 -> origin/atalman_inductor_2.3.1 2025-12-04T08:57:43.5232344Z * [new branch] atalman_inductor_2.4.0 -> origin/atalman_inductor_2.4.0 2025-12-04T08:57:43.5233683Z * [new branch] atalman_inductor_2.4.x -> origin/atalman_inductor_2.4.x 2025-12-04T08:57:43.5234924Z * [new branch] attention_benchmarking_clean -> origin/attention_benchmarking_clean 2025-12-04T08:57:43.5236461Z * [new branch] bahuang/dt_fix_scalar_add -> origin/bahuang/dt_fix_scalar_add 2025-12-04T08:57:43.5237544Z * [new branch] bahuang/fix_debug_mode -> origin/bahuang/fix_debug_mode 2025-12-04T08:57:43.5238600Z * [new branch] bahuang/fix_expand -> origin/bahuang/fix_expand 2025-12-04T08:57:43.5239707Z * [new branch] bahuang/test -> origin/bahuang/test 2025-12-04T08:57:43.5241408Z * [new branch] base/1.5 -> origin/base/1.5 2025-12-04T08:57:43.5242832Z * [new branch] batching_sdpa_efficient_attention -> origin/batching_sdpa_efficient_attention 2025-12-04T08:57:43.5243788Z * [new branch] bench_scaled_mm_ops -> origin/bench_scaled_mm_ops 2025-12-04T08:57:43.5245073Z * [new branch] benchmark-updates -> origin/benchmark-updates 2025-12-04T08:57:43.5246054Z * [new branch] benchmarking-script -> origin/benchmarking-script 2025-12-04T08:57:43.5247619Z * [new branch] bertmaher/pinbump26 -> origin/bertmaher/pinbump26 2025-12-04T08:57:43.5249115Z * [new branch] bertrand/cutlass -> origin/bertrand/cutlass 2025-12-04T08:57:43.5250635Z * [new branch] bf/bug-static-input -> origin/bf/bug-static-input 2025-12-04T08:57:43.5251581Z * [new branch] bf/cg-backend -> origin/bf/cg-backend 2025-12-04T08:57:43.5252621Z * [new branch] bf/cg-nccl-test -> origin/bf/cg-nccl-test 2025-12-04T08:57:43.5253716Z * [new branch] bf/cg-remove-check -> origin/bf/cg-remove-check 2025-12-04T08:57:43.5254904Z * [new branch] bf/clean-torchbench-hf -> origin/bf/clean-torchbench-hf 2025-12-04T08:57:43.5255927Z * [new branch] bf/combo-debug-log -> origin/bf/combo-debug-log 2025-12-04T08:57:43.5257296Z * [new branch] bf/cudagraph -> origin/bf/cudagraph 2025-12-04T08:57:43.5259034Z * [new branch] bf/cudagraph-disable-input-mutation -> origin/bf/cudagraph-disable-input-mutation 2025-12-04T08:57:43.5260478Z * 
[new branch] bf/cudagraph-enable-input-mutation-support-benchmark -> origin/bf/cudagraph-enable-input-mutation-support-benchmark 2025-12-04T08:57:43.5261538Z * [new branch] bf/cudagraph-partition -> origin/bf/cudagraph-partition 2025-12-04T08:57:43.5262447Z * [new branch] bf/donated-buffer-bench -> origin/bf/donated-buffer-bench 2025-12-04T08:57:43.5263618Z * [new branch] bf/dynamo-partition -> origin/bf/dynamo-partition 2025-12-04T08:57:43.5264738Z * [new branch] bf/lite -> origin/bf/lite 2025-12-04T08:57:43.5265963Z * [new branch] bf/pa-non-divisible -> origin/bf/pa-non-divisible 2025-12-04T08:57:43.5267215Z * [new branch] bf/partition-cache-free-symbols -> origin/bf/partition-cache-free-symbols 2025-12-04T08:57:43.5268400Z * [new branch] bf/partition-memory-plan -> origin/bf/partition-memory-plan 2025-12-04T08:57:43.5269613Z * [new branch] bf/partition-move-cpu -> origin/bf/partition-move-cpu 2025-12-04T08:57:43.5270808Z * [new branch] bf/partition-view-fallback -> origin/bf/partition-view-fallback 2025-12-04T08:57:43.5271877Z * [new branch] bf/remove-check-55b0c39d -> origin/bf/remove-check-55b0c39d 2025-12-04T08:57:43.5272908Z * [new branch] bf/timm-nov-26-2025 -> origin/bf/timm-nov-26-2025 2025-12-04T08:57:43.5274094Z * [new branch] bf/transformer-pin-4-57-3 -> origin/bf/transformer-pin-4-57-3 2025-12-04T08:57:43.5275220Z * [new branch] bisect_perf_hf_T5_3acc6eac492 -> origin/bisect_perf_hf_T5_3acc6eac492 2025-12-04T08:57:43.5276294Z * [new branch] bisect_perf_hf_T5_3fcf66f61fb -> origin/bisect_perf_hf_T5_3fcf66f61fb 2025-12-04T08:57:43.5277353Z * [new branch] bisect_perf_hf_T5_4009d154129 -> origin/bisect_perf_hf_T5_4009d154129 2025-12-04T08:57:43.5278398Z * [new branch] bisect_perf_hf_T5_40d0740e73d -> origin/bisect_perf_hf_T5_40d0740e73d 2025-12-04T08:57:43.5279413Z * [new branch] bisect_perf_hf_T5_5268754e -> origin/bisect_perf_hf_T5_5268754e 2025-12-04T08:57:43.5280489Z * [new branch] bisect_perf_hf_T5_7d89a8d385c -> origin/bisect_perf_hf_T5_7d89a8d385c 2025-12-04T08:57:43.5281580Z * [new branch] bisect_perf_hf_T5_b7a25c1ee7c -> origin/bisect_perf_hf_T5_b7a25c1ee7c 2025-12-04T08:57:43.5282614Z * [new branch] bisect_perf_hf_T5_c25b201583f -> origin/bisect_perf_hf_T5_c25b201583f 2025-12-04T08:57:43.5283664Z * [new branch] bisect_perf_hf_T5_c93e57efac0 -> origin/bisect_perf_hf_T5_c93e57efac0 2025-12-04T08:57:43.5284905Z * [new branch] bisect_perf_hf_T5_ca9813ea149 -> origin/bisect_perf_hf_T5_ca9813ea149 2025-12-04T08:57:43.5285850Z * [new branch] bisect_perf_hf_T5_d65f194a -> origin/bisect_perf_hf_T5_d65f194a 2025-12-04T08:57:43.5286905Z * [new branch] bisect_perf_hf_T5_da94ab0b -> origin/bisect_perf_hf_T5_da94ab0b 2025-12-04T08:57:43.5288012Z * [new branch] bisect_perf_hf_T5_da94ab0b_new -> origin/bisect_perf_hf_T5_da94ab0b_new 2025-12-04T08:57:43.5289074Z * [new branch] bisect_perf_hf_T5_db4e8a1d8a8 -> origin/bisect_perf_hf_T5_db4e8a1d8a8 2025-12-04T08:57:43.5290097Z * [new branch] bisect_perf_hf_T5_e0d97e936a2 -> origin/bisect_perf_hf_T5_e0d97e936a2 2025-12-04T08:57:43.5291341Z * [new branch] bisect_perf_hf_T5_f23621ec563 -> origin/bisect_perf_hf_T5_f23621ec563 2025-12-04T08:57:43.5292958Z * [new branch] brister/fx_device_type -> origin/brister/fx_device_type 2025-12-04T08:57:43.5294023Z * [new branch] brister/test_inductor_all_fx -> origin/brister/test_inductor_all_fx 2025-12-04T08:57:43.5295206Z * [new branch] brister/tiled_reduction_no_numel_check -> origin/brister/tiled_reduction_no_numel_check 2025-12-04T08:57:43.5296189Z * [new branch] bwd-backup -> origin/bwd-backup 
2025-12-04T08:57:43.5297956Z * [new branch] c57382a49 -> origin/c57382a49 2025-12-04T08:57:43.5298958Z * [new branch] ca_0431d47eaa -> origin/ca_0431d47eaa 2025-12-04T08:57:43.5300054Z * [new branch] ca_fix_0431d47eaa -> origin/ca_fix_0431d47eaa 2025-12-04T08:57:43.5301799Z * [new branch] camyllh/test_setup_hooks_push -> origin/camyllh/test_setup_hooks_push 2025-12-04T08:57:43.5302975Z * [new branch] cccclai-patch-1 -> origin/cccclai-patch-1 2025-12-04T08:57:43.5304332Z * [new branch] cherry-pick-159969-by-pytorch_bot_bot_ -> origin/cherry-pick-159969-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5305500Z * [new branch] cherry-pick-160586-by-pytorch_bot_bot_ -> origin/cherry-pick-160586-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5306744Z * [new branch] cherry-pick-162208-by-pytorch_bot_bot_ -> origin/cherry-pick-162208-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5307894Z * [new branch] cherry-pick-163169-by-pytorch_bot_bot_ -> origin/cherry-pick-163169-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5309183Z * [new branch] cherry-pick-165086-by-pytorch_bot_bot_ -> origin/cherry-pick-165086-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5310472Z * [new branch] cherry-pick-165514-by-pytorch_bot_bot_ -> origin/cherry-pick-165514-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5311582Z * [new branch] cherry-pick-165601-by-pytorch_bot_bot_ -> origin/cherry-pick-165601-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5312764Z * [new branch] cherry-pick-165667-by-pytorch_bot_bot_ -> origin/cherry-pick-165667-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5314017Z * [new branch] cherry-pick-165815-by-pytorch_bot_bot_ -> origin/cherry-pick-165815-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5315217Z * [new branch] cherry-pick-165922-by-pytorch_bot_bot_ -> origin/cherry-pick-165922-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5316367Z * [new branch] cherry-pick-166148-by-pytorch_bot_bot_ -> origin/cherry-pick-166148-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5317456Z * [new branch] cherry-pick-166181-by-pytorch_bot_bot_ -> origin/cherry-pick-166181-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5318602Z * [new branch] cherry-pick-166404-by-pytorch_bot_bot_ -> origin/cherry-pick-166404-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5319781Z * [new branch] cherry-pick-166427-by-pytorch_bot_bot_ -> origin/cherry-pick-166427-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5321327Z * [new branch] cherry-pick-166480-by-pytorch_bot_bot_ -> origin/cherry-pick-166480-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5322577Z * [new branch] cherry-pick-166570-by-pytorch_bot_bot_ -> origin/cherry-pick-166570-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5323661Z * [new branch] cherry-pick-166993-by-pytorch_bot_bot_ -> origin/cherry-pick-166993-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5324876Z * [new branch] cherry-pick-167111-by-pytorch_bot_bot_ -> origin/cherry-pick-167111-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5326105Z * [new branch] cherry-pick-167478-by-pytorch_bot_bot_ -> origin/cherry-pick-167478-by-pytorch_bot_bot_ 2025-12-04T08:57:43.5327076Z * [new branch] cherry_pick_166036_166040 -> origin/cherry_pick_166036_166040 2025-12-04T08:57:43.5328233Z * [new branch] cherry_pick_166457 -> origin/cherry_pick_166457 2025-12-04T08:57:43.5329479Z * [new branch] cherrypick_166338 -> origin/cherrypick_166338 2025-12-04T08:57:43.5330652Z * [new branch] cherrypick_166458 -> origin/cherrypick_166458 2025-12-04T08:57:43.5331757Z * [new branch] cherrypick_166586 -> origin/cherrypick_166586 2025-12-04T08:57:43.5332895Z * [new branch] cherrypick_166956 -> origin/cherrypick_166956 2025-12-04T08:57:43.5334091Z * [new 
branch] ci_attn -> origin/ci_attn 2025-12-04T08:57:43.5335241Z * [new branch] codex-testing -> origin/codex-testing 2025-12-04T08:57:43.5337512Z * [new branch] codex/add-check_memory_overlap-helper-functions -> origin/codex/add-check_memory_overlap-helper-functions 2025-12-04T08:57:43.5338522Z * [new branch] codex/fix-issue-121219-in-pytorch -> origin/codex/fix-issue-121219-in-pytorch 2025-12-04T08:57:43.5339975Z * [new branch] codex/investigate-segfaults-in-get_tensor_storage_id -> origin/codex/investigate-segfaults-in-get_tensor_storage_id 2025-12-04T08:57:43.5341321Z * [new branch] codex/refactor-lintrunner-config-to-use-uv-run -> origin/codex/refactor-lintrunner-config-to-use-uv-run 2025-12-04T08:57:43.5342253Z * [new branch] compatiblpy39util -> origin/compatiblpy39util 2025-12-04T08:57:43.5343834Z * [new branch] cond_hop_device -> origin/cond_hop_device 2025-12-04T08:57:43.5344908Z * [new branch] context_test -> origin/context_test 2025-12-04T08:57:43.5346722Z * [new branch] copilot/code-style-cleanup-python-pip -> origin/copilot/code-style-cleanup-python-pip 2025-12-04T08:57:43.5348067Z * [new branch] cpio/fix_new_ami_tests -> origin/cpio/fix_new_ami_tests 2025-12-04T08:57:43.5349364Z * [new branch] cpp-docs-dependency-upgrade -> origin/cpp-docs-dependency-upgrade 2025-12-04T08:57:43.5350823Z * [new branch] csl/always_produce_xml -> origin/csl/always_produce_xml 2025-12-04T08:57:43.5351768Z * [new branch] csl/build_test_more_procs -> origin/csl/build_test_more_procs 2025-12-04T08:57:43.5352822Z * [new branch] csl/build_test_more_procs2 -> origin/csl/build_test_more_procs2 2025-12-04T08:57:43.5353932Z * [new branch] csl/clean_up -> origin/csl/clean_up 2025-12-04T08:57:43.5355313Z * [new branch] csl/fix_retry_segfault_exit -> origin/csl/fix_retry_segfault_exit 2025-12-04T08:57:43.5356287Z * [new branch] csl/katex -> origin/csl/katex 2025-12-04T08:57:43.5358144Z * [new branch] csl/larger_runner -> origin/csl/larger_runner 2025-12-04T08:57:43.5359551Z * [new branch] csl/lint_testing -> origin/csl/lint_testing 2025-12-04T08:57:43.5360857Z * [new branch] csl/lint_thing -> origin/csl/lint_thing 2025-12-04T08:57:43.5361996Z * [new branch] csl/lintrunner_stuff -> origin/csl/lintrunner_stuff 2025-12-04T08:57:43.5363150Z * [new branch] csl/manually_gen_json -> origin/csl/manually_gen_json 2025-12-04T08:57:43.5364319Z * [new branch] csl/mps_sharding -> origin/csl/mps_sharding 2025-12-04T08:57:43.5365427Z * [new branch] csl/multistage_docker -> origin/csl/multistage_docker 2025-12-04T08:57:43.5366500Z * [new branch] csl/print_timing -> origin/csl/print_timing 2025-12-04T08:57:43.5367602Z * [new branch] csl/remove_experiment -> origin/csl/remove_experiment 2025-12-04T08:57:43.5368751Z * [new branch] csl/remove_maybe_unused_var -> origin/csl/remove_maybe_unused_var 2025-12-04T08:57:43.5369936Z * [new branch] csl/remove_repo_specific_autolabel -> origin/csl/remove_repo_specific_autolabel 2025-12-04T08:57:43.5370988Z * [new branch] csl/remove_run_parallel -> origin/csl/remove_run_parallel 2025-12-04T08:57:43.5371993Z * [new branch] csl/remove_unused_vars -> origin/csl/remove_unused_vars 2025-12-04T08:57:43.5373139Z * [new branch] csl/revert_open -> origin/csl/revert_open 2025-12-04T08:57:43.5374236Z * [new branch] csl/skip_build -> origin/csl/skip_build 2025-12-04T08:57:43.5375352Z * [new branch] csl/smaller_avx_amx_runenrs -> origin/csl/smaller_avx_amx_runenrs 2025-12-04T08:57:43.5376429Z * [new branch] csl/td_job_level -> origin/csl/td_job_level 2025-12-04T08:57:43.5377918Z * [new branch] 
csl/test_cuda_build_large_runner -> origin/csl/test_cuda_build_large_runner 2025-12-04T08:57:43.5379246Z * [new branch] csl/test_owners_autograd_dispatch_nn -> origin/csl/test_owners_autograd_dispatch_nn 2025-12-04T08:57:43.5380338Z * [new branch] csl/test_owners_higher_confidence -> origin/csl/test_owners_higher_confidence 2025-12-04T08:57:43.5381411Z * [new branch] csl/upload_json_running -> origin/csl/upload_json_running 2025-12-04T08:57:43.5382509Z * [new branch] csl/win_sccache -> origin/csl/win_sccache 2025-12-04T08:57:43.5383597Z * [new branch] csl/xml_stuff -> origin/csl/xml_stuff 2025-12-04T08:57:43.5409636Z * [new branch] cublasrelax2 -> origin/cublasrelax2 2025-12-04T08:57:43.5410265Z * [new branch] cuda_mempool -> origin/cuda_mempool 2025-12-04T08:57:43.5410855Z * [new branch] custom_lowering_dict -> origin/custom_lowering_dict 2025-12-04T08:57:43.5411520Z * [new branch] d4l3k/debug_plane_frtrace -> origin/d4l3k/debug_plane_frtrace 2025-12-04T08:57:43.5412142Z * [new branch] daxia6/2.8o3 -> origin/daxia6/2.8o3 2025-12-04T08:57:43.5412699Z * [new branch] debug-guard -> origin/debug-guard 2025-12-04T08:57:43.5413277Z * [new branch] delete-quant-docs -> origin/delete-quant-docs 2025-12-04T08:57:43.5414389Z * [new branch] dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.0 -> origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.0 2025-12-04T08:57:43.5415932Z * [new branch] dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.1 -> origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.1 2025-12-04T08:57:43.5417332Z * [new branch] desertfire/test_cpp_wrapper -> origin/desertfire/test_cpp_wrapper 2025-12-04T08:57:43.5418140Z * [new branch] desertfire/triton-cpu-for-aarch64 -> origin/desertfire/triton-cpu-for-aarch64 2025-12-04T08:57:43.5418931Z * [new branch] dev/dhruva/flex_attn_opt -> origin/dev/dhruva/flex_attn_opt 2025-12-04T08:57:43.5419665Z * [new branch] dev/joona/MPSNDArrayAdd -> origin/dev/joona/MPSNDArrayAdd 2025-12-04T08:57:43.5420408Z * [new branch] dev/joona/Unranked -> origin/dev/joona/Unranked 2025-12-04T08:57:43.5421422Z * [new branch] dev/joona/cat -> origin/dev/joona/cat 2025-12-04T08:57:43.5422155Z * [new branch] dev/joona/embeddingbag -> origin/dev/joona/embeddingbag 2025-12-04T08:57:43.5422858Z * [new branch] dev/joona/fix_sdpa_memtest -> origin/dev/joona/fix_sdpa_memtest 2025-12-04T08:57:43.5423603Z * [new branch] dev/joona/getTensorsString -> origin/dev/joona/getTensorsString 2025-12-04T08:57:43.5424334Z * [new branch] dev/joona/mps_linear_macos14 -> origin/dev/joona/mps_linear_macos14 2025-12-04T08:57:43.5425042Z * [new branch] dev/joona/scalar_clamp -> origin/dev/joona/scalar_clamp 2025-12-04T08:57:43.5425689Z * [new branch] dev/joona/sdpa -> origin/dev/joona/sdpa 2025-12-04T08:57:43.5426309Z * [new branch] dev/joona/sdpa_api -> origin/dev/joona/sdpa_api 2025-12-04T08:57:43.5426928Z * [new branch] dev/joona/type_inf -> origin/dev/joona/type_inf 2025-12-04T08:57:43.5427620Z * [new branch] dev/joona/ulpAssertClose -> origin/dev/joona/ulpAssertClose 2025-12-04T08:57:43.5428307Z * [new branch] dev/joona/upsize3d -> origin/dev/joona/upsize3d 2025-12-04T08:57:43.5428898Z * [new branch] disp_counter -> origin/disp_counter 2025-12-04T08:57:43.5429507Z * [new branch] divyanshk-patch-1 -> origin/divyanshk-patch-1 2025-12-04T08:57:43.5430097Z * [new branch] docs -> origin/docs 2025-12-04T08:57:43.5430650Z * [new branch] documentation -> origin/documentation 2025-12-04T08:57:43.5431270Z * [new branch] 
eager_model_benchmarks -> origin/eager_model_benchmarks 2025-12-04T08:57:43.5431985Z * [new branch] embg/test_inductor_ci_control -> origin/embg/test_inductor_ci_control 2025-12-04T08:57:43.5432734Z * [new branch] embg/triton_l2_prefetch_128B -> origin/embg/triton_l2_prefetch_128B 2025-12-04T08:57:43.5433554Z * [new branch] embg/triton_l2_prefetch_256B -> origin/embg/triton_l2_prefetch_256B 2025-12-04T08:57:43.5434183Z * [new branch] eqy-patch-1 -> origin/eqy-patch-1 2025-12-04T08:57:43.5434741Z * [new branch] eqy-patch-2 -> origin/eqy-patch-2 2025-12-04T08:57:43.5435294Z * [new branch] eqy-patch-3 -> origin/eqy-patch-3 2025-12-04T08:57:43.5435846Z * [new branch] eqy-patch-4 -> origin/eqy-patch-4 2025-12-04T08:57:43.5436704Z * [new branch] eqy-patch-5 -> origin/eqy-patch-5 2025-12-04T08:57:43.5437725Z * [new branch] eqy-patch-6 -> origin/eqy-patch-6 2025-12-04T08:57:43.5439405Z * [new branch] exclamaforte/amd-ma -> origin/exclamaforte/amd-ma 2025-12-04T08:57:43.5440577Z * [new branch] exclamaforte/combo-kernels-perf-run -> origin/exclamaforte/combo-kernels-perf-run 2025-12-04T08:57:43.5441513Z * [new branch] exclamaforte/do_bench_refactor -> origin/exclamaforte/do_bench_refactor 2025-12-04T08:57:43.5442636Z * [new branch] exclamaforte/enable-mem-dep-fusion -> origin/exclamaforte/enable-mem-dep-fusion 2025-12-04T08:57:43.5443779Z * [new branch] exclamaforte/fix-exhaustive-autotuning -> origin/exclamaforte/fix-exhaustive-autotuning 2025-12-04T08:57:43.5445123Z * [new branch] exclamaforte/fix-trace-parsing-fx-svg -> origin/exclamaforte/fix-trace-parsing-fx-svg 2025-12-04T08:57:43.5446568Z * [new branch] exclamaforte/force-pointwise-cat-perf-run -> origin/exclamaforte/force-pointwise-cat-perf-run 2025-12-04T08:57:43.5447500Z * [new branch] exclamaforte/fusion-data -> origin/exclamaforte/fusion-data 2025-12-04T08:57:43.5448710Z * [new branch] exclamaforte/gemm-benchmark-run -> origin/exclamaforte/gemm-benchmark-run 2025-12-04T08:57:43.5449934Z * [new branch] exclamaforte/gemm-export-model -> origin/exclamaforte/gemm-export-model 2025-12-04T08:57:43.5450911Z * [new branch] exclamaforte/gemm-model -> origin/exclamaforte/gemm-model 2025-12-04T08:57:43.5452174Z * [new branch] exclamaforte/gemm-model-all-data-collection -> origin/exclamaforte/gemm-model-all-data-collection 2025-12-04T08:57:43.5453078Z * [new branch] exclamaforte/gemm-to-amd -> origin/exclamaforte/gemm-to-amd 2025-12-04T08:57:43.5454199Z * [new branch] exclamaforte/just-gemm-model -> origin/exclamaforte/just-gemm-model 2025-12-04T08:57:43.5455512Z * [new branch] exclamaforte/just-gemm-model-no-refactor -> origin/exclamaforte/just-gemm-model-no-refactor 2025-12-04T08:57:43.5456632Z * [new branch] exclamaforte/profile-diff-algo -> origin/exclamaforte/profile-diff-algo 2025-12-04T08:57:43.5458014Z * [new branch] exclamaforte/profiler-visualization -> origin/exclamaforte/profiler-visualization 2025-12-04T08:57:43.5459184Z * [new branch] exclamaforte/test_cpp_wrapper_mode -> origin/exclamaforte/test_cpp_wrapper_mode 2025-12-04T08:57:43.5460355Z * [new branch] exclamaforte/update-autotune-configs -> origin/exclamaforte/update-autotune-configs 2025-12-04T08:57:43.5461510Z * [new branch] exclamaforte/update-autotune-configs-2 -> origin/exclamaforte/update-autotune-configs-2 2025-12-04T08:57:43.5462471Z * [new branch] exec -> origin/exec 2025-12-04T08:57:43.5463939Z * [new branch] experimental-mosaic -> origin/experimental-mosaic 2025-12-04T08:57:43.5465060Z * [new branch] export-D61047529 -> origin/export-D61047529 
2025-12-04T08:57:43.5466254Z * [new branch] export-D71412006 -> origin/export-D71412006 2025-12-04T08:57:43.5467640Z * [new branch] export-D73042989 -> origin/export-D73042989 2025-12-04T08:57:43.5468774Z * [new branch] export-D78957093 -> origin/export-D78957093 2025-12-04T08:57:43.5469845Z * [new branch] export-D78996107 -> origin/export-D78996107 2025-12-04T08:57:43.5470957Z * [new branch] export-D80823877 -> origin/export-D80823877 2025-12-04T08:57:43.5472264Z * [new branch] export-D80958642 -> origin/export-D80958642 2025-12-04T08:57:43.5473235Z * [new branch] export-D81054193 -> origin/export-D81054193 2025-12-04T08:57:43.5474306Z * [new branch] export-D81204584 -> origin/export-D81204584 2025-12-04T08:57:43.5475450Z * [new branch] export-D81429090 -> origin/export-D81429090 2025-12-04T08:57:43.5476744Z * [new branch] export-D82250826 -> origin/export-D82250826 2025-12-04T08:57:43.5477735Z * [new branch] export-D82253817 -> origin/export-D82253817 2025-12-04T08:57:43.5478828Z * [new branch] export-D83541846 -> origin/export-D83541846 2025-12-04T08:57:43.5479956Z * [new branch] export-D83627170 -> origin/export-D83627170 2025-12-04T08:57:43.5481086Z * [new branch] export-D83766701 -> origin/export-D83766701 2025-12-04T08:57:43.5482417Z * [new branch] export-D83768878 -> origin/export-D83768878 2025-12-04T08:57:43.5483483Z * [new branch] export-D83769447 -> origin/export-D83769447 2025-12-04T08:57:43.5484521Z * [new branch] export-D84089824 -> origin/export-D84089824 2025-12-04T08:57:43.5485619Z * [new branch] export-D84213020 -> origin/export-D84213020 2025-12-04T08:57:43.5487333Z * [new branch] export-D84373821 -> origin/export-D84373821 2025-12-04T08:57:43.5488375Z * [new branch] export-D84612194 -> origin/export-D84612194 2025-12-04T08:57:43.5489629Z * [new branch] export-D84890985 -> origin/export-D84890985 2025-12-04T08:57:43.5490594Z * [new branch] export-D85122326 -> origin/export-D85122326 2025-12-04T08:57:43.5492356Z * [new branch] export-D86256198 -> origin/export-D86256198 2025-12-04T08:57:43.5493383Z * [new branch] export-D86460608 -> origin/export-D86460608 2025-12-04T08:57:43.5494711Z * [new branch] export-D86474796 -> origin/export-D86474796 2025-12-04T08:57:43.5495860Z * [new branch] export-D86712396 -> origin/export-D86712396 2025-12-04T08:57:43.5497427Z * [new branch] export-D87022129 -> origin/export-D87022129 2025-12-04T08:57:43.5498630Z * [new branch] export-D87838959 -> origin/export-D87838959 2025-12-04T08:57:43.5499857Z * [new branch] export-D88319437 -> origin/export-D88319437 2025-12-04T08:57:43.5501230Z * [new branch] exported-model-train-idempotent -> origin/exported-model-train-idempotent 2025-12-04T08:57:43.5502332Z * [new branch] ezyang-titan-october -> origin/ezyang-titan-october 2025-12-04T08:57:43.5503462Z * [new branch] ezyang-titan-october2 -> origin/ezyang-titan-october2 2025-12-04T08:57:43.5504524Z * [new branch] ezyang-war -> origin/ezyang-war 2025-12-04T08:57:43.5506175Z * [new branch] ezyang/wip-aot-descriptors -> origin/ezyang/wip-aot-descriptors 2025-12-04T08:57:43.5507204Z * [new branch] fa_u8_brgemm -> origin/fa_u8_brgemm 2025-12-04T08:57:43.5508992Z * [new branch] fadeputr/sequence_fbgemm -> origin/fadeputr/sequence_fbgemm 2025-12-04T08:57:43.5510049Z * [new branch] fastmath_baseline -> origin/fastmath_baseline 2025-12-04T08:57:43.5511617Z * [new branch] fbcode/warm -> origin/fbcode/warm 2025-12-04T08:57:43.5512864Z * [new branch] fca -> origin/fca 2025-12-04T08:57:43.5513907Z * [new branch] fca2_ca5984c -> origin/fca2_ca5984c 
2025-12-04T08:57:43.5514991Z * [new branch] fca5 -> origin/fca5 2025-12-04T08:57:43.5516597Z * [new branch] feature/justknobs-cpp -> origin/feature/justknobs-cpp 2025-12-04T08:57:43.5517678Z * [new branch] feature/numa-forkserver -> origin/feature/numa-forkserver 2025-12-04T08:57:43.5519775Z * [new branch] ffast_math_baseline -> origin/ffast_math_baseline 2025-12-04T08:57:43.5520931Z * [new branch] ffast_math_target -> origin/ffast_math_target 2025-12-04T08:57:43.5522889Z * [new branch] findhao/base_commit -> origin/findhao/base_commit 2025-12-04T08:57:43.5523969Z * [new branch] findhao/base_commit1 -> origin/findhao/base_commit1 2025-12-04T08:57:43.5525088Z * [new branch] findhao/multistream2 -> origin/findhao/multistream2 2025-12-04T08:57:43.5526172Z * [new branch] findhao/multistream5 -> origin/findhao/multistream5 2025-12-04T08:57:43.5527173Z * [new branch] findhao/multistream6 -> origin/findhao/multistream6 2025-12-04T08:57:43.5528263Z * [new branch] findhao/operatorbench3 -> origin/findhao/operatorbench3 2025-12-04T08:57:43.5529322Z * [new branch] findhao/operatorbench5 -> origin/findhao/operatorbench5 2025-12-04T08:57:43.5530411Z * [new branch] findhao/tritonparse -> origin/findhao/tritonparse 2025-12-04T08:57:43.5531660Z * [new branch] fix-ck-gemm-template-format -> origin/fix-ck-gemm-template-format 2025-12-04T08:57:43.5532776Z * [new branch] fix-config-ignore -> origin/fix-config-ignore 2025-12-04T08:57:43.5533997Z * [new branch] fix-dict-guard -> origin/fix-dict-guard 2025-12-04T08:57:43.5535455Z * [new branch] fix_addmm_issue -> origin/fix_addmm_issue 2025-12-04T08:57:43.5536477Z * [new branch] fix_amd_missing_cluster_dims -> origin/fix_amd_missing_cluster_dims 2025-12-04T08:57:43.5537809Z * [new branch] fix_bench_bwd_pass -> origin/fix_bench_bwd_pass 2025-12-04T08:57:43.5538864Z * [new branch] fix_mem_profiler_config -> origin/fix_mem_profiler_config 2025-12-04T08:57:43.5539970Z * [new branch] fix_nvrtc_discovery -> origin/fix_nvrtc_discovery 2025-12-04T08:57:43.5541049Z * [new branch] fix_op_runner -> origin/fix_op_runner 2025-12-04T08:57:43.5542164Z * [new branch] fix_ubn_159469 -> origin/fix_ubn_159469 2025-12-04T08:57:43.5543357Z * [new branch] fixes-triage -> origin/fixes-triage 2025-12-04T08:57:43.5544486Z * [new branch] fixflashinfer -> origin/fixflashinfer 2025-12-04T08:57:43.5545820Z * [new branch] flash_decoding_cpu -> origin/flash_decoding_cpu 2025-12-04T08:57:43.5546837Z * [new branch] flex-flash -> origin/flex-flash 2025-12-04T08:57:43.5548079Z * [new branch] flex_attention_functorch_grad -> origin/flex_attention_functorch_grad 2025-12-04T08:57:43.5549211Z * [new branch] flex_flash -> origin/flex_flash 2025-12-04T08:57:43.5550953Z * [new branch] fmassa/fix_memeff_sharding_rule -> origin/fmassa/fix_memeff_sharding_rule 2025-12-04T08:57:43.5552029Z * [new branch] fmassa/tests_comm_compute_scheduler -> origin/fmassa/tests_comm_compute_scheduler 2025-12-04T08:57:43.5553015Z * [new branch] forkserver_fix -> origin/forkserver_fix 2025-12-04T08:57:43.5554088Z * [new branch] fsdp2_trace_rules -> origin/fsdp2_trace_rules 2025-12-04T08:57:43.5555221Z * [new branch] fx_cpp -> origin/fx_cpp 2025-12-04T08:57:43.5556892Z * [new branch] fy/fix-win -> origin/fy/fix-win 2025-12-04T08:57:43.5558036Z * [new branch] galv-patch-1 -> origin/galv-patch-1 2025-12-04T08:57:43.5559899Z * [new branch] galv/cudagraphs-conditional-nodes-4 -> origin/galv/cudagraphs-conditional-nodes-4 2025-12-04T08:57:43.5561222Z * [new branch] georgehong/cmakelists-patch -> origin/georgehong/cmakelists-patch 
2025-12-04T08:57:43.5563576Z * [new branch] gh/AlnisM/1/base -> origin/gh/AlnisM/1/base 2025-12-04T08:57:43.5564602Z * [new branch] gh/AlnisM/1/head -> origin/gh/AlnisM/1/head 2025-12-04T08:57:43.5566565Z * [new branch] gh/EikanWang/67/base -> origin/gh/EikanWang/67/base 2025-12-04T08:57:43.5567611Z * [new branch] gh/EikanWang/67/head -> origin/gh/EikanWang/67/head 2025-12-04T08:57:43.5569837Z * [new branch] gh/Gasoonjia/1/base -> origin/gh/Gasoonjia/1/base 2025-12-04T08:57:43.5570887Z * [new branch] gh/Gasoonjia/1/head -> origin/gh/Gasoonjia/1/head 2025-12-04T08:57:43.5572826Z * [new branch] gh/H-Huang/131/base -> origin/gh/H-Huang/131/base 2025-12-04T08:57:43.5573929Z * [new branch] gh/H-Huang/131/head -> origin/gh/H-Huang/131/head 2025-12-04T08:57:43.5575065Z * [new branch] gh/H-Huang/131/orig -> origin/gh/H-Huang/131/orig 2025-12-04T08:57:43.5576862Z * [new branch] gh/H-Huang/132/base -> origin/gh/H-Huang/132/base 2025-12-04T08:57:43.5577991Z * [new branch] gh/H-Huang/132/head -> origin/gh/H-Huang/132/head 2025-12-04T08:57:43.5579138Z * [new branch] gh/H-Huang/132/orig -> origin/gh/H-Huang/132/orig 2025-12-04T08:57:43.5580750Z * [new branch] gh/H-Huang/180/base -> origin/gh/H-Huang/180/base 2025-12-04T08:57:43.5581941Z * [new branch] gh/H-Huang/180/head -> origin/gh/H-Huang/180/head 2025-12-04T08:57:43.5582934Z * [new branch] gh/H-Huang/180/orig -> origin/gh/H-Huang/180/orig 2025-12-04T08:57:43.5584436Z * [new branch] gh/H-Huang/182/base -> origin/gh/H-Huang/182/base 2025-12-04T08:57:43.5585475Z * [new branch] gh/H-Huang/182/head -> origin/gh/H-Huang/182/head 2025-12-04T08:57:43.5586623Z * [new branch] gh/H-Huang/182/orig -> origin/gh/H-Huang/182/orig 2025-12-04T08:57:43.5588196Z * [new branch] gh/H-Huang/226/base -> origin/gh/H-Huang/226/base 2025-12-04T08:57:43.5589672Z * [new branch] gh/H-Huang/226/head -> origin/gh/H-Huang/226/head 2025-12-04T08:57:43.5590720Z * [new branch] gh/H-Huang/226/orig -> origin/gh/H-Huang/226/orig 2025-12-04T08:57:43.5592219Z * [new branch] gh/H-Huang/228/base -> origin/gh/H-Huang/228/base 2025-12-04T08:57:43.5593267Z * [new branch] gh/H-Huang/228/head -> origin/gh/H-Huang/228/head 2025-12-04T08:57:43.5594340Z * [new branch] gh/H-Huang/228/orig -> origin/gh/H-Huang/228/orig 2025-12-04T08:57:43.5596354Z * [new branch] gh/IvanKobzarev/150/base -> origin/gh/IvanKobzarev/150/base 2025-12-04T08:57:43.5597349Z * [new branch] gh/IvanKobzarev/150/head -> origin/gh/IvanKobzarev/150/head 2025-12-04T08:57:43.5598448Z * [new branch] gh/IvanKobzarev/150/orig -> origin/gh/IvanKobzarev/150/orig 2025-12-04T08:57:43.5600113Z * [new branch] gh/IvanKobzarev/157/base -> origin/gh/IvanKobzarev/157/base 2025-12-04T08:57:43.5601226Z * [new branch] gh/IvanKobzarev/157/head -> origin/gh/IvanKobzarev/157/head 2025-12-04T08:57:43.5602330Z * [new branch] gh/IvanKobzarev/157/orig -> origin/gh/IvanKobzarev/157/orig 2025-12-04T08:57:43.5603950Z * [new branch] gh/IvanKobzarev/159/base -> origin/gh/IvanKobzarev/159/base 2025-12-04T08:57:43.5604995Z * [new branch] gh/IvanKobzarev/159/head -> origin/gh/IvanKobzarev/159/head 2025-12-04T08:57:43.5606101Z * [new branch] gh/IvanKobzarev/159/orig -> origin/gh/IvanKobzarev/159/orig 2025-12-04T08:57:43.5607671Z * [new branch] gh/IvanKobzarev/162/base -> origin/gh/IvanKobzarev/162/base 2025-12-04T08:57:43.5608886Z * [new branch] gh/IvanKobzarev/162/head -> origin/gh/IvanKobzarev/162/head 2025-12-04T08:57:43.5609971Z * [new branch] gh/IvanKobzarev/162/orig -> origin/gh/IvanKobzarev/162/orig 2025-12-04T08:57:43.5611508Z * [new branch] 
gh/IvanKobzarev/163/base -> origin/gh/IvanKobzarev/163/base 2025-12-04T08:57:43.5612529Z * [new branch] gh/IvanKobzarev/163/head -> origin/gh/IvanKobzarev/163/head 2025-12-04T08:57:43.5613743Z * [new branch] gh/IvanKobzarev/163/orig -> origin/gh/IvanKobzarev/163/orig 2025-12-04T08:57:43.5615374Z * [new branch] gh/IvanKobzarev/166/base -> origin/gh/IvanKobzarev/166/base 2025-12-04T08:57:43.5616494Z * [new branch] gh/IvanKobzarev/166/head -> origin/gh/IvanKobzarev/166/head 2025-12-04T08:57:43.5617892Z * [new branch] gh/IvanKobzarev/166/orig -> origin/gh/IvanKobzarev/166/orig 2025-12-04T08:57:43.5619556Z * [new branch] gh/IvanKobzarev/167/base -> origin/gh/IvanKobzarev/167/base 2025-12-04T08:57:43.5620555Z * [new branch] gh/IvanKobzarev/167/head -> origin/gh/IvanKobzarev/167/head 2025-12-04T08:57:43.5621918Z * [new branch] gh/IvanKobzarev/167/orig -> origin/gh/IvanKobzarev/167/orig 2025-12-04T08:57:43.5623511Z * [new branch] gh/IvanKobzarev/168/base -> origin/gh/IvanKobzarev/168/base 2025-12-04T08:57:43.5624573Z * [new branch] gh/IvanKobzarev/168/head -> origin/gh/IvanKobzarev/168/head 2025-12-04T08:57:43.5625886Z * [new branch] gh/IvanKobzarev/168/orig -> origin/gh/IvanKobzarev/168/orig 2025-12-04T08:57:43.5627323Z * [new branch] gh/IvanKobzarev/169/base -> origin/gh/IvanKobzarev/169/base 2025-12-04T08:57:43.5628426Z * [new branch] gh/IvanKobzarev/169/head -> origin/gh/IvanKobzarev/169/head 2025-12-04T08:57:43.5629553Z * [new branch] gh/IvanKobzarev/169/orig -> origin/gh/IvanKobzarev/169/orig 2025-12-04T08:57:43.5630971Z * [new branch] gh/IvanKobzarev/170/base -> origin/gh/IvanKobzarev/170/base 2025-12-04T08:57:43.5632075Z * [new branch] gh/IvanKobzarev/170/head -> origin/gh/IvanKobzarev/170/head 2025-12-04T08:57:43.5633304Z * [new branch] gh/IvanKobzarev/170/orig -> origin/gh/IvanKobzarev/170/orig 2025-12-04T08:57:43.5635029Z * [new branch] gh/IvanKobzarev/171/base -> origin/gh/IvanKobzarev/171/base 2025-12-04T08:57:43.5636063Z * [new branch] gh/IvanKobzarev/171/head -> origin/gh/IvanKobzarev/171/head 2025-12-04T08:57:43.5637170Z * [new branch] gh/IvanKobzarev/171/orig -> origin/gh/IvanKobzarev/171/orig 2025-12-04T08:57:43.5638804Z * [new branch] gh/IvanKobzarev/172/base -> origin/gh/IvanKobzarev/172/base 2025-12-04T08:57:43.5639899Z * [new branch] gh/IvanKobzarev/172/head -> origin/gh/IvanKobzarev/172/head 2025-12-04T08:57:43.5640994Z * [new branch] gh/IvanKobzarev/172/orig -> origin/gh/IvanKobzarev/172/orig 2025-12-04T08:57:43.5642609Z * [new branch] gh/IvanKobzarev/173/base -> origin/gh/IvanKobzarev/173/base 2025-12-04T08:57:43.5643652Z * [new branch] gh/IvanKobzarev/173/head -> origin/gh/IvanKobzarev/173/head 2025-12-04T08:57:43.5644723Z * [new branch] gh/IvanKobzarev/173/orig -> origin/gh/IvanKobzarev/173/orig 2025-12-04T08:57:43.5646308Z * [new branch] gh/IvanKobzarev/174/base -> origin/gh/IvanKobzarev/174/base 2025-12-04T08:57:43.5647428Z * [new branch] gh/IvanKobzarev/174/head -> origin/gh/IvanKobzarev/174/head 2025-12-04T08:57:43.5648608Z * [new branch] gh/IvanKobzarev/174/orig -> origin/gh/IvanKobzarev/174/orig 2025-12-04T08:57:43.5650109Z * [new branch] gh/IvanKobzarev/175/base -> origin/gh/IvanKobzarev/175/base 2025-12-04T08:57:43.5651261Z * [new branch] gh/IvanKobzarev/175/head -> origin/gh/IvanKobzarev/175/head 2025-12-04T08:57:43.5652339Z * [new branch] gh/IvanKobzarev/175/orig -> origin/gh/IvanKobzarev/175/orig 2025-12-04T08:57:43.5654136Z * [new branch] gh/IvanKobzarev/176/base -> origin/gh/IvanKobzarev/176/base 2025-12-04T08:57:43.5655165Z * [new branch] 
gh/IvanKobzarev/176/head -> origin/gh/IvanKobzarev/176/head 2025-12-04T08:57:43.5656243Z * [new branch] gh/IvanKobzarev/176/orig -> origin/gh/IvanKobzarev/176/orig 2025-12-04T08:57:43.5658421Z * [new branch] gh/IvanKobzarev/177/base -> origin/gh/IvanKobzarev/177/base 2025-12-04T08:57:43.5659530Z * [new branch] gh/IvanKobzarev/177/head -> origin/gh/IvanKobzarev/177/head 2025-12-04T08:57:43.5660758Z * [new branch] gh/IvanKobzarev/177/orig -> origin/gh/IvanKobzarev/177/orig 2025-12-04T08:57:43.5662435Z * [new branch] gh/IvanKobzarev/178/base -> origin/gh/IvanKobzarev/178/base 2025-12-04T08:57:43.5663568Z * [new branch] gh/IvanKobzarev/178/head -> origin/gh/IvanKobzarev/178/head 2025-12-04T08:57:43.5664708Z * [new branch] gh/IvanKobzarev/178/orig -> origin/gh/IvanKobzarev/178/orig 2025-12-04T08:57:43.5666401Z * [new branch] gh/IvanKobzarev/179/base -> origin/gh/IvanKobzarev/179/base 2025-12-04T08:57:43.5667416Z * [new branch] gh/IvanKobzarev/179/head -> origin/gh/IvanKobzarev/179/head 2025-12-04T08:57:43.5668550Z * [new branch] gh/IvanKobzarev/179/orig -> origin/gh/IvanKobzarev/179/orig 2025-12-04T08:57:43.5670417Z * [new branch] gh/IvanKobzarev/180/base -> origin/gh/IvanKobzarev/180/base 2025-12-04T08:57:43.5671444Z * [new branch] gh/IvanKobzarev/180/head -> origin/gh/IvanKobzarev/180/head 2025-12-04T08:57:43.5672589Z * [new branch] gh/IvanKobzarev/180/orig -> origin/gh/IvanKobzarev/180/orig 2025-12-04T08:57:43.5674390Z * [new branch] gh/IvanKobzarev/181/base -> origin/gh/IvanKobzarev/181/base 2025-12-04T08:57:43.5675512Z * [new branch] gh/IvanKobzarev/181/head -> origin/gh/IvanKobzarev/181/head 2025-12-04T08:57:43.5676644Z * [new branch] gh/IvanKobzarev/181/orig -> origin/gh/IvanKobzarev/181/orig 2025-12-04T08:57:43.5678470Z * [new branch] gh/IvanKobzarev/182/base -> origin/gh/IvanKobzarev/182/base 2025-12-04T08:57:43.5679475Z * [new branch] gh/IvanKobzarev/182/head -> origin/gh/IvanKobzarev/182/head 2025-12-04T08:57:43.5680687Z * [new branch] gh/IvanKobzarev/182/orig -> origin/gh/IvanKobzarev/182/orig 2025-12-04T08:57:43.5682428Z * [new branch] gh/IvanKobzarev/183/base -> origin/gh/IvanKobzarev/183/base 2025-12-04T08:57:43.5683518Z * [new branch] gh/IvanKobzarev/183/head -> origin/gh/IvanKobzarev/183/head 2025-12-04T08:57:43.5684663Z * [new branch] gh/IvanKobzarev/183/orig -> origin/gh/IvanKobzarev/183/orig 2025-12-04T08:57:43.5686242Z * [new branch] gh/IvanKobzarev/184/base -> origin/gh/IvanKobzarev/184/base 2025-12-04T08:57:43.5687309Z * [new branch] gh/IvanKobzarev/184/head -> origin/gh/IvanKobzarev/184/head 2025-12-04T08:57:43.5688435Z * [new branch] gh/IvanKobzarev/184/orig -> origin/gh/IvanKobzarev/184/orig 2025-12-04T08:57:43.5690295Z * [new branch] gh/NikhilAPatel/1/base -> origin/gh/NikhilAPatel/1/base 2025-12-04T08:57:43.5691440Z * [new branch] gh/NikhilAPatel/1/head -> origin/gh/NikhilAPatel/1/head 2025-12-04T08:57:43.5692831Z * [new branch] gh/NikhilAPatel/2/base -> origin/gh/NikhilAPatel/2/base 2025-12-04T08:57:43.5693821Z * [new branch] gh/NikhilAPatel/2/head -> origin/gh/NikhilAPatel/2/head 2025-12-04T08:57:43.5695621Z * [new branch] gh/NikhilAPatel/4/base -> origin/gh/NikhilAPatel/4/base 2025-12-04T08:57:43.5697104Z * [new branch] gh/NikhilAPatel/4/head -> origin/gh/NikhilAPatel/4/head 2025-12-04T08:57:43.5698742Z * [new branch] gh/NikhilAPatel/5/base -> origin/gh/NikhilAPatel/5/base 2025-12-04T08:57:43.5699838Z * [new branch] gh/NikhilAPatel/5/head -> origin/gh/NikhilAPatel/5/head 2025-12-04T08:57:43.5700999Z * [new branch] gh/NikhilAPatel/5/orig -> 
origin/gh/NikhilAPatel/5/orig 2025-12-04T08:57:43.5702824Z * [new branch] gh/PaliC/17/base -> origin/gh/PaliC/17/base 2025-12-04T08:57:43.5703862Z * [new branch] gh/PaliC/17/head -> origin/gh/PaliC/17/head 2025-12-04T08:57:43.5705008Z * [new branch] gh/PaliC/17/orig -> origin/gh/PaliC/17/orig 2025-12-04T08:57:43.5706609Z * [new branch] gh/PaliC/18/base -> origin/gh/PaliC/18/base 2025-12-04T08:57:43.5707658Z * [new branch] gh/PaliC/18/head -> origin/gh/PaliC/18/head 2025-12-04T08:57:43.5708864Z * [new branch] gh/PaliC/18/orig -> origin/gh/PaliC/18/orig 2025-12-04T08:57:43.5710429Z * [new branch] gh/PaliC/20/base -> origin/gh/PaliC/20/base 2025-12-04T08:57:43.5711563Z * [new branch] gh/PaliC/20/head -> origin/gh/PaliC/20/head 2025-12-04T08:57:43.5712594Z * [new branch] gh/PaliC/20/orig -> origin/gh/PaliC/20/orig 2025-12-04T08:57:43.5714136Z * [new branch] gh/PaliC/21/base -> origin/gh/PaliC/21/base 2025-12-04T08:57:43.5715139Z * [new branch] gh/PaliC/21/head -> origin/gh/PaliC/21/head 2025-12-04T08:57:43.5716356Z * [new branch] gh/PaliC/21/orig -> origin/gh/PaliC/21/orig 2025-12-04T08:57:43.5717635Z * [new branch] gh/PaliC/23/base -> origin/gh/PaliC/23/base 2025-12-04T08:57:43.5718698Z * [new branch] gh/PaliC/23/head -> origin/gh/PaliC/23/head 2025-12-04T08:57:43.5719816Z * [new branch] gh/PaliC/23/orig -> origin/gh/PaliC/23/orig 2025-12-04T08:57:43.5721760Z * [new branch] gh/PaliC/24/base -> origin/gh/PaliC/24/base 2025-12-04T08:57:43.5722860Z * [new branch] gh/PaliC/24/head -> origin/gh/PaliC/24/head 2025-12-04T08:57:43.5723982Z * [new branch] gh/PaliC/24/orig -> origin/gh/PaliC/24/orig 2025-12-04T08:57:43.5725501Z * [new branch] gh/PaliC/25/head -> origin/gh/PaliC/25/head 2025-12-04T08:57:43.5726567Z * [new branch] gh/PaliC/25/next -> origin/gh/PaliC/25/next 2025-12-04T08:57:43.5727727Z * [new branch] gh/PaliC/25/orig -> origin/gh/PaliC/25/orig 2025-12-04T08:57:43.5729271Z * [new branch] gh/PaliC/26/head -> origin/gh/PaliC/26/head 2025-12-04T08:57:43.5730160Z * [new branch] gh/PaliC/26/next -> origin/gh/PaliC/26/next 2025-12-04T08:57:43.5731311Z * [new branch] gh/PaliC/26/orig -> origin/gh/PaliC/26/orig 2025-12-04T08:57:43.5732855Z * [new branch] gh/PaliC/27/next -> origin/gh/PaliC/27/next 2025-12-04T08:57:43.5734455Z * [new branch] gh/PaliC/28/head -> origin/gh/PaliC/28/head 2025-12-04T08:57:43.5735405Z * [new branch] gh/PaliC/28/next -> origin/gh/PaliC/28/next 2025-12-04T08:57:43.5736560Z * [new branch] gh/PaliC/28/orig -> origin/gh/PaliC/28/orig 2025-12-04T08:57:43.5738374Z * [new branch] gh/PaliC/29/head -> origin/gh/PaliC/29/head 2025-12-04T08:57:43.5739286Z * [new branch] gh/PaliC/29/next -> origin/gh/PaliC/29/next 2025-12-04T08:57:43.5740412Z * [new branch] gh/PaliC/29/orig -> origin/gh/PaliC/29/orig 2025-12-04T08:57:43.5741979Z * [new branch] gh/PaliC/30/head -> origin/gh/PaliC/30/head 2025-12-04T08:57:43.5742891Z * [new branch] gh/PaliC/30/next -> origin/gh/PaliC/30/next 2025-12-04T08:57:43.5744036Z * [new branch] gh/PaliC/30/orig -> origin/gh/PaliC/30/orig 2025-12-04T08:57:43.5745538Z * [new branch] gh/PaliC/31/head -> origin/gh/PaliC/31/head 2025-12-04T08:57:43.5746674Z * [new branch] gh/PaliC/31/next -> origin/gh/PaliC/31/next 2025-12-04T08:57:43.5747798Z * [new branch] gh/PaliC/31/orig -> origin/gh/PaliC/31/orig 2025-12-04T08:57:43.5749746Z * [new branch] gh/PaulZhang12/25/base -> origin/gh/PaulZhang12/25/base 2025-12-04T08:57:43.5750891Z * [new branch] gh/PaulZhang12/25/head -> origin/gh/PaulZhang12/25/head 2025-12-04T08:57:43.5752064Z * [new branch] gh/PaulZhang12/25/orig 
-> origin/gh/PaulZhang12/25/orig 2025-12-04T08:57:43.5753639Z * [new branch] gh/PaulZhang12/28/base -> origin/gh/PaulZhang12/28/base 2025-12-04T08:57:43.5754744Z * [new branch] gh/PaulZhang12/28/head -> origin/gh/PaulZhang12/28/head 2025-12-04T08:57:43.5755843Z * [new branch] gh/PaulZhang12/28/orig -> origin/gh/PaulZhang12/28/orig 2025-12-04T08:57:43.5757598Z * [new branch] gh/PaulZhang12/31/base -> origin/gh/PaulZhang12/31/base 2025-12-04T08:57:43.5758625Z * [new branch] gh/PaulZhang12/31/head -> origin/gh/PaulZhang12/31/head 2025-12-04T08:57:43.5759787Z * [new branch] gh/PaulZhang12/31/orig -> origin/gh/PaulZhang12/31/orig 2025-12-04T08:57:43.5762071Z * [new branch] gh/PaulZhang12/37/base -> origin/gh/PaulZhang12/37/base 2025-12-04T08:57:43.5763053Z * [new branch] gh/PaulZhang12/37/head -> origin/gh/PaulZhang12/37/head 2025-12-04T08:57:43.5763881Z * [new branch] gh/PaulZhang12/37/orig -> origin/gh/PaulZhang12/37/orig 2025-12-04T08:57:43.5764999Z * [new branch] gh/PaulZhang12/40/base -> origin/gh/PaulZhang12/40/base 2025-12-04T08:57:43.5766062Z * [new branch] gh/PaulZhang12/40/head -> origin/gh/PaulZhang12/40/head 2025-12-04T08:57:43.5767151Z * [new branch] gh/PaulZhang12/40/orig -> origin/gh/PaulZhang12/40/orig 2025-12-04T08:57:43.5768734Z * [new branch] gh/PaulZhang12/42/base -> origin/gh/PaulZhang12/42/base 2025-12-04T08:57:43.5769749Z * [new branch] gh/PaulZhang12/42/head -> origin/gh/PaulZhang12/42/head 2025-12-04T08:57:43.5771279Z * [new branch] gh/PaulZhang12/43/base -> origin/gh/PaulZhang12/43/base 2025-12-04T08:57:43.5772371Z * [new branch] gh/PaulZhang12/43/head -> origin/gh/PaulZhang12/43/head 2025-12-04T08:57:43.5773461Z * [new branch] gh/PaulZhang12/43/orig -> origin/gh/PaulZhang12/43/orig 2025-12-04T08:57:43.5774880Z * [new branch] gh/PaulZhang12/44/base -> origin/gh/PaulZhang12/44/base 2025-12-04T08:57:43.5775906Z * [new branch] gh/PaulZhang12/44/head -> origin/gh/PaulZhang12/44/head 2025-12-04T08:57:43.5777910Z * [new branch] gh/PaulZhang12/45/base -> origin/gh/PaulZhang12/45/base 2025-12-04T08:57:43.5778916Z * [new branch] gh/PaulZhang12/45/head -> origin/gh/PaulZhang12/45/head 2025-12-04T08:57:43.5780047Z * [new branch] gh/PaulZhang12/45/orig -> origin/gh/PaulZhang12/45/orig 2025-12-04T08:57:43.5781650Z * [new branch] gh/PaulZhang12/46/base -> origin/gh/PaulZhang12/46/base 2025-12-04T08:57:43.5782801Z * [new branch] gh/PaulZhang12/46/head -> origin/gh/PaulZhang12/46/head 2025-12-04T08:57:43.5783946Z * [new branch] gh/PaulZhang12/46/orig -> origin/gh/PaulZhang12/46/orig 2025-12-04T08:57:43.5785578Z * [new branch] gh/PaulZhang12/47/base -> origin/gh/PaulZhang12/47/base 2025-12-04T08:57:43.5786730Z * [new branch] gh/PaulZhang12/47/head -> origin/gh/PaulZhang12/47/head 2025-12-04T08:57:43.5787873Z * [new branch] gh/PaulZhang12/47/orig -> origin/gh/PaulZhang12/47/orig 2025-12-04T08:57:43.5789370Z * [new branch] gh/PaulZhang12/48/base -> origin/gh/PaulZhang12/48/base 2025-12-04T08:57:43.5790414Z * [new branch] gh/PaulZhang12/48/head -> origin/gh/PaulZhang12/48/head 2025-12-04T08:57:43.5791521Z * [new branch] gh/PaulZhang12/48/orig -> origin/gh/PaulZhang12/48/orig 2025-12-04T08:57:43.5793300Z * [new branch] gh/SamGinzburg/11/base -> origin/gh/SamGinzburg/11/base 2025-12-04T08:57:43.5794361Z * [new branch] gh/SamGinzburg/11/head -> origin/gh/SamGinzburg/11/head 2025-12-04T08:57:43.5796367Z * [new branch] gh/SherlockNoMad/1/base -> origin/gh/SherlockNoMad/1/base 2025-12-04T08:57:43.5797441Z * [new branch] gh/SherlockNoMad/1/head -> origin/gh/SherlockNoMad/1/head 
2025-12-04T08:57:43.5799001Z * [new branch] gh/SherlockNoMad/10/base -> origin/gh/SherlockNoMad/10/base 2025-12-04T08:57:43.5800061Z * [new branch] gh/SherlockNoMad/10/head -> origin/gh/SherlockNoMad/10/head 2025-12-04T08:57:43.5801264Z * [new branch] gh/SherlockNoMad/10/orig -> origin/gh/SherlockNoMad/10/orig 2025-12-04T08:57:43.5802663Z * [new branch] gh/SherlockNoMad/11/base -> origin/gh/SherlockNoMad/11/base 2025-12-04T08:57:43.5803693Z * [new branch] gh/SherlockNoMad/11/head -> origin/gh/SherlockNoMad/11/head 2025-12-04T08:57:43.5804804Z * [new branch] gh/SherlockNoMad/11/orig -> origin/gh/SherlockNoMad/11/orig 2025-12-04T08:57:43.5806201Z * [new branch] gh/SherlockNoMad/12/base -> origin/gh/SherlockNoMad/12/base 2025-12-04T08:57:43.5807185Z * [new branch] gh/SherlockNoMad/12/head -> origin/gh/SherlockNoMad/12/head 2025-12-04T08:57:43.5808273Z * [new branch] gh/SherlockNoMad/12/orig -> origin/gh/SherlockNoMad/12/orig 2025-12-04T08:57:43.5809875Z * [new branch] gh/SherlockNoMad/15/base -> origin/gh/SherlockNoMad/15/base 2025-12-04T08:57:43.5811334Z * [new branch] gh/SherlockNoMad/15/head -> origin/gh/SherlockNoMad/15/head 2025-12-04T08:57:43.5812296Z * [new branch] gh/SherlockNoMad/15/orig -> origin/gh/SherlockNoMad/15/orig 2025-12-04T08:57:43.5813886Z * [new branch] gh/SherlockNoMad/17/base -> origin/gh/SherlockNoMad/17/base 2025-12-04T08:57:43.5814916Z * [new branch] gh/SherlockNoMad/17/head -> origin/gh/SherlockNoMad/17/head 2025-12-04T08:57:43.5816017Z * [new branch] gh/SherlockNoMad/17/orig -> origin/gh/SherlockNoMad/17/orig 2025-12-04T08:57:43.5818082Z * [new branch] gh/SherlockNoMad/18/base -> origin/gh/SherlockNoMad/18/base 2025-12-04T08:57:43.5819232Z * [new branch] gh/SherlockNoMad/18/head -> origin/gh/SherlockNoMad/18/head 2025-12-04T08:57:43.5820403Z * [new branch] gh/SherlockNoMad/18/orig -> origin/gh/SherlockNoMad/18/orig 2025-12-04T08:57:43.5822105Z * [new branch] gh/SherlockNoMad/19/base -> origin/gh/SherlockNoMad/19/base 2025-12-04T08:57:43.5823261Z * [new branch] gh/SherlockNoMad/19/head -> origin/gh/SherlockNoMad/19/head 2025-12-04T08:57:43.5824457Z * [new branch] gh/SherlockNoMad/19/orig -> origin/gh/SherlockNoMad/19/orig 2025-12-04T08:57:43.5825901Z * [new branch] gh/SherlockNoMad/2/base -> origin/gh/SherlockNoMad/2/base 2025-12-04T08:57:43.5826876Z * [new branch] gh/SherlockNoMad/2/head -> origin/gh/SherlockNoMad/2/head 2025-12-04T08:57:43.5828295Z * [new branch] gh/SherlockNoMad/20/base -> origin/gh/SherlockNoMad/20/base 2025-12-04T08:57:43.5829468Z * [new branch] gh/SherlockNoMad/20/head -> origin/gh/SherlockNoMad/20/head 2025-12-04T08:57:43.5830487Z * [new branch] gh/SherlockNoMad/20/orig -> origin/gh/SherlockNoMad/20/orig 2025-12-04T08:57:43.5832262Z * [new branch] gh/SherlockNoMad/21/base -> origin/gh/SherlockNoMad/21/base 2025-12-04T08:57:43.5833471Z * [new branch] gh/SherlockNoMad/21/head -> origin/gh/SherlockNoMad/21/head 2025-12-04T08:57:43.5834502Z * [new branch] gh/SherlockNoMad/21/orig -> origin/gh/SherlockNoMad/21/orig 2025-12-04T08:57:43.5835946Z * [new branch] gh/SherlockNoMad/3/base -> origin/gh/SherlockNoMad/3/base 2025-12-04T08:57:43.5836924Z * [new branch] gh/SherlockNoMad/3/head -> origin/gh/SherlockNoMad/3/head 2025-12-04T08:57:43.5838301Z * [new branch] gh/SherlockNoMad/4/base -> origin/gh/SherlockNoMad/4/base 2025-12-04T08:57:43.5839269Z * [new branch] gh/SherlockNoMad/4/head -> origin/gh/SherlockNoMad/4/head 2025-12-04T08:57:43.5840662Z * [new branch] gh/SherlockNoMad/5/base -> origin/gh/SherlockNoMad/5/base 2025-12-04T08:57:43.5841630Z * 
[new branch] gh/SherlockNoMad/5/head -> origin/gh/SherlockNoMad/5/head 2025-12-04T08:57:43.5843952Z * [new branch] gh/Sidharth123-cpu/24/base -> origin/gh/Sidharth123-cpu/24/base 2025-12-04T08:57:43.5845346Z * [new branch] gh/Sidharth123-cpu/25/base -> origin/gh/Sidharth123-cpu/25/base 2025-12-04T08:57:43.5846639Z * [new branch] gh/Sidharth123-cpu/26/base -> origin/gh/Sidharth123-cpu/26/base 2025-12-04T08:57:43.5848302Z * [new branch] gh/Sidharth123-cpu/27/base -> origin/gh/Sidharth123-cpu/27/base 2025-12-04T08:57:43.5850019Z * [new branch] gh/StrongerXi/1/base -> origin/gh/StrongerXi/1/base 2025-12-04T08:57:43.5851163Z * [new branch] gh/StrongerXi/1/head -> origin/gh/StrongerXi/1/head 2025-12-04T08:57:43.5852596Z * [new branch] gh/StrongerXi/71/base -> origin/gh/StrongerXi/71/base 2025-12-04T08:57:43.5853617Z * [new branch] gh/StrongerXi/71/head -> origin/gh/StrongerXi/71/head 2025-12-04T08:57:43.5855012Z * [new branch] gh/StrongerXi/72/base -> origin/gh/StrongerXi/72/base 2025-12-04T08:57:43.5856059Z * [new branch] gh/StrongerXi/72/head -> origin/gh/StrongerXi/72/head 2025-12-04T08:57:43.5857921Z * [new branch] gh/StrongerXi/73/base -> origin/gh/StrongerXi/73/base 2025-12-04T08:57:43.5858984Z * [new branch] gh/StrongerXi/73/head -> origin/gh/StrongerXi/73/head 2025-12-04T08:57:43.5860168Z * [new branch] gh/StrongerXi/73/orig -> origin/gh/StrongerXi/73/orig 2025-12-04T08:57:43.5862210Z * [new branch] gh/XilunWu/160/base -> origin/gh/XilunWu/160/base 2025-12-04T08:57:43.5863252Z * [new branch] gh/XilunWu/160/head -> origin/gh/XilunWu/160/head 2025-12-04T08:57:43.5864424Z * [new branch] gh/XilunWu/160/orig -> origin/gh/XilunWu/160/orig 2025-12-04T08:57:43.5865999Z * [new branch] gh/XilunWu/163/base -> origin/gh/XilunWu/163/base 2025-12-04T08:57:43.5867083Z * [new branch] gh/XilunWu/163/head -> origin/gh/XilunWu/163/head 2025-12-04T08:57:43.5868203Z * [new branch] gh/XilunWu/163/orig -> origin/gh/XilunWu/163/orig 2025-12-04T08:57:43.5869955Z * [new branch] gh/XilunWu/168/base -> origin/gh/XilunWu/168/base 2025-12-04T08:57:43.5870983Z * [new branch] gh/XilunWu/168/head -> origin/gh/XilunWu/168/head 2025-12-04T08:57:43.5872028Z * [new branch] gh/XilunWu/168/orig -> origin/gh/XilunWu/168/orig 2025-12-04T08:57:43.5873561Z * [new branch] gh/XilunWu/169/base -> origin/gh/XilunWu/169/base 2025-12-04T08:57:43.5874594Z * [new branch] gh/XilunWu/169/head -> origin/gh/XilunWu/169/head 2025-12-04T08:57:43.5875680Z * [new branch] gh/XilunWu/169/orig -> origin/gh/XilunWu/169/orig 2025-12-04T08:57:43.5877082Z * [new branch] gh/XilunWu/170/base -> origin/gh/XilunWu/170/base 2025-12-04T08:57:43.5878281Z * [new branch] gh/XilunWu/170/head -> origin/gh/XilunWu/170/head 2025-12-04T08:57:43.5879389Z * [new branch] gh/XilunWu/170/orig -> origin/gh/XilunWu/170/orig 2025-12-04T08:57:43.5881025Z * [new branch] gh/XilunWu/171/base -> origin/gh/XilunWu/171/base 2025-12-04T08:57:43.5882092Z * [new branch] gh/XilunWu/171/head -> origin/gh/XilunWu/171/head 2025-12-04T08:57:43.5883213Z * [new branch] gh/XilunWu/171/orig -> origin/gh/XilunWu/171/orig 2025-12-04T08:57:43.5884705Z * [new branch] gh/XilunWu/173/base -> origin/gh/XilunWu/173/base 2025-12-04T08:57:43.5885796Z * [new branch] gh/XilunWu/173/head -> origin/gh/XilunWu/173/head 2025-12-04T08:57:43.5886925Z * [new branch] gh/XilunWu/173/orig -> origin/gh/XilunWu/173/orig 2025-12-04T08:57:43.5888450Z * [new branch] gh/XilunWu/175/base -> origin/gh/XilunWu/175/base 2025-12-04T08:57:43.5889515Z * [new branch] gh/XilunWu/175/head -> origin/gh/XilunWu/175/head 
2025-12-04T08:57:43.5890619Z * [new branch] gh/XilunWu/175/orig -> origin/gh/XilunWu/175/orig 2025-12-04T08:57:43.5892142Z * [new branch] gh/XilunWu/176/base -> origin/gh/XilunWu/176/base 2025-12-04T08:57:43.5893222Z * [new branch] gh/XilunWu/176/head -> origin/gh/XilunWu/176/head 2025-12-04T08:57:43.5894512Z * [new branch] gh/XilunWu/176/orig -> origin/gh/XilunWu/176/orig 2025-12-04T08:57:43.5897158Z * [new branch] gh/XuehaiPan/14/base -> origin/gh/XuehaiPan/14/base 2025-12-04T08:57:43.5898238Z * [new branch] gh/XuehaiPan/14/head -> origin/gh/XuehaiPan/14/head 2025-12-04T08:57:43.5899359Z * [new branch] gh/XuehaiPan/14/orig -> origin/gh/XuehaiPan/14/orig 2025-12-04T08:57:43.5900988Z * [new branch] gh/XuehaiPan/179/base -> origin/gh/XuehaiPan/179/base 2025-12-04T08:57:43.5902077Z * [new branch] gh/XuehaiPan/179/head -> origin/gh/XuehaiPan/179/head 2025-12-04T08:57:43.5903304Z * [new branch] gh/XuehaiPan/179/orig -> origin/gh/XuehaiPan/179/orig 2025-12-04T08:57:43.5904842Z * [new branch] gh/XuehaiPan/249/base -> origin/gh/XuehaiPan/249/base 2025-12-04T08:57:43.5905926Z * [new branch] gh/XuehaiPan/249/head -> origin/gh/XuehaiPan/249/head 2025-12-04T08:57:43.5907192Z * [new branch] gh/XuehaiPan/249/orig -> origin/gh/XuehaiPan/249/orig 2025-12-04T08:57:43.5908728Z * [new branch] gh/XuehaiPan/253/base -> origin/gh/XuehaiPan/253/base 2025-12-04T08:57:43.5909904Z * [new branch] gh/XuehaiPan/253/head -> origin/gh/XuehaiPan/253/head 2025-12-04T08:57:43.5911035Z * [new branch] gh/XuehaiPan/253/orig -> origin/gh/XuehaiPan/253/orig 2025-12-04T08:57:43.5912600Z * [new branch] gh/XuehaiPan/254/base -> origin/gh/XuehaiPan/254/base 2025-12-04T08:57:43.5913626Z * [new branch] gh/XuehaiPan/254/head -> origin/gh/XuehaiPan/254/head 2025-12-04T08:57:43.5914744Z * [new branch] gh/XuehaiPan/254/orig -> origin/gh/XuehaiPan/254/orig 2025-12-04T08:57:43.5916167Z * [new branch] gh/XuehaiPan/255/base -> origin/gh/XuehaiPan/255/base 2025-12-04T08:57:43.5917176Z * [new branch] gh/XuehaiPan/255/head -> origin/gh/XuehaiPan/255/head 2025-12-04T08:57:43.5918296Z * [new branch] gh/XuehaiPan/255/orig -> origin/gh/XuehaiPan/255/orig 2025-12-04T08:57:43.5919784Z * [new branch] gh/XuehaiPan/271/base -> origin/gh/XuehaiPan/271/base 2025-12-04T08:57:43.5920962Z * [new branch] gh/XuehaiPan/271/head -> origin/gh/XuehaiPan/271/head 2025-12-04T08:57:43.5922395Z * [new branch] gh/XuehaiPan/271/orig -> origin/gh/XuehaiPan/271/orig 2025-12-04T08:57:43.5923930Z * [new branch] gh/XuehaiPan/343/base -> origin/gh/XuehaiPan/343/base 2025-12-04T08:57:43.5925013Z * [new branch] gh/XuehaiPan/343/head -> origin/gh/XuehaiPan/343/head 2025-12-04T08:57:43.5926148Z * [new branch] gh/XuehaiPan/343/orig -> origin/gh/XuehaiPan/343/orig 2025-12-04T08:57:43.5927759Z * [new branch] gh/XuehaiPan/347/base -> origin/gh/XuehaiPan/347/base 2025-12-04T08:57:43.5928833Z * [new branch] gh/XuehaiPan/347/head -> origin/gh/XuehaiPan/347/head 2025-12-04T08:57:43.5929999Z * [new branch] gh/XuehaiPan/347/orig -> origin/gh/XuehaiPan/347/orig 2025-12-04T08:57:43.5931555Z * [new branch] gh/XuehaiPan/348/base -> origin/gh/XuehaiPan/348/base 2025-12-04T08:57:43.5932590Z * [new branch] gh/XuehaiPan/348/head -> origin/gh/XuehaiPan/348/head 2025-12-04T08:57:43.5933806Z * [new branch] gh/XuehaiPan/348/orig -> origin/gh/XuehaiPan/348/orig 2025-12-04T08:57:43.5935284Z * [new branch] gh/XuehaiPan/350/base -> origin/gh/XuehaiPan/350/base 2025-12-04T08:57:43.5936406Z * [new branch] gh/XuehaiPan/350/head -> origin/gh/XuehaiPan/350/head 2025-12-04T08:57:43.5937813Z * [new branch] 
gh/XuehaiPan/350/orig -> origin/gh/XuehaiPan/350/orig 2025-12-04T08:57:43.5939353Z * [new branch] gh/XuehaiPan/365/base -> origin/gh/XuehaiPan/365/base 2025-12-04T08:57:43.5940462Z * [new branch] gh/XuehaiPan/365/head -> origin/gh/XuehaiPan/365/head 2025-12-04T08:57:43.5941752Z * [new branch] gh/XuehaiPan/365/orig -> origin/gh/XuehaiPan/365/orig 2025-12-04T08:57:43.5943229Z * [new branch] gh/XuehaiPan/366/base -> origin/gh/XuehaiPan/366/base 2025-12-04T08:57:43.5944446Z * [new branch] gh/XuehaiPan/366/head -> origin/gh/XuehaiPan/366/head 2025-12-04T08:57:43.5945990Z * [new branch] gh/XuehaiPan/370/base -> origin/gh/XuehaiPan/370/base 2025-12-04T08:57:43.5947046Z * [new branch] gh/XuehaiPan/370/head -> origin/gh/XuehaiPan/370/head 2025-12-04T08:57:43.5948194Z * [new branch] gh/XuehaiPan/370/orig -> origin/gh/XuehaiPan/370/orig 2025-12-04T08:57:43.5949876Z * [new branch] gh/XuehaiPan/390/base -> origin/gh/XuehaiPan/390/base 2025-12-04T08:57:43.5950890Z * [new branch] gh/XuehaiPan/390/head -> origin/gh/XuehaiPan/390/head 2025-12-04T08:57:43.5951970Z * [new branch] gh/XuehaiPan/390/orig -> origin/gh/XuehaiPan/390/orig 2025-12-04T08:57:43.5953489Z * [new branch] gh/XuehaiPan/391/base -> origin/gh/XuehaiPan/391/base 2025-12-04T08:57:43.5954553Z * [new branch] gh/XuehaiPan/391/head -> origin/gh/XuehaiPan/391/head 2025-12-04T08:57:43.5955608Z * [new branch] gh/XuehaiPan/391/orig -> origin/gh/XuehaiPan/391/orig 2025-12-04T08:57:43.5957177Z * [new branch] gh/XuehaiPan/392/base -> origin/gh/XuehaiPan/392/base 2025-12-04T08:57:43.5958245Z * [new branch] gh/XuehaiPan/392/head -> origin/gh/XuehaiPan/392/head 2025-12-04T08:57:43.5959332Z * [new branch] gh/XuehaiPan/392/orig -> origin/gh/XuehaiPan/392/orig 2025-12-04T08:57:43.5961281Z * [new branch] gh/XuehaiPan/394/base -> origin/gh/XuehaiPan/394/base 2025-12-04T08:57:43.5962317Z * [new branch] gh/XuehaiPan/394/head -> origin/gh/XuehaiPan/394/head 2025-12-04T08:57:43.5963401Z * [new branch] gh/XuehaiPan/394/orig -> origin/gh/XuehaiPan/394/orig 2025-12-04T08:57:43.5964951Z * [new branch] gh/XuehaiPan/397/base -> origin/gh/XuehaiPan/397/base 2025-12-04T08:57:43.5966027Z * [new branch] gh/XuehaiPan/397/head -> origin/gh/XuehaiPan/397/head 2025-12-04T08:57:43.5967145Z * [new branch] gh/XuehaiPan/397/orig -> origin/gh/XuehaiPan/397/orig 2025-12-04T08:57:43.5968729Z * [new branch] gh/XuehaiPan/398/base -> origin/gh/XuehaiPan/398/base 2025-12-04T08:57:43.5969804Z * [new branch] gh/XuehaiPan/398/head -> origin/gh/XuehaiPan/398/head 2025-12-04T08:57:43.5970949Z * [new branch] gh/XuehaiPan/398/orig -> origin/gh/XuehaiPan/398/orig 2025-12-04T08:57:43.5972417Z * [new branch] gh/XuehaiPan/399/base -> origin/gh/XuehaiPan/399/base 2025-12-04T08:57:43.5973479Z * [new branch] gh/XuehaiPan/399/head -> origin/gh/XuehaiPan/399/head 2025-12-04T08:57:43.5974587Z * [new branch] gh/XuehaiPan/399/orig -> origin/gh/XuehaiPan/399/orig 2025-12-04T08:57:43.5976187Z * [new branch] gh/XuehaiPan/400/base -> origin/gh/XuehaiPan/400/base 2025-12-04T08:57:43.5977651Z * [new branch] gh/XuehaiPan/400/head -> origin/gh/XuehaiPan/400/head 2025-12-04T08:57:43.5978786Z * [new branch] gh/XuehaiPan/400/orig -> origin/gh/XuehaiPan/400/orig 2025-12-04T08:57:43.5980647Z * [new branch] gh/ZhiweiYan-96/39/base -> origin/gh/ZhiweiYan-96/39/base 2025-12-04T08:57:43.5981748Z * [new branch] gh/ZhiweiYan-96/39/head -> origin/gh/ZhiweiYan-96/39/head 2025-12-04T08:57:43.5982952Z * [new branch] gh/ZhiweiYan-96/39/orig -> origin/gh/ZhiweiYan-96/39/orig 2025-12-04T08:57:43.5984457Z * [new branch] 
gh/ZhiweiYan-96/44/base -> origin/gh/ZhiweiYan-96/44/base 2025-12-04T08:57:43.5985616Z * [new branch] gh/ZhiweiYan-96/44/head -> origin/gh/ZhiweiYan-96/44/head 2025-12-04T08:57:43.5987036Z * [new branch] gh/ZhiweiYan-96/45/base -> origin/gh/ZhiweiYan-96/45/base 2025-12-04T08:57:43.5988010Z * [new branch] gh/ZhiweiYan-96/45/head -> origin/gh/ZhiweiYan-96/45/head 2025-12-04T08:57:43.5989736Z * [new branch] gh/ZhiweiYan-96/49/base -> origin/gh/ZhiweiYan-96/49/base 2025-12-04T08:57:43.5990794Z * [new branch] gh/ZhiweiYan-96/49/head -> origin/gh/ZhiweiYan-96/49/head 2025-12-04T08:57:43.5992343Z * [new branch] gh/ZhiweiYan-96/62/base -> origin/gh/ZhiweiYan-96/62/base 2025-12-04T08:57:43.5993376Z * [new branch] gh/ZhiweiYan-96/62/head -> origin/gh/ZhiweiYan-96/62/head 2025-12-04T08:57:43.5995144Z * [new branch] gh/ZhiweiYan-96/66/base -> origin/gh/ZhiweiYan-96/66/base 2025-12-04T08:57:43.5995999Z * [new branch] gh/ZhiweiYan-96/66/head -> origin/gh/ZhiweiYan-96/66/head 2025-12-04T08:57:43.5997414Z * [new branch] gh/ZhiweiYan-96/67/base -> origin/gh/ZhiweiYan-96/67/base 2025-12-04T08:57:43.5998428Z * [new branch] gh/ZhiweiYan-96/67/head -> origin/gh/ZhiweiYan-96/67/head 2025-12-04T08:57:43.5999827Z * [new branch] gh/ZhiweiYan-96/68/base -> origin/gh/ZhiweiYan-96/68/base 2025-12-04T08:57:43.6000792Z * [new branch] gh/ZhiweiYan-96/68/head -> origin/gh/ZhiweiYan-96/68/head 2025-12-04T08:57:43.6001862Z * [new branch] gh/ZhiweiYan-96/68/orig -> origin/gh/ZhiweiYan-96/68/orig 2025-12-04T08:57:43.6003687Z * [new branch] gh/aakhundov/1/base -> origin/gh/aakhundov/1/base 2025-12-04T08:57:43.6004799Z * [new branch] gh/aakhundov/1/head -> origin/gh/aakhundov/1/head 2025-12-04T08:57:43.6006194Z * [new branch] gh/aakhundov/2/base -> origin/gh/aakhundov/2/base 2025-12-04T08:57:43.6007304Z * [new branch] gh/aakhundov/2/head -> origin/gh/aakhundov/2/head 2025-12-04T08:57:43.6008858Z * [new branch] gh/aditew01/openblas -> origin/gh/aditew01/openblas 2025-12-04T08:57:43.6010273Z * [new branch] gh/aditew01/sbgemm -> origin/gh/aditew01/sbgemm 2025-12-04T08:57:43.6011325Z * [new branch] gh/aditew01/vecbf16 -> origin/gh/aditew01/vecbf16 2025-12-04T08:57:43.6013082Z * [new branch] gh/albanD/4/base -> origin/gh/albanD/4/base 2025-12-04T08:57:43.6014098Z * [new branch] gh/albanD/4/head -> origin/gh/albanD/4/head 2025-12-04T08:57:43.6015184Z * [new branch] gh/albanD/4/orig -> origin/gh/albanD/4/orig 2025-12-04T08:57:43.6017426Z * [new branch] gh/alexbrauckmann/paddedtensor_faketensor_init -> origin/gh/alexbrauckmann/paddedtensor_faketensor_init 2025-12-04T08:57:43.6018957Z * [new branch] gh/alexsamardzic/12/base -> origin/gh/alexsamardzic/12/base 2025-12-04T08:57:43.6020101Z * [new branch] gh/alexsamardzic/12/head -> origin/gh/alexsamardzic/12/head 2025-12-04T08:57:43.6021423Z * [new branch] gh/alexsamardzic/12/orig -> origin/gh/alexsamardzic/12/orig 2025-12-04T08:57:43.6023107Z * [new branch] gh/alexsamardzic/14/base -> origin/gh/alexsamardzic/14/base 2025-12-04T08:57:43.6024184Z * [new branch] gh/alexsamardzic/14/head -> origin/gh/alexsamardzic/14/head 2025-12-04T08:57:43.6025343Z * [new branch] gh/alexsamardzic/14/orig -> origin/gh/alexsamardzic/14/orig 2025-12-04T08:57:43.6026948Z * [new branch] gh/alexsamardzic/15/base -> origin/gh/alexsamardzic/15/base 2025-12-04T08:57:43.6028029Z * [new branch] gh/alexsamardzic/15/head -> origin/gh/alexsamardzic/15/head 2025-12-04T08:57:43.6029149Z * [new branch] gh/alexsamardzic/15/orig -> origin/gh/alexsamardzic/15/orig 2025-12-04T08:57:43.6031119Z * [new branch] gh/amjames/18/base 
-> origin/gh/amjames/18/base 2025-12-04T08:57:43.6032056Z * [new branch] gh/amjames/18/head -> origin/gh/amjames/18/head 2025-12-04T08:57:43.6033278Z * [new branch] gh/amjames/18/orig -> origin/gh/amjames/18/orig 2025-12-04T08:57:43.6035354Z * [new branch] gh/andrewor14/35/base -> origin/gh/andrewor14/35/base 2025-12-04T08:57:43.6036454Z * [new branch] gh/andrewor14/35/head -> origin/gh/andrewor14/35/head 2025-12-04T08:57:43.6037597Z * [new branch] gh/andrewor14/35/orig -> origin/gh/andrewor14/35/orig 2025-12-04T08:57:43.6039262Z * [new branch] gh/andrewor14/50/base -> origin/gh/andrewor14/50/base 2025-12-04T08:57:43.6040471Z * [new branch] gh/andrewor14/50/head -> origin/gh/andrewor14/50/head 2025-12-04T08:57:43.6041625Z * [new branch] gh/andrewor14/50/orig -> origin/gh/andrewor14/50/orig 2025-12-04T08:57:43.6043514Z * [new branch] gh/andyanwang/30/base -> origin/gh/andyanwang/30/base 2025-12-04T08:57:43.6044778Z * [new branch] gh/andyanwang/30/orig -> origin/gh/andyanwang/30/orig 2025-12-04T08:57:43.6046382Z * [new branch] gh/andyanwang/31/base -> origin/gh/andyanwang/31/base 2025-12-04T08:57:43.6047616Z * [new branch] gh/andyanwang/31/orig -> origin/gh/andyanwang/31/orig 2025-12-04T08:57:43.6049148Z * [new branch] gh/andyanwang/39/base -> origin/gh/andyanwang/39/base 2025-12-04T08:57:43.6050265Z * [new branch] gh/andyanwang/39/head -> origin/gh/andyanwang/39/head 2025-12-04T08:57:43.6051438Z * [new branch] gh/andyanwang/39/orig -> origin/gh/andyanwang/39/orig 2025-12-04T08:57:43.6053175Z * [new branch] gh/andyanwang/42/base -> origin/gh/andyanwang/42/base 2025-12-04T08:57:43.6054147Z * [new branch] gh/andyanwang/42/head -> origin/gh/andyanwang/42/head 2025-12-04T08:57:43.6055249Z * [new branch] gh/andyanwang/42/orig -> origin/gh/andyanwang/42/orig 2025-12-04T08:57:43.6057217Z * [new branch] gh/andyanwang/45/base -> origin/gh/andyanwang/45/base 2025-12-04T08:57:43.6058403Z * [new branch] gh/andyanwang/45/head -> origin/gh/andyanwang/45/head 2025-12-04T08:57:43.6059572Z * [new branch] gh/andyanwang/45/orig -> origin/gh/andyanwang/45/orig 2025-12-04T08:57:43.6061402Z * [new branch] gh/angelayi/107/base -> origin/gh/angelayi/107/base 2025-12-04T08:57:43.6062462Z * [new branch] gh/angelayi/107/head -> origin/gh/angelayi/107/head 2025-12-04T08:57:43.6064065Z * [new branch] gh/angelayi/114/base -> origin/gh/angelayi/114/base 2025-12-04T08:57:43.6065232Z * [new branch] gh/angelayi/114/head -> origin/gh/angelayi/114/head 2025-12-04T08:57:43.6066391Z * [new branch] gh/angelayi/114/orig -> origin/gh/angelayi/114/orig 2025-12-04T08:57:43.6067965Z * [new branch] gh/angelayi/116/base -> origin/gh/angelayi/116/base 2025-12-04T08:57:43.6069133Z * [new branch] gh/angelayi/116/head -> origin/gh/angelayi/116/head 2025-12-04T08:57:43.6070235Z * [new branch] gh/angelayi/116/orig -> origin/gh/angelayi/116/orig 2025-12-04T08:57:43.6071865Z * [new branch] gh/angelayi/122/base -> origin/gh/angelayi/122/base 2025-12-04T08:57:43.6072827Z * [new branch] gh/angelayi/122/head -> origin/gh/angelayi/122/head 2025-12-04T08:57:43.6073943Z * [new branch] gh/angelayi/122/orig -> origin/gh/angelayi/122/orig 2025-12-04T08:57:43.6075624Z * [new branch] gh/angelayi/124/base -> origin/gh/angelayi/124/base 2025-12-04T08:57:43.6076622Z * [new branch] gh/angelayi/124/head -> origin/gh/angelayi/124/head 2025-12-04T08:57:43.6077831Z * [new branch] gh/angelayi/124/orig -> origin/gh/angelayi/124/orig 2025-12-04T08:57:43.6079517Z * [new branch] gh/angelayi/128/base -> origin/gh/angelayi/128/base 2025-12-04T08:57:43.6080650Z * [new 
branch] gh/angelayi/128/head -> origin/gh/angelayi/128/head 2025-12-04T08:57:43.6081739Z * [new branch] gh/angelayi/128/orig -> origin/gh/angelayi/128/orig 2025-12-04T08:57:43.6083268Z * [new branch] gh/angelayi/131/base -> origin/gh/angelayi/131/base 2025-12-04T08:57:43.6084320Z * [new branch] gh/angelayi/131/head -> origin/gh/angelayi/131/head 2025-12-04T08:57:43.6085444Z * [new branch] gh/angelayi/131/orig -> origin/gh/angelayi/131/orig 2025-12-04T08:57:43.6087178Z * [new branch] gh/angelayi/132/base -> origin/gh/angelayi/132/base 2025-12-04T08:57:43.6088545Z * [new branch] gh/angelayi/132/head -> origin/gh/angelayi/132/head 2025-12-04T08:57:43.6089729Z * [new branch] gh/angelayi/132/orig -> origin/gh/angelayi/132/orig 2025-12-04T08:57:43.6091214Z * [new branch] gh/angelayi/133/base -> origin/gh/angelayi/133/base 2025-12-04T08:57:43.6092279Z * [new branch] gh/angelayi/133/head -> origin/gh/angelayi/133/head 2025-12-04T08:57:43.6093372Z * [new branch] gh/angelayi/133/orig -> origin/gh/angelayi/133/orig 2025-12-04T08:57:43.6095170Z * [new branch] gh/angelayi/134/base -> origin/gh/angelayi/134/base 2025-12-04T08:57:43.6096587Z * [new branch] gh/angelayi/134/head -> origin/gh/angelayi/134/head 2025-12-04T08:57:43.6097977Z * [new branch] gh/angelayi/134/orig -> origin/gh/angelayi/134/orig 2025-12-04T08:57:43.6099802Z * [new branch] gh/angelayi/135/base -> origin/gh/angelayi/135/base 2025-12-04T08:57:43.6100988Z * [new branch] gh/angelayi/135/head -> origin/gh/angelayi/135/head 2025-12-04T08:57:43.6102138Z * [new branch] gh/angelayi/135/orig -> origin/gh/angelayi/135/orig 2025-12-04T08:57:43.6103699Z * [new branch] gh/angelayi/136/base -> origin/gh/angelayi/136/base 2025-12-04T08:57:43.6104784Z * [new branch] gh/angelayi/136/head -> origin/gh/angelayi/136/head 2025-12-04T08:57:43.6105922Z * [new branch] gh/angelayi/136/orig -> origin/gh/angelayi/136/orig 2025-12-04T08:57:43.6107482Z * [new branch] gh/angelayi/137/base -> origin/gh/angelayi/137/base 2025-12-04T08:57:43.6108516Z * [new branch] gh/angelayi/137/head -> origin/gh/angelayi/137/head 2025-12-04T08:57:43.6110086Z * [new branch] gh/angelayi/137/orig -> origin/gh/angelayi/137/orig 2025-12-04T08:57:43.6111484Z * [new branch] gh/angelayi/138/base -> origin/gh/angelayi/138/base 2025-12-04T08:57:43.6112465Z * [new branch] gh/angelayi/138/head -> origin/gh/angelayi/138/head 2025-12-04T08:57:43.6113539Z * [new branch] gh/angelayi/138/orig -> origin/gh/angelayi/138/orig 2025-12-04T08:57:43.6115073Z * [new branch] gh/angelayi/139/base -> origin/gh/angelayi/139/base 2025-12-04T08:57:43.6116110Z * [new branch] gh/angelayi/139/head -> origin/gh/angelayi/139/head 2025-12-04T08:57:43.6117197Z * [new branch] gh/angelayi/139/orig -> origin/gh/angelayi/139/orig 2025-12-04T08:57:43.6118811Z * [new branch] gh/angelayi/140/base -> origin/gh/angelayi/140/base 2025-12-04T08:57:43.6119933Z * [new branch] gh/angelayi/140/head -> origin/gh/angelayi/140/head 2025-12-04T08:57:43.6121362Z * [new branch] gh/angelayi/140/orig -> origin/gh/angelayi/140/orig 2025-12-04T08:57:43.6126621Z * [new branch] gh/angelayi/141/base -> origin/gh/angelayi/141/base 2025-12-04T08:57:43.6127919Z * [new branch] gh/angelayi/141/head -> origin/gh/angelayi/141/head 2025-12-04T08:57:43.6128945Z * [new branch] gh/angelayi/141/orig -> origin/gh/angelayi/141/orig 2025-12-04T08:57:43.6130589Z * [new branch] gh/angelayi/142/base -> origin/gh/angelayi/142/base 2025-12-04T08:57:43.6131649Z * [new branch] gh/angelayi/142/head -> origin/gh/angelayi/142/head 2025-12-04T08:57:43.6132777Z * [new 
branch] gh/angelayi/142/orig -> origin/gh/angelayi/142/orig 2025-12-04T08:57:43.6134512Z * [new branch] gh/angelayi/143/base -> origin/gh/angelayi/143/base 2025-12-04T08:57:43.6135552Z * [new branch] gh/angelayi/143/head -> origin/gh/angelayi/143/head 2025-12-04T08:57:43.6136881Z * [new branch] gh/angelayi/143/orig -> origin/gh/angelayi/143/orig 2025-12-04T08:57:43.6138618Z * [new branch] gh/angelayi/144/base -> origin/gh/angelayi/144/base 2025-12-04T08:57:43.6139808Z * [new branch] gh/angelayi/144/head -> origin/gh/angelayi/144/head 2025-12-04T08:57:43.6140995Z * [new branch] gh/angelayi/144/orig -> origin/gh/angelayi/144/orig 2025-12-04T08:57:43.6143046Z * [new branch] gh/anijain2305/753/base -> origin/gh/anijain2305/753/base 2025-12-04T08:57:43.6144127Z * [new branch] gh/anijain2305/753/head -> origin/gh/anijain2305/753/head 2025-12-04T08:57:43.6145265Z * [new branch] gh/anijain2305/753/orig -> origin/gh/anijain2305/753/orig 2025-12-04T08:57:43.6146977Z * [new branch] gh/anijain2305/810/base -> origin/gh/anijain2305/810/base 2025-12-04T08:57:43.6148048Z * [new branch] gh/anijain2305/810/head -> origin/gh/anijain2305/810/head 2025-12-04T08:57:43.6149384Z * [new branch] gh/anijain2305/810/orig -> origin/gh/anijain2305/810/orig 2025-12-04T08:57:43.6150982Z * [new branch] gh/anijain2305/854/base -> origin/gh/anijain2305/854/base 2025-12-04T08:57:43.6152109Z * [new branch] gh/anijain2305/854/head -> origin/gh/anijain2305/854/head 2025-12-04T08:57:43.6153208Z * [new branch] gh/anijain2305/854/orig -> origin/gh/anijain2305/854/orig 2025-12-04T08:57:43.6154847Z * [new branch] gh/anijain2305/864/base -> origin/gh/anijain2305/864/base 2025-12-04T08:57:43.6155889Z * [new branch] gh/anijain2305/864/head -> origin/gh/anijain2305/864/head 2025-12-04T08:57:43.6157001Z * [new branch] gh/anijain2305/864/orig -> origin/gh/anijain2305/864/orig 2025-12-04T08:57:43.6158589Z * [new branch] gh/anijain2305/870/base -> origin/gh/anijain2305/870/base 2025-12-04T08:57:43.6159566Z * [new branch] gh/anijain2305/870/head -> origin/gh/anijain2305/870/head 2025-12-04T08:57:43.6160625Z * [new branch] gh/anijain2305/870/orig -> origin/gh/anijain2305/870/orig 2025-12-04T08:57:43.6162306Z * [new branch] gh/anijain2305/873/base -> origin/gh/anijain2305/873/base 2025-12-04T08:57:43.6163262Z * [new branch] gh/anijain2305/873/head -> origin/gh/anijain2305/873/head 2025-12-04T08:57:43.6164320Z * [new branch] gh/anijain2305/873/orig -> origin/gh/anijain2305/873/orig 2025-12-04T08:57:43.6165830Z * [new branch] gh/anijain2305/894/base -> origin/gh/anijain2305/894/base 2025-12-04T08:57:43.6166859Z * [new branch] gh/anijain2305/894/head -> origin/gh/anijain2305/894/head 2025-12-04T08:57:43.6167951Z * [new branch] gh/anijain2305/894/orig -> origin/gh/anijain2305/894/orig 2025-12-04T08:57:43.6169506Z * [new branch] gh/anijain2305/895/base -> origin/gh/anijain2305/895/base 2025-12-04T08:57:43.6170596Z * [new branch] gh/anijain2305/895/head -> origin/gh/anijain2305/895/head 2025-12-04T08:57:43.6171695Z * [new branch] gh/anijain2305/895/orig -> origin/gh/anijain2305/895/orig 2025-12-04T08:57:43.6173352Z * [new branch] gh/anijain2305/910/base -> origin/gh/anijain2305/910/base 2025-12-04T08:57:43.6174358Z * [new branch] gh/anijain2305/910/head -> origin/gh/anijain2305/910/head 2025-12-04T08:57:43.6175468Z * [new branch] gh/anijain2305/910/orig -> origin/gh/anijain2305/910/orig 2025-12-04T08:57:43.6177516Z * [new branch] gh/anijain2305/919/base -> origin/gh/anijain2305/919/base 2025-12-04T08:57:43.6178636Z * [new branch] 
gh/anijain2305/919/head -> origin/gh/anijain2305/919/head 2025-12-04T08:57:43.6179772Z * [new branch] gh/anijain2305/919/orig -> origin/gh/anijain2305/919/orig 2025-12-04T08:57:43.6181361Z * [new branch] gh/anijain2305/922/base -> origin/gh/anijain2305/922/base 2025-12-04T08:57:43.6182474Z * [new branch] gh/anijain2305/922/head -> origin/gh/anijain2305/922/head 2025-12-04T08:57:43.6183623Z * [new branch] gh/anijain2305/922/orig -> origin/gh/anijain2305/922/orig 2025-12-04T08:57:43.6185215Z * [new branch] gh/anijain2305/932/base -> origin/gh/anijain2305/932/base 2025-12-04T08:57:43.6186479Z * [new branch] gh/anijain2305/932/head -> origin/gh/anijain2305/932/head 2025-12-04T08:57:43.6187724Z * [new branch] gh/anijain2305/932/orig -> origin/gh/anijain2305/932/orig 2025-12-04T08:57:43.6189382Z * [new branch] gh/anijain2305/940/base -> origin/gh/anijain2305/940/base 2025-12-04T08:57:43.6190418Z * [new branch] gh/anijain2305/940/head -> origin/gh/anijain2305/940/head 2025-12-04T08:57:43.6191517Z * [new branch] gh/anijain2305/940/orig -> origin/gh/anijain2305/940/orig 2025-12-04T08:57:43.6193070Z * [new branch] gh/anijain2305/941/base -> origin/gh/anijain2305/941/base 2025-12-04T08:57:43.6194103Z * [new branch] gh/anijain2305/941/head -> origin/gh/anijain2305/941/head 2025-12-04T08:57:43.6195254Z * [new branch] gh/anijain2305/941/orig -> origin/gh/anijain2305/941/orig 2025-12-04T08:57:43.6196731Z * [new branch] gh/anijain2305/942/base -> origin/gh/anijain2305/942/base 2025-12-04T08:57:43.6197865Z * [new branch] gh/anijain2305/942/head -> origin/gh/anijain2305/942/head 2025-12-04T08:57:43.6199036Z * [new branch] gh/anijain2305/942/orig -> origin/gh/anijain2305/942/orig 2025-12-04T08:57:43.6200607Z * [new branch] gh/anijain2305/943/base -> origin/gh/anijain2305/943/base 2025-12-04T08:57:43.6201641Z * [new branch] gh/anijain2305/943/head -> origin/gh/anijain2305/943/head 2025-12-04T08:57:43.6202752Z * [new branch] gh/anijain2305/943/orig -> origin/gh/anijain2305/943/orig 2025-12-04T08:57:43.6204826Z * [new branch] gh/anijain2305/944/base -> origin/gh/anijain2305/944/base 2025-12-04T08:57:43.6205857Z * [new branch] gh/anijain2305/944/head -> origin/gh/anijain2305/944/head 2025-12-04T08:57:43.6206932Z * [new branch] gh/anijain2305/944/orig -> origin/gh/anijain2305/944/orig 2025-12-04T08:57:43.6209227Z * [new branch] gh/anijain2305/945/base -> origin/gh/anijain2305/945/base 2025-12-04T08:57:43.6210382Z * [new branch] gh/anijain2305/945/head -> origin/gh/anijain2305/945/head 2025-12-04T08:57:43.6211475Z * [new branch] gh/anijain2305/945/orig -> origin/gh/anijain2305/945/orig 2025-12-04T08:57:43.6213079Z * [new branch] gh/anijain2305/946/base -> origin/gh/anijain2305/946/base 2025-12-04T08:57:43.6214131Z * [new branch] gh/anijain2305/946/head -> origin/gh/anijain2305/946/head 2025-12-04T08:57:43.6215367Z * [new branch] gh/anijain2305/946/orig -> origin/gh/anijain2305/946/orig 2025-12-04T08:57:43.6217226Z * [new branch] gh/anijain2305/947/base -> origin/gh/anijain2305/947/base 2025-12-04T08:57:43.6218408Z * [new branch] gh/anijain2305/947/head -> origin/gh/anijain2305/947/head 2025-12-04T08:57:43.6219465Z * [new branch] gh/anijain2305/947/orig -> origin/gh/anijain2305/947/orig 2025-12-04T08:57:43.6221373Z * [new branch] gh/anijain2305/948/base -> origin/gh/anijain2305/948/base 2025-12-04T08:57:43.6222467Z * [new branch] gh/anijain2305/948/head -> origin/gh/anijain2305/948/head 2025-12-04T08:57:43.6223589Z * [new branch] gh/anijain2305/948/orig -> origin/gh/anijain2305/948/orig 2025-12-04T08:57:43.6225195Z 
* [new branch] gh/anijain2305/949/base -> origin/gh/anijain2305/949/base 2025-12-04T08:57:43.6226285Z * [new branch] gh/anijain2305/949/head -> origin/gh/anijain2305/949/head 2025-12-04T08:57:43.6227470Z * [new branch] gh/anijain2305/949/orig -> origin/gh/anijain2305/949/orig 2025-12-04T08:57:43.6229106Z * [new branch] gh/anijain2305/950/base -> origin/gh/anijain2305/950/base 2025-12-04T08:57:43.6230231Z * [new branch] gh/anijain2305/950/head -> origin/gh/anijain2305/950/head 2025-12-04T08:57:43.6231340Z * [new branch] gh/anijain2305/950/orig -> origin/gh/anijain2305/950/orig 2025-12-04T08:57:43.6233134Z * [new branch] gh/anijain2305/951/base -> origin/gh/anijain2305/951/base 2025-12-04T08:57:43.6234165Z * [new branch] gh/anijain2305/951/head -> origin/gh/anijain2305/951/head 2025-12-04T08:57:43.6235247Z * [new branch] gh/anijain2305/951/orig -> origin/gh/anijain2305/951/orig 2025-12-04T08:57:43.6236846Z * [new branch] gh/anijain2305/952/base -> origin/gh/anijain2305/952/base 2025-12-04T08:57:43.6237877Z * [new branch] gh/anijain2305/952/head -> origin/gh/anijain2305/952/head 2025-12-04T08:57:43.6238962Z * [new branch] gh/anijain2305/952/orig -> origin/gh/anijain2305/952/orig 2025-12-04T08:57:43.6240498Z * [new branch] gh/anijain2305/953/base -> origin/gh/anijain2305/953/base 2025-12-04T08:57:43.6241504Z * [new branch] gh/anijain2305/953/head -> origin/gh/anijain2305/953/head 2025-12-04T08:57:43.6242611Z * [new branch] gh/anijain2305/953/orig -> origin/gh/anijain2305/953/orig 2025-12-04T08:57:43.6244190Z * [new branch] gh/anijain2305/954/base -> origin/gh/anijain2305/954/base 2025-12-04T08:57:43.6245305Z * [new branch] gh/anijain2305/954/head -> origin/gh/anijain2305/954/head 2025-12-04T08:57:43.6246950Z * [new branch] gh/anijain2305/954/orig -> origin/gh/anijain2305/954/orig 2025-12-04T08:57:43.6248740Z * [new branch] gh/anijain2305/955/base -> origin/gh/anijain2305/955/base 2025-12-04T08:57:43.6249641Z * [new branch] gh/anijain2305/955/head -> origin/gh/anijain2305/955/head 2025-12-04T08:57:43.6250737Z * [new branch] gh/anijain2305/955/orig -> origin/gh/anijain2305/955/orig 2025-12-04T08:57:43.6252404Z * [new branch] gh/anijain2305/956/base -> origin/gh/anijain2305/956/base 2025-12-04T08:57:43.6253459Z * [new branch] gh/anijain2305/956/head -> origin/gh/anijain2305/956/head 2025-12-04T08:57:43.6254589Z * [new branch] gh/anijain2305/956/orig -> origin/gh/anijain2305/956/orig 2025-12-04T08:57:43.6256256Z * [new branch] gh/anijain2305/957/base -> origin/gh/anijain2305/957/base 2025-12-04T08:57:43.6257677Z * [new branch] gh/anijain2305/957/head -> origin/gh/anijain2305/957/head 2025-12-04T08:57:43.6258824Z * [new branch] gh/anijain2305/957/orig -> origin/gh/anijain2305/957/orig 2025-12-04T08:57:43.6260482Z * [new branch] gh/anijain2305/958/base -> origin/gh/anijain2305/958/base 2025-12-04T08:57:43.6261610Z * [new branch] gh/anijain2305/958/head -> origin/gh/anijain2305/958/head 2025-12-04T08:57:43.6262833Z * [new branch] gh/anijain2305/958/orig -> origin/gh/anijain2305/958/orig 2025-12-04T08:57:43.6264315Z * [new branch] gh/anijain2305/959/base -> origin/gh/anijain2305/959/base 2025-12-04T08:57:43.6265388Z * [new branch] gh/anijain2305/959/head -> origin/gh/anijain2305/959/head 2025-12-04T08:57:43.6266530Z * [new branch] gh/anijain2305/959/orig -> origin/gh/anijain2305/959/orig 2025-12-04T08:57:43.6268270Z * [new branch] gh/anijain2305/960/base -> origin/gh/anijain2305/960/base 2025-12-04T08:57:43.6269477Z * [new branch] gh/anijain2305/960/head -> origin/gh/anijain2305/960/head 
2025-12-04T08:57:43.6270601Z * [new branch] gh/anijain2305/960/orig -> origin/gh/anijain2305/960/orig 2025-12-04T08:57:43.6272269Z * [new branch] gh/anijain2305/961/base -> origin/gh/anijain2305/961/base 2025-12-04T08:57:43.6273302Z * [new branch] gh/anijain2305/961/head -> origin/gh/anijain2305/961/head 2025-12-04T08:57:43.6274389Z * [new branch] gh/anijain2305/961/orig -> origin/gh/anijain2305/961/orig 2025-12-04T08:57:43.6276094Z * [new branch] gh/anijain2305/962/base -> origin/gh/anijain2305/962/base 2025-12-04T08:57:43.6277107Z * [new branch] gh/anijain2305/962/head -> origin/gh/anijain2305/962/head 2025-12-04T08:57:43.6278203Z * [new branch] gh/anijain2305/962/orig -> origin/gh/anijain2305/962/orig 2025-12-04T08:57:43.6280151Z * [new branch] gh/anijain2305/963/base -> origin/gh/anijain2305/963/base 2025-12-04T08:57:43.6281370Z * [new branch] gh/anijain2305/963/head -> origin/gh/anijain2305/963/head 2025-12-04T08:57:43.6282726Z * [new branch] gh/anijain2305/963/orig -> origin/gh/anijain2305/963/orig 2025-12-04T08:57:43.6284338Z * [new branch] gh/anijain2305/964/base -> origin/gh/anijain2305/964/base 2025-12-04T08:57:43.6285416Z * [new branch] gh/anijain2305/964/head -> origin/gh/anijain2305/964/head 2025-12-04T08:57:43.6286482Z * [new branch] gh/anijain2305/964/orig -> origin/gh/anijain2305/964/orig 2025-12-04T08:57:43.6288453Z * [new branch] gh/anijain2305/965/base -> origin/gh/anijain2305/965/base 2025-12-04T08:57:43.6289537Z * [new branch] gh/anijain2305/965/head -> origin/gh/anijain2305/965/head 2025-12-04T08:57:43.6290713Z * [new branch] gh/anijain2305/965/orig -> origin/gh/anijain2305/965/orig 2025-12-04T08:57:43.6292233Z * [new branch] gh/anijain2305/966/base -> origin/gh/anijain2305/966/base 2025-12-04T08:57:43.6293283Z * [new branch] gh/anijain2305/966/head -> origin/gh/anijain2305/966/head 2025-12-04T08:57:43.6294373Z * [new branch] gh/anijain2305/966/orig -> origin/gh/anijain2305/966/orig 2025-12-04T08:57:43.6295954Z * [new branch] gh/anijain2305/967/base -> origin/gh/anijain2305/967/base 2025-12-04T08:57:43.6297331Z * [new branch] gh/anijain2305/967/head -> origin/gh/anijain2305/967/head 2025-12-04T08:57:43.6298605Z * [new branch] gh/anijain2305/967/orig -> origin/gh/anijain2305/967/orig 2025-12-04T08:57:43.6300222Z * [new branch] gh/anijain2305/968/base -> origin/gh/anijain2305/968/base 2025-12-04T08:57:43.6301333Z * [new branch] gh/anijain2305/968/head -> origin/gh/anijain2305/968/head 2025-12-04T08:57:43.6302467Z * [new branch] gh/anijain2305/968/orig -> origin/gh/anijain2305/968/orig 2025-12-04T08:57:43.6304029Z * [new branch] gh/anijain2305/969/base -> origin/gh/anijain2305/969/base 2025-12-04T08:57:43.6305143Z * [new branch] gh/anijain2305/969/head -> origin/gh/anijain2305/969/head 2025-12-04T08:57:43.6306343Z * [new branch] gh/anijain2305/969/orig -> origin/gh/anijain2305/969/orig 2025-12-04T08:57:43.6308198Z * [new branch] gh/anijain2305/970/base -> origin/gh/anijain2305/970/base 2025-12-04T08:57:43.6309357Z * [new branch] gh/anijain2305/970/head -> origin/gh/anijain2305/970/head 2025-12-04T08:57:43.6310521Z * [new branch] gh/anijain2305/970/orig -> origin/gh/anijain2305/970/orig 2025-12-04T08:57:43.6312355Z * [new branch] gh/anjali411/216/base -> origin/gh/anjali411/216/base 2025-12-04T08:57:43.6313419Z * [new branch] gh/anjali411/216/head -> origin/gh/anjali411/216/head 2025-12-04T08:57:43.6314525Z * [new branch] gh/anjali411/216/orig -> origin/gh/anjali411/216/orig 2025-12-04T08:57:43.6316487Z * [new branch] gh/anshul-si/1/base -> origin/gh/anshul-si/1/base 
2025-12-04T08:57:43.6317573Z * [new branch] gh/anshul-si/1/head -> origin/gh/anshul-si/1/head 2025-12-04T08:57:43.6318953Z * [new branch] gh/anshul-si/2/base -> origin/gh/anshul-si/2/base 2025-12-04T08:57:43.6320002Z * [new branch] gh/anshul-si/2/head -> origin/gh/anshul-si/2/head 2025-12-04T08:57:43.6321817Z * [new branch] gh/anshul-si/3/base -> origin/gh/anshul-si/3/base 2025-12-04T08:57:43.6322862Z * [new branch] gh/anshul-si/3/head -> origin/gh/anshul-si/3/head 2025-12-04T08:57:43.6324256Z * [new branch] gh/anshul-si/4/base -> origin/gh/anshul-si/4/base 2025-12-04T08:57:43.6325650Z * [new branch] gh/anshul-si/4/head -> origin/gh/anshul-si/4/head 2025-12-04T08:57:43.6327116Z * [new branch] gh/anshul-si/5/base -> origin/gh/anshul-si/5/base 2025-12-04T08:57:43.6328178Z * [new branch] gh/anshul-si/5/head -> origin/gh/anshul-si/5/head 2025-12-04T08:57:43.6329901Z * [new branch] gh/anshul-si/53/base -> origin/gh/anshul-si/53/base 2025-12-04T08:57:43.6330996Z * [new branch] gh/anshul-si/53/head -> origin/gh/anshul-si/53/head 2025-12-04T08:57:43.6332595Z * [new branch] gh/anshul-si/58/base -> origin/gh/anshul-si/58/base 2025-12-04T08:57:43.6333822Z * [new branch] gh/anshul-si/58/head -> origin/gh/anshul-si/58/head 2025-12-04T08:57:43.6335284Z * [new branch] gh/anshul-si/66/base -> origin/gh/anshul-si/66/base 2025-12-04T08:57:43.6336393Z * [new branch] gh/anshul-si/66/head -> origin/gh/anshul-si/66/head 2025-12-04T08:57:43.6337802Z * [new branch] gh/anshul-si/66/orig -> origin/gh/anshul-si/66/orig 2025-12-04T08:57:43.6339238Z * [new branch] gh/anshul-si/67/base -> origin/gh/anshul-si/67/base 2025-12-04T08:57:43.6340364Z * [new branch] gh/anshul-si/67/head -> origin/gh/anshul-si/67/head 2025-12-04T08:57:43.6341471Z * [new branch] gh/anshul-si/67/orig -> origin/gh/anshul-si/67/orig 2025-12-04T08:57:43.6343232Z * [new branch] gh/anshul-si/68/base -> origin/gh/anshul-si/68/base 2025-12-04T08:57:43.6344252Z * [new branch] gh/anshul-si/68/head -> origin/gh/anshul-si/68/head 2025-12-04T08:57:43.6345324Z * [new branch] gh/anshul-si/68/orig -> origin/gh/anshul-si/68/orig 2025-12-04T08:57:43.6347164Z * [new branch] gh/anshul-si/69/base -> origin/gh/anshul-si/69/base 2025-12-04T08:57:43.6348203Z * [new branch] gh/anshul-si/69/head -> origin/gh/anshul-si/69/head 2025-12-04T08:57:43.6349424Z * [new branch] gh/anshul-si/69/orig -> origin/gh/anshul-si/69/orig 2025-12-04T08:57:43.6351260Z * [new branch] gh/anshul-si/70/base -> origin/gh/anshul-si/70/base 2025-12-04T08:57:43.6352360Z * [new branch] gh/anshul-si/70/head -> origin/gh/anshul-si/70/head 2025-12-04T08:57:43.6353480Z * [new branch] gh/anshul-si/70/orig -> origin/gh/anshul-si/70/orig 2025-12-04T08:57:43.6355231Z * [new branch] gh/anshul-si/71/base -> origin/gh/anshul-si/71/base 2025-12-04T08:57:43.6356178Z * [new branch] gh/anshul-si/71/head -> origin/gh/anshul-si/71/head 2025-12-04T08:57:43.6357256Z * [new branch] gh/anshul-si/71/orig -> origin/gh/anshul-si/71/orig 2025-12-04T08:57:43.6358856Z * [new branch] gh/anshul-si/72/base -> origin/gh/anshul-si/72/base 2025-12-04T08:57:43.6359951Z * [new branch] gh/anshul-si/72/head -> origin/gh/anshul-si/72/head 2025-12-04T08:57:43.6361076Z * [new branch] gh/anshul-si/72/orig -> origin/gh/anshul-si/72/orig 2025-12-04T08:57:43.6362587Z * [new branch] gh/anshul-si/73/base -> origin/gh/anshul-si/73/base 2025-12-04T08:57:43.6363696Z * [new branch] gh/anshul-si/73/head -> origin/gh/anshul-si/73/head 2025-12-04T08:57:43.6364826Z * [new branch] gh/anshul-si/73/orig -> origin/gh/anshul-si/73/orig 
2025-12-04T08:57:43.6366813Z * [new branch] gh/aorenste/132/base -> origin/gh/aorenste/132/base 2025-12-04T08:57:43.6367854Z * [new branch] gh/aorenste/132/head -> origin/gh/aorenste/132/head 2025-12-04T08:57:43.6369584Z * [new branch] gh/aorenste/134/base -> origin/gh/aorenste/134/base 2025-12-04T08:57:43.6370766Z * [new branch] gh/aorenste/134/head -> origin/gh/aorenste/134/head 2025-12-04T08:57:43.6371880Z * [new branch] gh/aorenste/134/orig -> origin/gh/aorenste/134/orig 2025-12-04T08:57:43.6373495Z * [new branch] gh/aorenste/139/base -> origin/gh/aorenste/139/base 2025-12-04T08:57:43.6374522Z * [new branch] gh/aorenste/139/head -> origin/gh/aorenste/139/head 2025-12-04T08:57:43.6375664Z * [new branch] gh/aorenste/139/orig -> origin/gh/aorenste/139/orig 2025-12-04T08:57:43.6377637Z * [new branch] gh/aorenste/141/base -> origin/gh/aorenste/141/base 2025-12-04T08:57:43.6378590Z * [new branch] gh/aorenste/141/head -> origin/gh/aorenste/141/head 2025-12-04T08:57:43.6380542Z * [new branch] gh/aorenste/145/base -> origin/gh/aorenste/145/base 2025-12-04T08:57:43.6381695Z * [new branch] gh/aorenste/145/head -> origin/gh/aorenste/145/head 2025-12-04T08:57:43.6383106Z * [new branch] gh/aorenste/145/orig -> origin/gh/aorenste/145/orig 2025-12-04T08:57:43.6384701Z * [new branch] gh/aorenste/146/base -> origin/gh/aorenste/146/base 2025-12-04T08:57:43.6385925Z * [new branch] gh/aorenste/146/head -> origin/gh/aorenste/146/head 2025-12-04T08:57:43.6387081Z * [new branch] gh/aorenste/146/orig -> origin/gh/aorenste/146/orig 2025-12-04T08:57:43.6388882Z * [new branch] gh/aorenste/147/base -> origin/gh/aorenste/147/base 2025-12-04T08:57:43.6390083Z * [new branch] gh/aorenste/147/head -> origin/gh/aorenste/147/head 2025-12-04T08:57:43.6391187Z * [new branch] gh/aorenste/147/orig -> origin/gh/aorenste/147/orig 2025-12-04T08:57:43.6392726Z * [new branch] gh/aorenste/148/base -> origin/gh/aorenste/148/base 2025-12-04T08:57:43.6393836Z * [new branch] gh/aorenste/148/head -> origin/gh/aorenste/148/head 2025-12-04T08:57:43.6395004Z * [new branch] gh/aorenste/148/orig -> origin/gh/aorenste/148/orig 2025-12-04T08:57:43.6396545Z * [new branch] gh/aorenste/149/base -> origin/gh/aorenste/149/base 2025-12-04T08:57:43.6397681Z * [new branch] gh/aorenste/149/head -> origin/gh/aorenste/149/head 2025-12-04T08:57:43.6398793Z * [new branch] gh/aorenste/149/orig -> origin/gh/aorenste/149/orig 2025-12-04T08:57:43.6400321Z * [new branch] gh/aorenste/150/base -> origin/gh/aorenste/150/base 2025-12-04T08:57:43.6401428Z * [new branch] gh/aorenste/150/head -> origin/gh/aorenste/150/head 2025-12-04T08:57:43.6402493Z * [new branch] gh/aorenste/150/orig -> origin/gh/aorenste/150/orig 2025-12-04T08:57:43.6403910Z * [new branch] gh/aorenste/151/base -> origin/gh/aorenste/151/base 2025-12-04T08:57:43.6405016Z * [new branch] gh/aorenste/151/head -> origin/gh/aorenste/151/head 2025-12-04T08:57:43.6406206Z * [new branch] gh/aorenste/151/orig -> origin/gh/aorenste/151/orig 2025-12-04T08:57:43.6407766Z * [new branch] gh/aorenste/152/base -> origin/gh/aorenste/152/base 2025-12-04T08:57:43.6408801Z * [new branch] gh/aorenste/152/head -> origin/gh/aorenste/152/head 2025-12-04T08:57:43.6409881Z * [new branch] gh/aorenste/152/orig -> origin/gh/aorenste/152/orig 2025-12-04T08:57:43.6411250Z * [new branch] gh/aorenste/153/base -> origin/gh/aorenste/153/base 2025-12-04T08:57:43.6412372Z * [new branch] gh/aorenste/153/head -> origin/gh/aorenste/153/head 2025-12-04T08:57:43.6413441Z * [new branch] gh/aorenste/153/orig -> origin/gh/aorenste/153/orig 
2025-12-04T08:57:43.6414836Z * [new branch] gh/aorenste/154/base -> origin/gh/aorenste/154/base 2025-12-04T08:57:43.6415900Z * [new branch] gh/aorenste/154/head -> origin/gh/aorenste/154/head 2025-12-04T08:57:43.6417718Z * [new branch] gh/aorenste/154/orig -> origin/gh/aorenste/154/orig 2025-12-04T08:57:43.6418770Z * [new branch] gh/aorenste/155/base -> origin/gh/aorenste/155/base 2025-12-04T08:57:43.6419880Z * [new branch] gh/aorenste/155/head -> origin/gh/aorenste/155/head 2025-12-04T08:57:43.6421099Z * [new branch] gh/aorenste/155/orig -> origin/gh/aorenste/155/orig 2025-12-04T08:57:43.6422608Z * [new branch] gh/aorenste/156/base -> origin/gh/aorenste/156/base 2025-12-04T08:57:43.6423691Z * [new branch] gh/aorenste/156/head -> origin/gh/aorenste/156/head 2025-12-04T08:57:43.6424724Z * [new branch] gh/aorenste/156/orig -> origin/gh/aorenste/156/orig 2025-12-04T08:57:43.6426581Z * [new branch] gh/aorenste/157/base -> origin/gh/aorenste/157/base 2025-12-04T08:57:43.6427844Z * [new branch] gh/aorenste/157/head -> origin/gh/aorenste/157/head 2025-12-04T08:57:43.6428906Z * [new branch] gh/aorenste/157/orig -> origin/gh/aorenste/157/orig 2025-12-04T08:57:43.6430402Z * [new branch] gh/aorenste/158/base -> origin/gh/aorenste/158/base 2025-12-04T08:57:43.6431556Z * [new branch] gh/aorenste/158/head -> origin/gh/aorenste/158/head 2025-12-04T08:57:43.6432629Z * [new branch] gh/aorenste/158/orig -> origin/gh/aorenste/158/orig 2025-12-04T08:57:43.6434143Z * [new branch] gh/aorenste/159/base -> origin/gh/aorenste/159/base 2025-12-04T08:57:43.6435219Z * [new branch] gh/aorenste/159/head -> origin/gh/aorenste/159/head 2025-12-04T08:57:43.6436223Z * [new branch] gh/aorenste/159/orig -> origin/gh/aorenste/159/orig 2025-12-04T08:57:43.6438066Z * [new branch] gh/avikchaudhuri/1/base -> origin/gh/avikchaudhuri/1/base 2025-12-04T08:57:43.6439151Z * [new branch] gh/avikchaudhuri/1/head -> origin/gh/avikchaudhuri/1/head 2025-12-04T08:57:43.6440526Z * [new branch] gh/avikchaudhuri/2/base -> origin/gh/avikchaudhuri/2/base 2025-12-04T08:57:43.6441592Z * [new branch] gh/avikchaudhuri/2/head -> origin/gh/avikchaudhuri/2/head 2025-12-04T08:57:43.6442799Z * [new branch] gh/avikchaudhuri/2/orig -> origin/gh/avikchaudhuri/2/orig 2025-12-04T08:57:43.6444855Z * [new branch] gh/bdhirsh/666/base -> origin/gh/bdhirsh/666/base 2025-12-04T08:57:43.6446052Z * [new branch] gh/bdhirsh/666/head -> origin/gh/bdhirsh/666/head 2025-12-04T08:57:43.6447072Z * [new branch] gh/bdhirsh/666/orig -> origin/gh/bdhirsh/666/orig 2025-12-04T08:57:43.6468447Z * [new branch] gh/bdhirsh/668/base -> origin/gh/bdhirsh/668/base 2025-12-04T08:57:43.6469360Z * [new branch] gh/bdhirsh/668/head -> origin/gh/bdhirsh/668/head 2025-12-04T08:57:43.6469962Z * [new branch] gh/bdhirsh/668/orig -> origin/gh/bdhirsh/668/orig 2025-12-04T08:57:43.6470576Z * [new branch] gh/bdhirsh/669/base -> origin/gh/bdhirsh/669/base 2025-12-04T08:57:43.6471190Z * [new branch] gh/bdhirsh/669/head -> origin/gh/bdhirsh/669/head 2025-12-04T08:57:43.6471797Z * [new branch] gh/bdhirsh/669/orig -> origin/gh/bdhirsh/669/orig 2025-12-04T08:57:43.6472394Z * [new branch] gh/bdhirsh/670/base -> origin/gh/bdhirsh/670/base 2025-12-04T08:57:43.6473017Z * [new branch] gh/bdhirsh/670/head -> origin/gh/bdhirsh/670/head 2025-12-04T08:57:43.6473633Z * [new branch] gh/bdhirsh/670/orig -> origin/gh/bdhirsh/670/orig 2025-12-04T08:57:43.6474223Z * [new branch] gh/bdhirsh/672/base -> origin/gh/bdhirsh/672/base 2025-12-04T08:57:43.6474825Z * [new branch] gh/bdhirsh/672/head -> origin/gh/bdhirsh/672/head 
2025-12-04T08:57:43.6475430Z * [new branch] gh/bdhirsh/672/orig -> origin/gh/bdhirsh/672/orig 2025-12-04T08:57:43.6476040Z * [new branch] gh/bdhirsh/675/base -> origin/gh/bdhirsh/675/base 2025-12-04T08:57:43.6476635Z * [new branch] gh/bdhirsh/675/head -> origin/gh/bdhirsh/675/head 2025-12-04T08:57:43.6477240Z * [new branch] gh/bdhirsh/675/orig -> origin/gh/bdhirsh/675/orig 2025-12-04T08:57:43.6477861Z * [new branch] gh/bdhirsh/676/base -> origin/gh/bdhirsh/676/base 2025-12-04T08:57:43.6478473Z * [new branch] gh/bdhirsh/676/head -> origin/gh/bdhirsh/676/head 2025-12-04T08:57:43.6479077Z * [new branch] gh/bdhirsh/676/orig -> origin/gh/bdhirsh/676/orig 2025-12-04T08:57:43.6479685Z * [new branch] gh/bdhirsh/677/base -> origin/gh/bdhirsh/677/base 2025-12-04T08:57:43.6479915Z * [new branch] gh/bdhirsh/677/head -> origin/gh/bdhirsh/677/head 2025-12-04T08:57:43.6480161Z * [new branch] gh/bdhirsh/677/orig -> origin/gh/bdhirsh/677/orig 2025-12-04T08:57:43.6480392Z * [new branch] gh/bdhirsh/678/base -> origin/gh/bdhirsh/678/base 2025-12-04T08:57:43.6480635Z * [new branch] gh/bdhirsh/678/head -> origin/gh/bdhirsh/678/head 2025-12-04T08:57:43.6480868Z * [new branch] gh/bdhirsh/678/orig -> origin/gh/bdhirsh/678/orig 2025-12-04T08:57:43.6481118Z * [new branch] gh/bdhirsh/679/base -> origin/gh/bdhirsh/679/base 2025-12-04T08:57:43.6482193Z * [new branch] gh/bdhirsh/679/head -> origin/gh/bdhirsh/679/head 2025-12-04T08:57:43.6483540Z * [new branch] gh/bdhirsh/679/orig -> origin/gh/bdhirsh/679/orig 2025-12-04T08:57:43.6485038Z * [new branch] gh/bdhirsh/680/base -> origin/gh/bdhirsh/680/base 2025-12-04T08:57:43.6486236Z * [new branch] gh/bdhirsh/680/head -> origin/gh/bdhirsh/680/head 2025-12-04T08:57:43.6487336Z * [new branch] gh/bdhirsh/680/orig -> origin/gh/bdhirsh/680/orig 2025-12-04T08:57:43.6488792Z * [new branch] gh/bdhirsh/681/base -> origin/gh/bdhirsh/681/base 2025-12-04T08:57:43.6489979Z * [new branch] gh/bdhirsh/681/head -> origin/gh/bdhirsh/681/head 2025-12-04T08:57:43.6491100Z * [new branch] gh/bdhirsh/681/orig -> origin/gh/bdhirsh/681/orig 2025-12-04T08:57:43.6493091Z * [new branch] gh/benjaminglass1/101/base -> origin/gh/benjaminglass1/101/base 2025-12-04T08:57:43.6494110Z * [new branch] gh/benjaminglass1/101/head -> origin/gh/benjaminglass1/101/head 2025-12-04T08:57:43.6495262Z * [new branch] gh/benjaminglass1/101/orig -> origin/gh/benjaminglass1/101/orig 2025-12-04T08:57:43.6496951Z * [new branch] gh/benjaminglass1/102/base -> origin/gh/benjaminglass1/102/base 2025-12-04T08:57:43.6498174Z * [new branch] gh/benjaminglass1/102/head -> origin/gh/benjaminglass1/102/head 2025-12-04T08:57:43.6499277Z * [new branch] gh/benjaminglass1/102/orig -> origin/gh/benjaminglass1/102/orig 2025-12-04T08:57:43.6500765Z * [new branch] gh/benjaminglass1/106/base -> origin/gh/benjaminglass1/106/base 2025-12-04T08:57:43.6501889Z * [new branch] gh/benjaminglass1/106/head -> origin/gh/benjaminglass1/106/head 2025-12-04T08:57:43.6503057Z * [new branch] gh/benjaminglass1/106/orig -> origin/gh/benjaminglass1/106/orig 2025-12-04T08:57:43.6504624Z * [new branch] gh/benjaminglass1/107/base -> origin/gh/benjaminglass1/107/base 2025-12-04T08:57:43.6505769Z * [new branch] gh/benjaminglass1/107/head -> origin/gh/benjaminglass1/107/head 2025-12-04T08:57:43.6506903Z * [new branch] gh/benjaminglass1/107/orig -> origin/gh/benjaminglass1/107/orig 2025-12-04T08:57:43.6508387Z * [new branch] gh/benjaminglass1/108/base -> origin/gh/benjaminglass1/108/base 2025-12-04T08:57:43.6509571Z * [new branch] gh/benjaminglass1/108/head -> 
origin/gh/benjaminglass1/108/head 2025-12-04T08:57:43.6510667Z * [new branch] gh/benjaminglass1/108/orig -> origin/gh/benjaminglass1/108/orig 2025-12-04T08:57:43.6512124Z * [new branch] gh/benjaminglass1/109/base -> origin/gh/benjaminglass1/109/base 2025-12-04T08:57:43.6513242Z * [new branch] gh/benjaminglass1/109/head -> origin/gh/benjaminglass1/109/head 2025-12-04T08:57:43.6514377Z * [new branch] gh/benjaminglass1/109/orig -> origin/gh/benjaminglass1/109/orig 2025-12-04T08:57:43.6515794Z * [new branch] gh/benjaminglass1/97/base -> origin/gh/benjaminglass1/97/base 2025-12-04T08:57:43.6516877Z * [new branch] gh/benjaminglass1/97/head -> origin/gh/benjaminglass1/97/head 2025-12-04T08:57:43.6518003Z * [new branch] gh/benjaminglass1/97/orig -> origin/gh/benjaminglass1/97/orig 2025-12-04T08:57:43.6519822Z * [new branch] gh/bobrenjc93/570/base -> origin/gh/bobrenjc93/570/base 2025-12-04T08:57:43.6521085Z * [new branch] gh/bobrenjc93/570/head -> origin/gh/bobrenjc93/570/head 2025-12-04T08:57:43.6524444Z * [new branch] gh/bobrenjc93/570/orig -> origin/gh/bobrenjc93/570/orig 2025-12-04T08:57:43.6525842Z * [new branch] gh/bobrenjc93/604/base -> origin/gh/bobrenjc93/604/base 2025-12-04T08:57:43.6526963Z * [new branch] gh/bobrenjc93/604/head -> origin/gh/bobrenjc93/604/head 2025-12-04T08:57:43.6528113Z * [new branch] gh/bobrenjc93/604/orig -> origin/gh/bobrenjc93/604/orig 2025-12-04T08:57:43.6529627Z * [new branch] gh/bobrenjc93/638/base -> origin/gh/bobrenjc93/638/base 2025-12-04T08:57:43.6530844Z * [new branch] gh/bobrenjc93/638/head -> origin/gh/bobrenjc93/638/head 2025-12-04T08:57:43.6531961Z * [new branch] gh/bobrenjc93/638/orig -> origin/gh/bobrenjc93/638/orig 2025-12-04T08:57:43.6533542Z * [new branch] gh/bobrenjc93/653/base -> origin/gh/bobrenjc93/653/base 2025-12-04T08:57:43.6534638Z * [new branch] gh/bobrenjc93/653/head -> origin/gh/bobrenjc93/653/head 2025-12-04T08:57:43.6535815Z * [new branch] gh/bobrenjc93/653/orig -> origin/gh/bobrenjc93/653/orig 2025-12-04T08:57:43.6537684Z * [new branch] gh/bobrenjc93/654/base -> origin/gh/bobrenjc93/654/base 2025-12-04T08:57:43.6538995Z * [new branch] gh/bobrenjc93/654/head -> origin/gh/bobrenjc93/654/head 2025-12-04T08:57:43.6539898Z * [new branch] gh/bobrenjc93/654/orig -> origin/gh/bobrenjc93/654/orig 2025-12-04T08:57:43.6541443Z * [new branch] gh/bobrenjc93/657/base -> origin/gh/bobrenjc93/657/base 2025-12-04T08:57:43.6542496Z * [new branch] gh/bobrenjc93/657/head -> origin/gh/bobrenjc93/657/head 2025-12-04T08:57:43.6543614Z * [new branch] gh/bobrenjc93/657/orig -> origin/gh/bobrenjc93/657/orig 2025-12-04T08:57:43.6545128Z * [new branch] gh/bobrenjc93/672/base -> origin/gh/bobrenjc93/672/base 2025-12-04T08:57:43.6546185Z * [new branch] gh/bobrenjc93/672/head -> origin/gh/bobrenjc93/672/head 2025-12-04T08:57:43.6547322Z * [new branch] gh/bobrenjc93/672/orig -> origin/gh/bobrenjc93/672/orig 2025-12-04T08:57:43.6548981Z * [new branch] gh/bobrenjc93/679/base -> origin/gh/bobrenjc93/679/base 2025-12-04T08:57:43.6550266Z * [new branch] gh/bobrenjc93/679/head -> origin/gh/bobrenjc93/679/head 2025-12-04T08:57:43.6551588Z * [new branch] gh/bobrenjc93/679/orig -> origin/gh/bobrenjc93/679/orig 2025-12-04T08:57:43.6553111Z * [new branch] gh/bobrenjc93/680/base -> origin/gh/bobrenjc93/680/base 2025-12-04T08:57:43.6554174Z * [new branch] gh/bobrenjc93/680/head -> origin/gh/bobrenjc93/680/head 2025-12-04T08:57:43.6555333Z * [new branch] gh/bobrenjc93/680/orig -> origin/gh/bobrenjc93/680/orig 2025-12-04T08:57:43.6556638Z * [new branch] gh/bobrenjc93/681/base -> 
origin/gh/bobrenjc93/681/base 2025-12-04T08:57:43.6557719Z * [new branch] gh/bobrenjc93/681/head -> origin/gh/bobrenjc93/681/head 2025-12-04T08:57:43.6558850Z * [new branch] gh/bobrenjc93/681/orig -> origin/gh/bobrenjc93/681/orig 2025-12-04T08:57:43.6560172Z * [new branch] gh/bobrenjc93/682/base -> origin/gh/bobrenjc93/682/base 2025-12-04T08:57:43.6561290Z * [new branch] gh/bobrenjc93/682/head -> origin/gh/bobrenjc93/682/head 2025-12-04T08:57:43.6562409Z * [new branch] gh/bobrenjc93/682/orig -> origin/gh/bobrenjc93/682/orig 2025-12-04T08:57:43.6563860Z * [new branch] gh/bobrenjc93/683/base -> origin/gh/bobrenjc93/683/base 2025-12-04T08:57:43.6564947Z * [new branch] gh/bobrenjc93/683/head -> origin/gh/bobrenjc93/683/head 2025-12-04T08:57:43.6566158Z * [new branch] gh/bobrenjc93/683/orig -> origin/gh/bobrenjc93/683/orig 2025-12-04T08:57:43.6567628Z * [new branch] gh/bobrenjc93/684/base -> origin/gh/bobrenjc93/684/base 2025-12-04T08:57:43.6568890Z * [new branch] gh/bobrenjc93/684/head -> origin/gh/bobrenjc93/684/head 2025-12-04T08:57:43.6570185Z * [new branch] gh/bobrenjc93/684/orig -> origin/gh/bobrenjc93/684/orig 2025-12-04T08:57:43.6571513Z * [new branch] gh/bobrenjc93/685/base -> origin/gh/bobrenjc93/685/base 2025-12-04T08:57:43.6572895Z * [new branch] gh/bobrenjc93/685/head -> origin/gh/bobrenjc93/685/head 2025-12-04T08:57:43.6574266Z * [new branch] gh/bobrenjc93/685/orig -> origin/gh/bobrenjc93/685/orig 2025-12-04T08:57:43.6575921Z * [new branch] gh/bobrenjc93/686/base -> origin/gh/bobrenjc93/686/base 2025-12-04T08:57:43.6580450Z * [new branch] gh/bobrenjc93/686/head -> origin/gh/bobrenjc93/686/head 2025-12-04T08:57:43.6580714Z * [new branch] gh/bobrenjc93/686/orig -> origin/gh/bobrenjc93/686/orig 2025-12-04T08:57:43.6580982Z * [new branch] gh/bobrenjc93/687/base -> origin/gh/bobrenjc93/687/base 2025-12-04T08:57:43.6581574Z * [new branch] gh/bobrenjc93/687/head -> origin/gh/bobrenjc93/687/head 2025-12-04T08:57:43.6582701Z * [new branch] gh/bobrenjc93/687/orig -> origin/gh/bobrenjc93/687/orig 2025-12-04T08:57:43.6584593Z * [new branch] gh/bobrenjc93/688/base -> origin/gh/bobrenjc93/688/base 2025-12-04T08:57:43.6585730Z * [new branch] gh/bobrenjc93/688/head -> origin/gh/bobrenjc93/688/head 2025-12-04T08:57:43.6586840Z * [new branch] gh/bobrenjc93/688/orig -> origin/gh/bobrenjc93/688/orig 2025-12-04T08:57:43.6588246Z * [new branch] gh/bobrenjc93/689/base -> origin/gh/bobrenjc93/689/base 2025-12-04T08:57:43.6589612Z * [new branch] gh/bobrenjc93/689/head -> origin/gh/bobrenjc93/689/head 2025-12-04T08:57:43.6590740Z * [new branch] gh/bobrenjc93/689/orig -> origin/gh/bobrenjc93/689/orig 2025-12-04T08:57:43.6592092Z * [new branch] gh/bobrenjc93/690/base -> origin/gh/bobrenjc93/690/base 2025-12-04T08:57:43.6593188Z * [new branch] gh/bobrenjc93/690/head -> origin/gh/bobrenjc93/690/head 2025-12-04T08:57:43.6594272Z * [new branch] gh/bobrenjc93/690/orig -> origin/gh/bobrenjc93/690/orig 2025-12-04T08:57:43.6596571Z * [new branch] gh/bobrenjc93/691/base -> origin/gh/bobrenjc93/691/base 2025-12-04T08:57:43.6597989Z * [new branch] gh/bobrenjc93/691/head -> origin/gh/bobrenjc93/691/head 2025-12-04T08:57:43.6599982Z * [new branch] gh/bobrenjc93/691/orig -> origin/gh/bobrenjc93/691/orig 2025-12-04T08:57:43.6601868Z * [new branch] gh/bobrenjc93/692/base -> origin/gh/bobrenjc93/692/base 2025-12-04T08:57:43.6602945Z * [new branch] gh/bobrenjc93/692/head -> origin/gh/bobrenjc93/692/head 2025-12-04T08:57:43.6604053Z * [new branch] gh/bobrenjc93/692/orig -> origin/gh/bobrenjc93/692/orig 
2025-12-04T08:57:43.6605381Z * [new branch] gh/bobrenjc93/693/base -> origin/gh/bobrenjc93/693/base 2025-12-04T08:57:43.6606454Z * [new branch] gh/bobrenjc93/693/head -> origin/gh/bobrenjc93/693/head 2025-12-04T08:57:43.6607621Z * [new branch] gh/bobrenjc93/693/orig -> origin/gh/bobrenjc93/693/orig 2025-12-04T08:57:43.6609590Z * [new branch] gh/bobrenjc93/694/base -> origin/gh/bobrenjc93/694/base 2025-12-04T08:57:43.6610729Z * [new branch] gh/bobrenjc93/694/head -> origin/gh/bobrenjc93/694/head 2025-12-04T08:57:43.6611855Z * [new branch] gh/bobrenjc93/694/orig -> origin/gh/bobrenjc93/694/orig 2025-12-04T08:57:43.6613217Z * [new branch] gh/bobrenjc93/695/base -> origin/gh/bobrenjc93/695/base 2025-12-04T08:57:43.6614289Z * [new branch] gh/bobrenjc93/695/head -> origin/gh/bobrenjc93/695/head 2025-12-04T08:57:43.6615518Z * [new branch] gh/bobrenjc93/695/orig -> origin/gh/bobrenjc93/695/orig 2025-12-04T08:57:43.6617675Z * [new branch] gh/c00w/23/base -> origin/gh/c00w/23/base 2025-12-04T08:57:43.6618827Z * [new branch] gh/c00w/23/head -> origin/gh/c00w/23/head 2025-12-04T08:57:43.6620648Z * [new branch] gh/c00w/53/base -> origin/gh/c00w/53/base 2025-12-04T08:57:43.6622007Z * [new branch] gh/c00w/53/head -> origin/gh/c00w/53/head 2025-12-04T08:57:43.6623101Z * [new branch] gh/c00w/53/orig -> origin/gh/c00w/53/orig 2025-12-04T08:57:43.6624449Z * [new branch] gh/c00w/54/base -> origin/gh/c00w/54/base 2025-12-04T08:57:43.6625602Z * [new branch] gh/c00w/54/head -> origin/gh/c00w/54/head 2025-12-04T08:57:43.6626777Z * [new branch] gh/c00w/54/orig -> origin/gh/c00w/54/orig 2025-12-04T08:57:43.6628246Z * [new branch] gh/c00w/56/base -> origin/gh/c00w/56/base 2025-12-04T08:57:43.6629360Z * [new branch] gh/c00w/56/head -> origin/gh/c00w/56/head 2025-12-04T08:57:43.6630584Z * [new branch] gh/c00w/56/orig -> origin/gh/c00w/56/orig 2025-12-04T08:57:43.6632234Z * [new branch] gh/c00w/57/base -> origin/gh/c00w/57/base 2025-12-04T08:57:43.6633366Z * [new branch] gh/c00w/57/head -> origin/gh/c00w/57/head 2025-12-04T08:57:43.6634530Z * [new branch] gh/c00w/57/orig -> origin/gh/c00w/57/orig 2025-12-04T08:57:43.6635894Z * [new branch] gh/c00w/58/base -> origin/gh/c00w/58/base 2025-12-04T08:57:43.6636959Z * [new branch] gh/c00w/58/head -> origin/gh/c00w/58/head 2025-12-04T08:57:43.6638149Z * [new branch] gh/c00w/58/orig -> origin/gh/c00w/58/orig 2025-12-04T08:57:43.6639876Z * [new branch] gh/clee2000/1/base -> origin/gh/clee2000/1/base 2025-12-04T08:57:43.6641056Z * [new branch] gh/clee2000/1/head -> origin/gh/clee2000/1/head 2025-12-04T08:57:43.6642128Z * [new branch] gh/clee2000/1/orig -> origin/gh/clee2000/1/orig 2025-12-04T08:57:43.6644101Z * [new branch] gh/coconutruben/1/base -> origin/gh/coconutruben/1/base 2025-12-04T08:57:43.6645297Z * [new branch] gh/coconutruben/1/head -> origin/gh/coconutruben/1/head 2025-12-04T08:57:43.6647065Z * [new branch] gh/coconutruben/55/base -> origin/gh/coconutruben/55/base 2025-12-04T08:57:43.6648168Z * [new branch] gh/coconutruben/55/head -> origin/gh/coconutruben/55/head 2025-12-04T08:57:43.6649296Z * [new branch] gh/coconutruben/55/orig -> origin/gh/coconutruben/55/orig 2025-12-04T08:57:43.6650919Z * [new branch] gh/coconutruben/57/base -> origin/gh/coconutruben/57/base 2025-12-04T08:57:43.6652177Z * [new branch] gh/coconutruben/57/head -> origin/gh/coconutruben/57/head 2025-12-04T08:57:43.6653366Z * [new branch] gh/coconutruben/57/orig -> origin/gh/coconutruben/57/orig 2025-12-04T08:57:43.6654843Z * [new branch] gh/coconutruben/70/base -> origin/gh/coconutruben/70/base 
2025-12-04T08:57:43.6655965Z * [new branch] gh/coconutruben/70/head -> origin/gh/coconutruben/70/head 2025-12-04T08:57:43.6657596Z * [new branch] gh/coconutruben/70/orig -> origin/gh/coconutruben/70/orig 2025-12-04T08:57:43.6658860Z * [new branch] gh/coconutruben/71/base -> origin/gh/coconutruben/71/base 2025-12-04T08:57:43.6660164Z * [new branch] gh/coconutruben/71/head -> origin/gh/coconutruben/71/head 2025-12-04T08:57:43.6661300Z * [new branch] gh/coconutruben/71/orig -> origin/gh/coconutruben/71/orig 2025-12-04T08:57:43.6662656Z * [new branch] gh/coconutruben/72/base -> origin/gh/coconutruben/72/base 2025-12-04T08:57:43.6663798Z * [new branch] gh/coconutruben/72/head -> origin/gh/coconutruben/72/head 2025-12-04T08:57:43.6664950Z * [new branch] gh/coconutruben/72/orig -> origin/gh/coconutruben/72/orig 2025-12-04T08:57:43.6666321Z * [new branch] gh/coconutruben/73/base -> origin/gh/coconutruben/73/base 2025-12-04T08:57:43.6667651Z * [new branch] gh/coconutruben/73/head -> origin/gh/coconutruben/73/head 2025-12-04T08:57:43.6668671Z * [new branch] gh/coconutruben/73/orig -> origin/gh/coconutruben/73/orig 2025-12-04T08:57:43.6670369Z * [new branch] gh/coconutruben/74/base -> origin/gh/coconutruben/74/base 2025-12-04T08:57:43.6671584Z * [new branch] gh/coconutruben/74/head -> origin/gh/coconutruben/74/head 2025-12-04T08:57:43.6672679Z * [new branch] gh/coconutruben/74/orig -> origin/gh/coconutruben/74/orig 2025-12-04T08:57:43.6674221Z * [new branch] gh/coconutruben/79/base -> origin/gh/coconutruben/79/base 2025-12-04T08:57:43.6675622Z * [new branch] gh/coconutruben/79/head -> origin/gh/coconutruben/79/head 2025-12-04T08:57:43.6676826Z * [new branch] gh/coconutruben/79/orig -> origin/gh/coconutruben/79/orig 2025-12-04T08:57:43.6678221Z * [new branch] gh/coconutruben/80/base -> origin/gh/coconutruben/80/base 2025-12-04T08:57:43.6679440Z * [new branch] gh/coconutruben/80/head -> origin/gh/coconutruben/80/head 2025-12-04T08:57:43.6680574Z * [new branch] gh/coconutruben/80/orig -> origin/gh/coconutruben/80/orig 2025-12-04T08:57:43.6682207Z * [new branch] gh/coconutruben/82/base -> origin/gh/coconutruben/82/base 2025-12-04T08:57:43.6683251Z * [new branch] gh/coconutruben/82/head -> origin/gh/coconutruben/82/head 2025-12-04T08:57:43.6684323Z * [new branch] gh/coconutruben/82/orig -> origin/gh/coconutruben/82/orig 2025-12-04T08:57:43.6685957Z * [new branch] gh/coconutruben/83/base -> origin/gh/coconutruben/83/base 2025-12-04T08:57:43.6687148Z * [new branch] gh/coconutruben/83/head -> origin/gh/coconutruben/83/head 2025-12-04T08:57:43.6688211Z * [new branch] gh/coconutruben/83/orig -> origin/gh/coconutruben/83/orig 2025-12-04T08:57:43.6690117Z * [new branch] gh/coconutruben/84/base -> origin/gh/coconutruben/84/base 2025-12-04T08:57:43.6691116Z * [new branch] gh/coconutruben/84/head -> origin/gh/coconutruben/84/head 2025-12-04T08:57:43.6692252Z * [new branch] gh/coconutruben/84/orig -> origin/gh/coconutruben/84/orig 2025-12-04T08:57:43.6693717Z * [new branch] gh/coconutruben/85/base -> origin/gh/coconutruben/85/base 2025-12-04T08:57:43.6694847Z * [new branch] gh/coconutruben/85/head -> origin/gh/coconutruben/85/head 2025-12-04T08:57:43.6696014Z * [new branch] gh/coconutruben/85/orig -> origin/gh/coconutruben/85/orig 2025-12-04T08:57:43.6697892Z * [new branch] gh/coconutruben/86/base -> origin/gh/coconutruben/86/base 2025-12-04T08:57:43.6699063Z * [new branch] gh/coconutruben/86/head -> origin/gh/coconutruben/86/head 2025-12-04T08:57:43.6700214Z * [new branch] gh/coconutruben/86/orig -> 
origin/gh/coconutruben/86/orig 2025-12-04T08:57:43.6702054Z * [new branch] gh/colinchan15/1/base -> origin/gh/colinchan15/1/base 2025-12-04T08:57:43.6703221Z * [new branch] gh/colinchan15/1/head -> origin/gh/colinchan15/1/head 2025-12-04T08:57:43.6704589Z * [new branch] gh/colinchan15/2/base -> origin/gh/colinchan15/2/base 2025-12-04T08:57:43.6705811Z * [new branch] gh/colinchan15/2/head -> origin/gh/colinchan15/2/head 2025-12-04T08:57:43.6707157Z * [new branch] gh/colinchan15/3/base -> origin/gh/colinchan15/3/base 2025-12-04T08:57:43.6708198Z * [new branch] gh/colinchan15/3/head -> origin/gh/colinchan15/3/head 2025-12-04T08:57:43.6709606Z * [new branch] gh/colinchan15/6/base -> origin/gh/colinchan15/6/base 2025-12-04T08:57:43.6710679Z * [new branch] gh/colinchan15/6/head -> origin/gh/colinchan15/6/head 2025-12-04T08:57:43.6712415Z * [new branch] gh/d4l3k/1/base -> origin/gh/d4l3k/1/base 2025-12-04T08:57:43.6713516Z * [new branch] gh/d4l3k/1/head -> origin/gh/d4l3k/1/head 2025-12-04T08:57:43.6714972Z * [new branch] gh/d4l3k/2/base -> origin/gh/d4l3k/2/base 2025-12-04T08:57:43.6716060Z * [new branch] gh/d4l3k/2/head -> origin/gh/d4l3k/2/head 2025-12-04T08:57:43.6717146Z * [new branch] gh/d4l3k/2/orig -> origin/gh/d4l3k/2/orig 2025-12-04T08:57:43.6718697Z * [new branch] gh/d4l3k/3/base -> origin/gh/d4l3k/3/base 2025-12-04T08:57:43.6719758Z * [new branch] gh/d4l3k/3/head -> origin/gh/d4l3k/3/head 2025-12-04T08:57:43.6721038Z * [new branch] gh/d4l3k/3/orig -> origin/gh/d4l3k/3/orig 2025-12-04T08:57:43.6723056Z * [new branch] gh/d4l3k/4/base -> origin/gh/d4l3k/4/base 2025-12-04T08:57:43.6724093Z * [new branch] gh/d4l3k/4/head -> origin/gh/d4l3k/4/head 2025-12-04T08:57:43.6725239Z * [new branch] gh/d4l3k/4/orig -> origin/gh/d4l3k/4/orig 2025-12-04T08:57:43.6726669Z * [new branch] gh/d4l3k/5/base -> origin/gh/d4l3k/5/base 2025-12-04T08:57:43.6727841Z * [new branch] gh/d4l3k/5/orig -> origin/gh/d4l3k/5/orig 2025-12-04T08:57:43.6729684Z * [new branch] gh/davidberard98/392/base -> origin/gh/davidberard98/392/base 2025-12-04T08:57:43.6730820Z * [new branch] gh/davidberard98/392/head -> origin/gh/davidberard98/392/head 2025-12-04T08:57:43.6731950Z * [new branch] gh/davidberard98/392/orig -> origin/gh/davidberard98/392/orig 2025-12-04T08:57:43.6733766Z * [new branch] gh/davidberard98/399/base -> origin/gh/davidberard98/399/base 2025-12-04T08:57:43.6734910Z * [new branch] gh/davidberard98/399/head -> origin/gh/davidberard98/399/head 2025-12-04T08:57:43.6736041Z * [new branch] gh/davidberard98/399/orig -> origin/gh/davidberard98/399/orig 2025-12-04T08:57:43.6738114Z * [new branch] gh/desertfire/605/base -> origin/gh/desertfire/605/base 2025-12-04T08:57:43.6739220Z * [new branch] gh/desertfire/605/head -> origin/gh/desertfire/605/head 2025-12-04T08:57:43.6740423Z * [new branch] gh/desertfire/605/orig -> origin/gh/desertfire/605/orig 2025-12-04T08:57:43.6741938Z * [new branch] gh/desertfire/606/base -> origin/gh/desertfire/606/base 2025-12-04T08:57:43.6743019Z * [new branch] gh/desertfire/606/head -> origin/gh/desertfire/606/head 2025-12-04T08:57:43.6744284Z * [new branch] gh/desertfire/606/orig -> origin/gh/desertfire/606/orig 2025-12-04T08:57:43.6745765Z * [new branch] gh/desertfire/607/base -> origin/gh/desertfire/607/base 2025-12-04T08:57:43.6746870Z * [new branch] gh/desertfire/607/head -> origin/gh/desertfire/607/head 2025-12-04T08:57:43.6748043Z * [new branch] gh/desertfire/607/orig -> origin/gh/desertfire/607/orig 2025-12-04T08:57:43.6749751Z * [new branch] gh/desertfire/608/base -> 
origin/gh/desertfire/608/base 2025-12-04T08:57:43.6750815Z * [new branch] gh/desertfire/608/head -> origin/gh/desertfire/608/head 2025-12-04T08:57:43.6751948Z * [new branch] gh/desertfire/608/orig -> origin/gh/desertfire/608/orig 2025-12-04T08:57:43.6753388Z * [new branch] gh/desertfire/609/base -> origin/gh/desertfire/609/base 2025-12-04T08:57:43.6754617Z * [new branch] gh/desertfire/609/head -> origin/gh/desertfire/609/head 2025-12-04T08:57:43.6755730Z * [new branch] gh/desertfire/609/orig -> origin/gh/desertfire/609/orig 2025-12-04T08:57:43.6757408Z * [new branch] gh/desertfire/610/base -> origin/gh/desertfire/610/base 2025-12-04T08:57:43.6758539Z * [new branch] gh/desertfire/610/head -> origin/gh/desertfire/610/head 2025-12-04T08:57:43.6759690Z * [new branch] gh/desertfire/610/orig -> origin/gh/desertfire/610/orig 2025-12-04T08:57:43.6761075Z * [new branch] gh/desertfire/611/base -> origin/gh/desertfire/611/base 2025-12-04T08:57:43.6762276Z * [new branch] gh/desertfire/611/head -> origin/gh/desertfire/611/head 2025-12-04T08:57:43.6763432Z * [new branch] gh/desertfire/611/orig -> origin/gh/desertfire/611/orig 2025-12-04T08:57:43.6765033Z * [new branch] gh/desertfire/612/base -> origin/gh/desertfire/612/base 2025-12-04T08:57:43.6766113Z * [new branch] gh/desertfire/612/head -> origin/gh/desertfire/612/head 2025-12-04T08:57:43.6767233Z * [new branch] gh/desertfire/612/orig -> origin/gh/desertfire/612/orig 2025-12-04T08:57:43.6768808Z * [new branch] gh/desertfire/613/base -> origin/gh/desertfire/613/base 2025-12-04T08:57:43.6769909Z * [new branch] gh/desertfire/613/head -> origin/gh/desertfire/613/head 2025-12-04T08:57:43.6770990Z * [new branch] gh/desertfire/613/orig -> origin/gh/desertfire/613/orig 2025-12-04T08:57:43.6772570Z * [new branch] gh/desertfire/614/base -> origin/gh/desertfire/614/base 2025-12-04T08:57:43.6773790Z * [new branch] gh/desertfire/614/head -> origin/gh/desertfire/614/head 2025-12-04T08:57:43.6774909Z * [new branch] gh/desertfire/614/orig -> origin/gh/desertfire/614/orig 2025-12-04T08:57:43.6776487Z * [new branch] gh/desertfire/615/base -> origin/gh/desertfire/615/base 2025-12-04T08:57:43.6778206Z * [new branch] gh/desertfire/615/head -> origin/gh/desertfire/615/head 2025-12-04T08:57:43.6779319Z * [new branch] gh/desertfire/615/orig -> origin/gh/desertfire/615/orig 2025-12-04T08:57:43.6780722Z * [new branch] gh/desertfire/616/base -> origin/gh/desertfire/616/base 2025-12-04T08:57:43.6781965Z * [new branch] gh/desertfire/616/head -> origin/gh/desertfire/616/head 2025-12-04T08:57:43.6782997Z * [new branch] gh/desertfire/616/orig -> origin/gh/desertfire/616/orig 2025-12-04T08:57:43.6784356Z * [new branch] gh/desertfire/617/base -> origin/gh/desertfire/617/base 2025-12-04T08:57:43.6785623Z * [new branch] gh/desertfire/617/head -> origin/gh/desertfire/617/head 2025-12-04T08:57:43.6786669Z * [new branch] gh/desertfire/617/orig -> origin/gh/desertfire/617/orig 2025-12-04T08:57:43.6788444Z * [new branch] gh/dharakk/1/base -> origin/gh/dharakk/1/base 2025-12-04T08:57:43.6789727Z * [new branch] gh/dharakk/1/head -> origin/gh/dharakk/1/head 2025-12-04T08:57:43.6791481Z * [new branch] gh/drisspg/170/base -> origin/gh/drisspg/170/base 2025-12-04T08:57:43.6792576Z * [new branch] gh/drisspg/170/head -> origin/gh/drisspg/170/head 2025-12-04T08:57:43.6793761Z * [new branch] gh/drisspg/170/orig -> origin/gh/drisspg/170/orig 2025-12-04T08:57:43.6795199Z * [new branch] gh/drisspg/182/base -> origin/gh/drisspg/182/base 2025-12-04T08:57:43.6796330Z * [new branch] gh/drisspg/182/head -> 
origin/gh/drisspg/182/head 2025-12-04T08:57:43.6797615Z * [new branch] gh/drisspg/183/base -> origin/gh/drisspg/183/base 2025-12-04T08:57:43.6798621Z * [new branch] gh/drisspg/183/head -> origin/gh/drisspg/183/head 2025-12-04T08:57:43.6799908Z * [new branch] gh/drisspg/184/base -> origin/gh/drisspg/184/base 2025-12-04T08:57:43.6800910Z * [new branch] gh/drisspg/184/head -> origin/gh/drisspg/184/head 2025-12-04T08:57:43.6802402Z * [new branch] gh/drisspg/185/base -> origin/gh/drisspg/185/base 2025-12-04T08:57:43.6803582Z * [new branch] gh/drisspg/185/head -> origin/gh/drisspg/185/head 2025-12-04T08:57:43.6804942Z * [new branch] gh/drisspg/194/base -> origin/gh/drisspg/194/base 2025-12-04T08:57:43.6806045Z * [new branch] gh/drisspg/194/head -> origin/gh/drisspg/194/head 2025-12-04T08:57:43.6807227Z * [new branch] gh/drisspg/194/orig -> origin/gh/drisspg/194/orig 2025-12-04T08:57:43.6808658Z * [new branch] gh/drisspg/200/base -> origin/gh/drisspg/200/base 2025-12-04T08:57:43.6809749Z * [new branch] gh/drisspg/200/head -> origin/gh/drisspg/200/head 2025-12-04T08:57:43.6810823Z * [new branch] gh/drisspg/200/orig -> origin/gh/drisspg/200/orig 2025-12-04T08:57:43.6812275Z * [new branch] gh/drisspg/218/base -> origin/gh/drisspg/218/base 2025-12-04T08:57:43.6813454Z * [new branch] gh/drisspg/218/head -> origin/gh/drisspg/218/head 2025-12-04T08:57:43.6814462Z * [new branch] gh/drisspg/218/orig -> origin/gh/drisspg/218/orig 2025-12-04T08:57:43.6815874Z * [new branch] gh/drisspg/219/base -> origin/gh/drisspg/219/base 2025-12-04T08:57:43.6817324Z * [new branch] gh/drisspg/219/head -> origin/gh/drisspg/219/head 2025-12-04T08:57:43.6818438Z * [new branch] gh/drisspg/219/orig -> origin/gh/drisspg/219/orig 2025-12-04T08:57:43.6819973Z * [new branch] gh/drisspg/220/base -> origin/gh/drisspg/220/base 2025-12-04T08:57:43.6821312Z * [new branch] gh/drisspg/220/head -> origin/gh/drisspg/220/head 2025-12-04T08:57:43.6822600Z * [new branch] gh/drisspg/220/orig -> origin/gh/drisspg/220/orig 2025-12-04T08:57:43.6824079Z * [new branch] gh/drisspg/221/base -> origin/gh/drisspg/221/base 2025-12-04T08:57:43.6825186Z * [new branch] gh/drisspg/221/head -> origin/gh/drisspg/221/head 2025-12-04T08:57:43.6826297Z * [new branch] gh/drisspg/221/orig -> origin/gh/drisspg/221/orig 2025-12-04T08:57:43.6827767Z * [new branch] gh/drisspg/222/base -> origin/gh/drisspg/222/base 2025-12-04T08:57:43.6828870Z * [new branch] gh/drisspg/222/head -> origin/gh/drisspg/222/head 2025-12-04T08:57:43.6829999Z * [new branch] gh/drisspg/222/orig -> origin/gh/drisspg/222/orig 2025-12-04T08:57:43.6831503Z * [new branch] gh/drisspg/223/base -> origin/gh/drisspg/223/base 2025-12-04T08:57:43.6832622Z * [new branch] gh/drisspg/223/head -> origin/gh/drisspg/223/head 2025-12-04T08:57:43.6833807Z * [new branch] gh/drisspg/223/orig -> origin/gh/drisspg/223/orig 2025-12-04T08:57:43.6835294Z * [new branch] gh/drisspg/224/base -> origin/gh/drisspg/224/base 2025-12-04T08:57:43.6836358Z * [new branch] gh/drisspg/224/head -> origin/gh/drisspg/224/head 2025-12-04T08:57:43.6837522Z * [new branch] gh/drisspg/224/orig -> origin/gh/drisspg/224/orig 2025-12-04T08:57:43.6838976Z * [new branch] gh/drisspg/225/base -> origin/gh/drisspg/225/base 2025-12-04T08:57:43.6840044Z * [new branch] gh/drisspg/225/head -> origin/gh/drisspg/225/head 2025-12-04T08:57:43.6841198Z * [new branch] gh/drisspg/225/orig -> origin/gh/drisspg/225/orig 2025-12-04T08:57:43.6842609Z * [new branch] gh/drisspg/226/base -> origin/gh/drisspg/226/base 2025-12-04T08:57:43.6843662Z * [new branch] 
gh/drisspg/226/head -> origin/gh/drisspg/226/head 2025-12-04T08:57:43.6844753Z * [new branch] gh/drisspg/226/orig -> origin/gh/drisspg/226/orig 2025-12-04T08:57:43.6846631Z * [new branch] gh/drisspg/227/base -> origin/gh/drisspg/227/base 2025-12-04T08:57:43.6847723Z * [new branch] gh/drisspg/227/head -> origin/gh/drisspg/227/head 2025-12-04T08:57:43.6848792Z * [new branch] gh/drisspg/227/orig -> origin/gh/drisspg/227/orig 2025-12-04T08:57:43.6850265Z * [new branch] gh/drisspg/228/base -> origin/gh/drisspg/228/base 2025-12-04T08:57:43.6851369Z * [new branch] gh/drisspg/228/head -> origin/gh/drisspg/228/head 2025-12-04T08:57:43.6852521Z * [new branch] gh/drisspg/228/orig -> origin/gh/drisspg/228/orig 2025-12-04T08:57:43.6854007Z * [new branch] gh/drisspg/229/base -> origin/gh/drisspg/229/base 2025-12-04T08:57:43.6855056Z * [new branch] gh/drisspg/229/head -> origin/gh/drisspg/229/head 2025-12-04T08:57:43.6856154Z * [new branch] gh/drisspg/229/orig -> origin/gh/drisspg/229/orig 2025-12-04T08:57:43.6858063Z * [new branch] gh/drisspg/230/base -> origin/gh/drisspg/230/base 2025-12-04T08:57:43.6859013Z * [new branch] gh/drisspg/230/head -> origin/gh/drisspg/230/head 2025-12-04T08:57:43.6860282Z * [new branch] gh/drisspg/230/orig -> origin/gh/drisspg/230/orig 2025-12-04T08:57:43.6862076Z * [new branch] gh/dsjohns2/1/base -> origin/gh/dsjohns2/1/base 2025-12-04T08:57:43.6863244Z * [new branch] gh/dsjohns2/1/head -> origin/gh/dsjohns2/1/head 2025-12-04T08:57:43.6865079Z * [new branch] gh/dzmitry-huba/1/base -> origin/gh/dzmitry-huba/1/base 2025-12-04T08:57:43.6866341Z * [new branch] gh/dzmitry-huba/1/head -> origin/gh/dzmitry-huba/1/head 2025-12-04T08:57:43.6867894Z * [new branch] gh/dzmitry-huba/12/base -> origin/gh/dzmitry-huba/12/base 2025-12-04T08:57:43.6869341Z * [new branch] gh/dzmitry-huba/12/head -> origin/gh/dzmitry-huba/12/head 2025-12-04T08:57:43.6870497Z * [new branch] gh/dzmitry-huba/12/orig -> origin/gh/dzmitry-huba/12/orig 2025-12-04T08:57:43.6872107Z * [new branch] gh/dzmitry-huba/13/base -> origin/gh/dzmitry-huba/13/base 2025-12-04T08:57:43.6873284Z * [new branch] gh/dzmitry-huba/13/head -> origin/gh/dzmitry-huba/13/head 2025-12-04T08:57:43.6874367Z * [new branch] gh/dzmitry-huba/13/orig -> origin/gh/dzmitry-huba/13/orig 2025-12-04T08:57:43.6875838Z * [new branch] gh/dzmitry-huba/14/base -> origin/gh/dzmitry-huba/14/base 2025-12-04T08:57:43.6876926Z * [new branch] gh/dzmitry-huba/14/head -> origin/gh/dzmitry-huba/14/head 2025-12-04T08:57:43.6878031Z * [new branch] gh/dzmitry-huba/14/orig -> origin/gh/dzmitry-huba/14/orig 2025-12-04T08:57:43.6879618Z * [new branch] gh/dzmitry-huba/15/base -> origin/gh/dzmitry-huba/15/base 2025-12-04T08:57:43.6880689Z * [new branch] gh/dzmitry-huba/15/head -> origin/gh/dzmitry-huba/15/head 2025-12-04T08:57:43.6881875Z * [new branch] gh/dzmitry-huba/15/orig -> origin/gh/dzmitry-huba/15/orig 2025-12-04T08:57:43.6883495Z * [new branch] gh/dzmitry-huba/16/base -> origin/gh/dzmitry-huba/16/base 2025-12-04T08:57:43.6884854Z * [new branch] gh/dzmitry-huba/16/head -> origin/gh/dzmitry-huba/16/head 2025-12-04T08:57:43.6886014Z * [new branch] gh/dzmitry-huba/16/orig -> origin/gh/dzmitry-huba/16/orig 2025-12-04T08:57:43.6887506Z * [new branch] gh/dzmitry-huba/17/base -> origin/gh/dzmitry-huba/17/base 2025-12-04T08:57:43.6888629Z * [new branch] gh/dzmitry-huba/17/head -> origin/gh/dzmitry-huba/17/head 2025-12-04T08:57:43.6889716Z * [new branch] gh/dzmitry-huba/17/orig -> origin/gh/dzmitry-huba/17/orig 2025-12-04T08:57:43.6891034Z * [new branch] 
gh/dzmitry-huba/2/base -> origin/gh/dzmitry-huba/2/base 2025-12-04T08:57:43.6892062Z * [new branch] gh/dzmitry-huba/2/head -> origin/gh/dzmitry-huba/2/head 2025-12-04T08:57:43.6893370Z * [new branch] gh/dzmitry-huba/3/base -> origin/gh/dzmitry-huba/3/base 2025-12-04T08:57:43.6894370Z * [new branch] gh/dzmitry-huba/3/head -> origin/gh/dzmitry-huba/3/head 2025-12-04T08:57:43.6896429Z * [new branch] gh/eellison/808/base -> origin/gh/eellison/808/base 2025-12-04T08:57:43.6898001Z * [new branch] gh/eellison/808/head -> origin/gh/eellison/808/head 2025-12-04T08:57:43.6899096Z * [new branch] gh/eellison/808/orig -> origin/gh/eellison/808/orig 2025-12-04T08:57:43.6900930Z * [new branch] gh/eellison/822/base -> origin/gh/eellison/822/base 2025-12-04T08:57:43.6902099Z * [new branch] gh/eellison/822/head -> origin/gh/eellison/822/head 2025-12-04T08:57:43.6903294Z * [new branch] gh/eellison/822/orig -> origin/gh/eellison/822/orig 2025-12-04T08:57:43.6904741Z * [new branch] gh/eellison/823/base -> origin/gh/eellison/823/base 2025-12-04T08:57:43.6905876Z * [new branch] gh/eellison/823/head -> origin/gh/eellison/823/head 2025-12-04T08:57:43.6906994Z * [new branch] gh/eellison/823/orig -> origin/gh/eellison/823/orig 2025-12-04T08:57:43.6908458Z * [new branch] gh/eellison/862/base -> origin/gh/eellison/862/base 2025-12-04T08:57:43.6909693Z * [new branch] gh/eellison/862/head -> origin/gh/eellison/862/head 2025-12-04T08:57:43.6910748Z * [new branch] gh/eellison/862/orig -> origin/gh/eellison/862/orig 2025-12-04T08:57:43.6912297Z * [new branch] gh/eellison/863/base -> origin/gh/eellison/863/base 2025-12-04T08:57:43.6913362Z * [new branch] gh/eellison/863/head -> origin/gh/eellison/863/head 2025-12-04T08:57:43.6914445Z * [new branch] gh/eellison/863/orig -> origin/gh/eellison/863/orig 2025-12-04T08:57:43.6915796Z * [new branch] gh/eellison/864/base -> origin/gh/eellison/864/base 2025-12-04T08:57:43.6916914Z * [new branch] gh/eellison/864/head -> origin/gh/eellison/864/head 2025-12-04T08:57:43.6918105Z * [new branch] gh/eellison/864/orig -> origin/gh/eellison/864/orig 2025-12-04T08:57:43.6919533Z * [new branch] gh/eellison/865/base -> origin/gh/eellison/865/base 2025-12-04T08:57:43.6920594Z * [new branch] gh/eellison/865/head -> origin/gh/eellison/865/head 2025-12-04T08:57:43.6922307Z * [new branch] gh/eellison/865/orig -> origin/gh/eellison/865/orig 2025-12-04T08:57:43.6923857Z * [new branch] gh/eellison/866/base -> origin/gh/eellison/866/base 2025-12-04T08:57:43.6924864Z * [new branch] gh/eellison/866/head -> origin/gh/eellison/866/head 2025-12-04T08:57:43.6925986Z * [new branch] gh/eellison/866/orig -> origin/gh/eellison/866/orig 2025-12-04T08:57:43.6927685Z * [new branch] gh/eellison/867/base -> origin/gh/eellison/867/base 2025-12-04T08:57:43.6928986Z * [new branch] gh/eellison/867/head -> origin/gh/eellison/867/head 2025-12-04T08:57:43.6929980Z * [new branch] gh/eellison/867/orig -> origin/gh/eellison/867/orig 2025-12-04T08:57:43.6931735Z * [new branch] gh/eellison/868/base -> origin/gh/eellison/868/base 2025-12-04T08:57:43.6933199Z * [new branch] gh/eellison/868/head -> origin/gh/eellison/868/head 2025-12-04T08:57:43.6934438Z * [new branch] gh/eellison/868/orig -> origin/gh/eellison/868/orig 2025-12-04T08:57:43.6936409Z * [new branch] gh/eellison/869/base -> origin/gh/eellison/869/base 2025-12-04T08:57:43.6937826Z * [new branch] gh/eellison/869/head -> origin/gh/eellison/869/head 2025-12-04T08:57:43.6938942Z * [new branch] gh/eellison/869/orig -> origin/gh/eellison/869/orig 2025-12-04T08:57:43.6940488Z * 
[new branch] gh/eellison/870/base -> origin/gh/eellison/870/base 2025-12-04T08:57:43.6941577Z * [new branch] gh/eellison/870/head -> origin/gh/eellison/870/head 2025-12-04T08:57:43.6942697Z * [new branch] gh/eellison/870/orig -> origin/gh/eellison/870/orig 2025-12-04T08:57:43.6944395Z * [new branch] gh/eellison/871/base -> origin/gh/eellison/871/base 2025-12-04T08:57:43.6945451Z * [new branch] gh/eellison/871/head -> origin/gh/eellison/871/head 2025-12-04T08:57:43.6946628Z * [new branch] gh/eellison/871/orig -> origin/gh/eellison/871/orig 2025-12-04T08:57:43.6948144Z * [new branch] gh/eellison/872/base -> origin/gh/eellison/872/base 2025-12-04T08:57:43.6949501Z * [new branch] gh/eellison/872/head -> origin/gh/eellison/872/head 2025-12-04T08:57:43.6950513Z * [new branch] gh/eellison/872/orig -> origin/gh/eellison/872/orig 2025-12-04T08:57:43.6952287Z * [new branch] gh/eellison/873/base -> origin/gh/eellison/873/base 2025-12-04T08:57:43.6953342Z * [new branch] gh/eellison/873/head -> origin/gh/eellison/873/head 2025-12-04T08:57:43.6954435Z * [new branch] gh/eellison/873/orig -> origin/gh/eellison/873/orig 2025-12-04T08:57:43.6955886Z * [new branch] gh/eellison/874/base -> origin/gh/eellison/874/base 2025-12-04T08:57:43.6957031Z * [new branch] gh/eellison/874/head -> origin/gh/eellison/874/head 2025-12-04T08:57:43.6958122Z * [new branch] gh/eellison/874/orig -> origin/gh/eellison/874/orig 2025-12-04T08:57:43.6960147Z * [new branch] gh/eellison/875/base -> origin/gh/eellison/875/base 2025-12-04T08:57:43.6961407Z * [new branch] gh/eellison/875/head -> origin/gh/eellison/875/head 2025-12-04T08:57:43.6962488Z * [new branch] gh/eellison/875/orig -> origin/gh/eellison/875/orig 2025-12-04T08:57:43.6964083Z * [new branch] gh/eellison/876/base -> origin/gh/eellison/876/base 2025-12-04T08:57:43.6965177Z * [new branch] gh/eellison/876/head -> origin/gh/eellison/876/head 2025-12-04T08:57:43.6966344Z * [new branch] gh/eellison/876/orig -> origin/gh/eellison/876/orig 2025-12-04T08:57:43.6967841Z * [new branch] gh/eellison/877/base -> origin/gh/eellison/877/base 2025-12-04T08:57:43.6968941Z * [new branch] gh/eellison/877/head -> origin/gh/eellison/877/head 2025-12-04T08:57:43.6970006Z * [new branch] gh/eellison/877/orig -> origin/gh/eellison/877/orig 2025-12-04T08:57:43.6971931Z * [new branch] gh/eellison/878/base -> origin/gh/eellison/878/base 2025-12-04T08:57:43.6972573Z * [new branch] gh/eellison/878/head -> origin/gh/eellison/878/head 2025-12-04T08:57:43.6973691Z * [new branch] gh/eellison/878/orig -> origin/gh/eellison/878/orig 2025-12-04T08:57:43.6975297Z * [new branch] gh/eellison/879/base -> origin/gh/eellison/879/base 2025-12-04T08:57:43.6976496Z * [new branch] gh/eellison/879/head -> origin/gh/eellison/879/head 2025-12-04T08:57:43.6977922Z * [new branch] gh/eellison/879/orig -> origin/gh/eellison/879/orig 2025-12-04T08:57:43.6979326Z * [new branch] gh/eellison/880/base -> origin/gh/eellison/880/base 2025-12-04T08:57:43.6980492Z * [new branch] gh/eellison/880/head -> origin/gh/eellison/880/head 2025-12-04T08:57:43.6981660Z * [new branch] gh/eellison/880/orig -> origin/gh/eellison/880/orig 2025-12-04T08:57:43.6983206Z * [new branch] gh/eellison/881/base -> origin/gh/eellison/881/base 2025-12-04T08:57:43.6984339Z * [new branch] gh/eellison/881/head -> origin/gh/eellison/881/head 2025-12-04T08:57:43.6985512Z * [new branch] gh/eellison/881/orig -> origin/gh/eellison/881/orig 2025-12-04T08:57:43.6986990Z * [new branch] gh/eellison/882/base -> origin/gh/eellison/882/base 2025-12-04T08:57:43.6988097Z * 
[new branch] gh/eellison/882/head -> origin/gh/eellison/882/head 2025-12-04T08:57:43.6989481Z * [new branch] gh/eellison/882/orig -> origin/gh/eellison/882/orig 2025-12-04T08:57:43.6990964Z * [new branch] gh/eellison/883/base -> origin/gh/eellison/883/base 2025-12-04T08:57:43.6992048Z * [new branch] gh/eellison/883/head -> origin/gh/eellison/883/head 2025-12-04T08:57:43.6993151Z * [new branch] gh/eellison/883/orig -> origin/gh/eellison/883/orig 2025-12-04T08:57:43.6994540Z * [new branch] gh/eellison/884/base -> origin/gh/eellison/884/base 2025-12-04T08:57:43.6995631Z * [new branch] gh/eellison/884/head -> origin/gh/eellison/884/head 2025-12-04T08:57:43.6996646Z * [new branch] gh/eellison/884/orig -> origin/gh/eellison/884/orig 2025-12-04T08:57:43.6998389Z * [new branch] gh/etaf/147/base -> origin/gh/etaf/147/base 2025-12-04T08:57:43.6999509Z * [new branch] gh/etaf/147/head -> origin/gh/etaf/147/head 2025-12-04T08:57:43.7001155Z * [new branch] gh/etaf/154/base -> origin/gh/etaf/154/base 2025-12-04T08:57:43.7002320Z * [new branch] gh/etaf/154/head -> origin/gh/etaf/154/head 2025-12-04T08:57:43.7003423Z * [new branch] gh/etaf/154/orig -> origin/gh/etaf/154/orig 2025-12-04T08:57:43.7004933Z * [new branch] gh/etaf/156/base -> origin/gh/etaf/156/base 2025-12-04T08:57:43.7006043Z * [new branch] gh/etaf/156/head -> origin/gh/etaf/156/head 2025-12-04T08:57:43.7007180Z * [new branch] gh/etaf/156/orig -> origin/gh/etaf/156/orig 2025-12-04T08:57:43.7008799Z * [new branch] gh/etaf/157/base -> origin/gh/etaf/157/base 2025-12-04T08:57:43.7009976Z * [new branch] gh/etaf/157/head -> origin/gh/etaf/157/head 2025-12-04T08:57:43.7011136Z * [new branch] gh/etaf/157/orig -> origin/gh/etaf/157/orig 2025-12-04T08:57:43.7012509Z * [new branch] gh/etaf/158/base -> origin/gh/etaf/158/base 2025-12-04T08:57:43.7013730Z * [new branch] gh/etaf/158/head -> origin/gh/etaf/158/head 2025-12-04T08:57:43.7014849Z * [new branch] gh/etaf/158/orig -> origin/gh/etaf/158/orig 2025-12-04T08:57:43.7016446Z * [new branch] gh/etaf/159/base -> origin/gh/etaf/159/base 2025-12-04T08:57:43.7018035Z * [new branch] gh/etaf/159/head -> origin/gh/etaf/159/head 2025-12-04T08:57:43.7019159Z * [new branch] gh/etaf/159/orig -> origin/gh/etaf/159/orig 2025-12-04T08:57:43.7021067Z * [new branch] gh/etaf/160/base -> origin/gh/etaf/160/base 2025-12-04T08:57:43.7024313Z * [new branch] gh/etaf/160/head -> origin/gh/etaf/160/head 2025-12-04T08:57:43.7025532Z * [new branch] gh/etaf/160/orig -> origin/gh/etaf/160/orig 2025-12-04T08:57:43.7027132Z * [new branch] gh/etaf/161/base -> origin/gh/etaf/161/base 2025-12-04T08:57:43.7028330Z * [new branch] gh/etaf/161/head -> origin/gh/etaf/161/head 2025-12-04T08:57:43.7029462Z * [new branch] gh/etaf/161/orig -> origin/gh/etaf/161/orig 2025-12-04T08:57:43.7030985Z * [new branch] gh/etaf/166/base -> origin/gh/etaf/166/base 2025-12-04T08:57:43.7032285Z * [new branch] gh/etaf/166/head -> origin/gh/etaf/166/head 2025-12-04T08:57:43.7033497Z * [new branch] gh/etaf/166/orig -> origin/gh/etaf/166/orig 2025-12-04T08:57:43.7034898Z * [new branch] gh/etaf/167/base -> origin/gh/etaf/167/base 2025-12-04T08:57:43.7036040Z * [new branch] gh/etaf/167/head -> origin/gh/etaf/167/head 2025-12-04T08:57:43.7037097Z * [new branch] gh/etaf/167/orig -> origin/gh/etaf/167/orig 2025-12-04T08:57:43.7038796Z * [new branch] gh/etaf/168/base -> origin/gh/etaf/168/base 2025-12-04T08:57:43.7039945Z * [new branch] gh/etaf/168/head -> origin/gh/etaf/168/head 2025-12-04T08:57:43.7041039Z * [new branch] gh/etaf/168/orig -> origin/gh/etaf/168/orig 
2025-12-04T08:57:43.7042547Z * [new branch] gh/etaf/172/base -> origin/gh/etaf/172/base 2025-12-04T08:57:43.7043698Z * [new branch] gh/etaf/172/head -> origin/gh/etaf/172/head 2025-12-04T08:57:43.7044992Z * [new branch] gh/etaf/172/orig -> origin/gh/etaf/172/orig 2025-12-04T08:57:43.7046581Z * [new branch] gh/etaf/173/base -> origin/gh/etaf/173/base 2025-12-04T08:57:43.7047875Z * [new branch] gh/etaf/173/head -> origin/gh/etaf/173/head 2025-12-04T08:57:43.7048933Z * [new branch] gh/etaf/173/orig -> origin/gh/etaf/173/orig 2025-12-04T08:57:43.7050473Z * [new branch] gh/etaf/174/base -> origin/gh/etaf/174/base 2025-12-04T08:57:43.7051546Z * [new branch] gh/etaf/174/head -> origin/gh/etaf/174/head 2025-12-04T08:57:43.7053120Z * [new branch] gh/etaf/175/base -> origin/gh/etaf/175/base 2025-12-04T08:57:43.7054194Z * [new branch] gh/etaf/175/head -> origin/gh/etaf/175/head 2025-12-04T08:57:43.7055214Z * [new branch] gh/etaf/175/orig -> origin/gh/etaf/175/orig 2025-12-04T08:57:43.7056955Z * [new branch] gh/etaf/176/base -> origin/gh/etaf/176/base 2025-12-04T08:57:43.7058252Z * [new branch] gh/etaf/176/head -> origin/gh/etaf/176/head 2025-12-04T08:57:43.7059411Z * [new branch] gh/etaf/176/orig -> origin/gh/etaf/176/orig 2025-12-04T08:57:43.7061806Z * [new branch] gh/etaf/177/base -> origin/gh/etaf/177/base 2025-12-04T08:57:43.7063097Z * [new branch] gh/etaf/177/head -> origin/gh/etaf/177/head 2025-12-04T08:57:43.7064285Z * [new branch] gh/etaf/177/orig -> origin/gh/etaf/177/orig 2025-12-04T08:57:43.7065948Z * [new branch] gh/etaf/178/base -> origin/gh/etaf/178/base 2025-12-04T08:57:43.7067249Z * [new branch] gh/etaf/178/head -> origin/gh/etaf/178/head 2025-12-04T08:57:43.7068432Z * [new branch] gh/etaf/178/orig -> origin/gh/etaf/178/orig 2025-12-04T08:57:43.7070151Z * [new branch] gh/etaf/179/base -> origin/gh/etaf/179/base 2025-12-04T08:57:43.7071235Z * [new branch] gh/etaf/179/head -> origin/gh/etaf/179/head 2025-12-04T08:57:43.7072336Z * [new branch] gh/etaf/179/orig -> origin/gh/etaf/179/orig 2025-12-04T08:57:43.7073688Z * [new branch] gh/etaf/180/base -> origin/gh/etaf/180/base 2025-12-04T08:57:43.7074756Z * [new branch] gh/etaf/180/head -> origin/gh/etaf/180/head 2025-12-04T08:57:43.7075872Z * [new branch] gh/etaf/180/orig -> origin/gh/etaf/180/orig 2025-12-04T08:57:43.7077673Z * [new branch] gh/exclamaforte/1/base -> origin/gh/exclamaforte/1/base 2025-12-04T08:57:43.7078751Z * [new branch] gh/exclamaforte/1/head -> origin/gh/exclamaforte/1/head 2025-12-04T08:57:43.7080144Z * [new branch] gh/exclamaforte/2/base -> origin/gh/exclamaforte/2/base 2025-12-04T08:57:43.7081172Z * [new branch] gh/exclamaforte/2/head -> origin/gh/exclamaforte/2/head 2025-12-04T08:57:43.7082629Z * [new branch] gh/exclamaforte/3/base -> origin/gh/exclamaforte/3/base 2025-12-04T08:57:43.7083818Z * [new branch] gh/exclamaforte/3/head -> origin/gh/exclamaforte/3/head 2025-12-04T08:57:43.7085417Z * [new branch] gh/exclamaforte/4/base -> origin/gh/exclamaforte/4/base 2025-12-04T08:57:43.7086396Z * [new branch] gh/exclamaforte/4/head -> origin/gh/exclamaforte/4/head 2025-12-04T08:57:43.7088356Z * [new branch] gh/ezyang/2374/base -> origin/gh/ezyang/2374/base 2025-12-04T08:57:43.7089467Z * [new branch] gh/ezyang/2374/head -> origin/gh/ezyang/2374/head 2025-12-04T08:57:43.7090566Z * [new branch] gh/ezyang/2374/orig -> origin/gh/ezyang/2374/orig 2025-12-04T08:57:43.7092091Z * [new branch] gh/ezyang/2973/base -> origin/gh/ezyang/2973/base 2025-12-04T08:57:43.7093098Z * [new branch] gh/ezyang/2973/head -> 
origin/gh/ezyang/2973/head 2025-12-04T08:57:43.7094251Z * [new branch] gh/ezyang/2973/orig -> origin/gh/ezyang/2973/orig 2025-12-04T08:57:43.7095673Z * [new branch] gh/ezyang/2974/base -> origin/gh/ezyang/2974/base 2025-12-04T08:57:43.7097011Z * [new branch] gh/ezyang/2974/head -> origin/gh/ezyang/2974/head 2025-12-04T08:57:43.7098293Z * [new branch] gh/ezyang/2974/orig -> origin/gh/ezyang/2974/orig 2025-12-04T08:57:43.7099783Z * [new branch] gh/ezyang/3131/base -> origin/gh/ezyang/3131/base 2025-12-04T08:57:43.7100985Z * [new branch] gh/ezyang/3131/head -> origin/gh/ezyang/3131/head 2025-12-04T08:57:43.7102092Z * [new branch] gh/ezyang/3131/orig -> origin/gh/ezyang/3131/orig 2025-12-04T08:57:43.7103561Z * [new branch] gh/ezyang/3139/base -> origin/gh/ezyang/3139/base 2025-12-04T08:57:43.7104676Z * [new branch] gh/ezyang/3139/head -> origin/gh/ezyang/3139/head 2025-12-04T08:57:43.7105807Z * [new branch] gh/ezyang/3139/orig -> origin/gh/ezyang/3139/orig 2025-12-04T08:57:43.7107235Z * [new branch] gh/ezyang/3140/base -> origin/gh/ezyang/3140/base 2025-12-04T08:57:43.7108372Z * [new branch] gh/ezyang/3140/head -> origin/gh/ezyang/3140/head 2025-12-04T08:57:43.7109588Z * [new branch] gh/ezyang/3140/orig -> origin/gh/ezyang/3140/orig 2025-12-04T08:57:43.7111005Z * [new branch] gh/ezyang/3143/base -> origin/gh/ezyang/3143/base 2025-12-04T08:57:43.7112074Z * [new branch] gh/ezyang/3143/head -> origin/gh/ezyang/3143/head 2025-12-04T08:57:43.7113259Z * [new branch] gh/ezyang/3143/orig -> origin/gh/ezyang/3143/orig 2025-12-04T08:57:43.7115189Z * [new branch] gh/ezyang/3144/base -> origin/gh/ezyang/3144/base 2025-12-04T08:57:43.7116267Z * [new branch] gh/ezyang/3144/head -> origin/gh/ezyang/3144/head 2025-12-04T08:57:43.7117432Z * [new branch] gh/ezyang/3144/orig -> origin/gh/ezyang/3144/orig 2025-12-04T08:57:43.7118891Z * [new branch] gh/ezyang/3167/base -> origin/gh/ezyang/3167/base 2025-12-04T08:57:43.7119958Z * [new branch] gh/ezyang/3167/head -> origin/gh/ezyang/3167/head 2025-12-04T08:57:43.7121360Z * [new branch] gh/ezyang/3167/orig -> origin/gh/ezyang/3167/orig 2025-12-04T08:57:43.7122928Z * [new branch] gh/ezyang/3173/base -> origin/gh/ezyang/3173/base 2025-12-04T08:57:43.7124041Z * [new branch] gh/ezyang/3173/head -> origin/gh/ezyang/3173/head 2025-12-04T08:57:43.7125236Z * [new branch] gh/ezyang/3173/orig -> origin/gh/ezyang/3173/orig 2025-12-04T08:57:43.7126743Z * [new branch] gh/ezyang/3175/base -> origin/gh/ezyang/3175/base 2025-12-04T08:57:43.7127873Z * [new branch] gh/ezyang/3175/head -> origin/gh/ezyang/3175/head 2025-12-04T08:57:43.7129087Z * [new branch] gh/ezyang/3175/orig -> origin/gh/ezyang/3175/orig 2025-12-04T08:57:43.7130601Z * [new branch] gh/ezyang/3182/base -> origin/gh/ezyang/3182/base 2025-12-04T08:57:43.7131692Z * [new branch] gh/ezyang/3182/head -> origin/gh/ezyang/3182/head 2025-12-04T08:57:43.7132818Z * [new branch] gh/ezyang/3182/orig -> origin/gh/ezyang/3182/orig 2025-12-04T08:57:43.7134407Z * [new branch] gh/ezyang/3185/base -> origin/gh/ezyang/3185/base 2025-12-04T08:57:43.7135495Z * [new branch] gh/ezyang/3185/head -> origin/gh/ezyang/3185/head 2025-12-04T08:57:43.7136617Z * [new branch] gh/ezyang/3185/orig -> origin/gh/ezyang/3185/orig 2025-12-04T08:57:43.7138436Z * [new branch] gh/ezyang/3189/base -> origin/gh/ezyang/3189/base 2025-12-04T08:57:43.7139513Z * [new branch] gh/ezyang/3189/head -> origin/gh/ezyang/3189/head 2025-12-04T08:57:43.7140576Z * [new branch] gh/ezyang/3189/orig -> origin/gh/ezyang/3189/orig 2025-12-04T08:57:43.7142074Z * [new branch] 
gh/ezyang/3191/base -> origin/gh/ezyang/3191/base 2025-12-04T08:57:43.7143175Z * [new branch] gh/ezyang/3191/head -> origin/gh/ezyang/3191/head 2025-12-04T08:57:43.7144394Z * [new branch] gh/ezyang/3191/orig -> origin/gh/ezyang/3191/orig 2025-12-04T08:57:43.7146313Z * [new branch] gh/ezyang/3192/base -> origin/gh/ezyang/3192/base 2025-12-04T08:57:43.7147457Z * [new branch] gh/ezyang/3192/head -> origin/gh/ezyang/3192/head 2025-12-04T08:57:43.7148711Z * [new branch] gh/ezyang/3192/orig -> origin/gh/ezyang/3192/orig 2025-12-04T08:57:43.7150324Z * [new branch] gh/ezyang/3193/base -> origin/gh/ezyang/3193/base 2025-12-04T08:57:43.7151429Z * [new branch] gh/ezyang/3193/head -> origin/gh/ezyang/3193/head 2025-12-04T08:57:43.7152568Z * [new branch] gh/ezyang/3193/orig -> origin/gh/ezyang/3193/orig 2025-12-04T08:57:43.7154173Z * [new branch] gh/ezyang/3194/base -> origin/gh/ezyang/3194/base 2025-12-04T08:57:43.7155259Z * [new branch] gh/ezyang/3194/head -> origin/gh/ezyang/3194/head 2025-12-04T08:57:43.7156340Z * [new branch] gh/ezyang/3194/orig -> origin/gh/ezyang/3194/orig 2025-12-04T08:57:43.7157801Z * [new branch] gh/ezyang/3195/base -> origin/gh/ezyang/3195/base 2025-12-04T08:57:43.7158833Z * [new branch] gh/ezyang/3195/head -> origin/gh/ezyang/3195/head 2025-12-04T08:57:43.7160024Z * [new branch] gh/ezyang/3195/orig -> origin/gh/ezyang/3195/orig 2025-12-04T08:57:43.7161468Z * [new branch] gh/ezyang/3196/base -> origin/gh/ezyang/3196/base 2025-12-04T08:57:43.7163030Z * [new branch] gh/ezyang/3196/head -> origin/gh/ezyang/3196/head 2025-12-04T08:57:43.7164156Z * [new branch] gh/ezyang/3196/orig -> origin/gh/ezyang/3196/orig 2025-12-04T08:57:43.7165646Z * [new branch] gh/ezyang/3197/base -> origin/gh/ezyang/3197/base 2025-12-04T08:57:43.7166750Z * [new branch] gh/ezyang/3197/head -> origin/gh/ezyang/3197/head 2025-12-04T08:57:43.7167905Z * [new branch] gh/ezyang/3197/orig -> origin/gh/ezyang/3197/orig 2025-12-04T08:57:43.7169826Z * [new branch] gh/ezyang/3198/base -> origin/gh/ezyang/3198/base 2025-12-04T08:57:43.7170911Z * [new branch] gh/ezyang/3198/head -> origin/gh/ezyang/3198/head 2025-12-04T08:57:43.7172058Z * [new branch] gh/ezyang/3198/orig -> origin/gh/ezyang/3198/orig 2025-12-04T08:57:43.7173520Z * [new branch] gh/ezyang/3199/base -> origin/gh/ezyang/3199/base 2025-12-04T08:57:43.7174585Z * [new branch] gh/ezyang/3199/head -> origin/gh/ezyang/3199/head 2025-12-04T08:57:43.7175891Z * [new branch] gh/ezyang/3199/orig -> origin/gh/ezyang/3199/orig 2025-12-04T08:57:43.7177684Z * [new branch] gh/ezyang/3200/base -> origin/gh/ezyang/3200/base 2025-12-04T08:57:43.7178862Z * [new branch] gh/ezyang/3200/head -> origin/gh/ezyang/3200/head 2025-12-04T08:57:43.7179989Z * [new branch] gh/ezyang/3200/orig -> origin/gh/ezyang/3200/orig 2025-12-04T08:57:43.7181506Z * [new branch] gh/ezyang/3201/base -> origin/gh/ezyang/3201/base 2025-12-04T08:57:43.7182609Z * [new branch] gh/ezyang/3201/head -> origin/gh/ezyang/3201/head 2025-12-04T08:57:43.7183782Z * [new branch] gh/ezyang/3201/orig -> origin/gh/ezyang/3201/orig 2025-12-04T08:57:43.7185268Z * [new branch] gh/ezyang/3202/base -> origin/gh/ezyang/3202/base 2025-12-04T08:57:43.7186307Z * [new branch] gh/ezyang/3202/head -> origin/gh/ezyang/3202/head 2025-12-04T08:57:43.7187422Z * [new branch] gh/ezyang/3202/orig -> origin/gh/ezyang/3202/orig 2025-12-04T08:57:43.7189050Z * [new branch] gh/ezyang/3203/base -> origin/gh/ezyang/3203/base 2025-12-04T08:57:43.7190124Z * [new branch] gh/ezyang/3203/head -> origin/gh/ezyang/3203/head 
2025-12-04T08:57:43.7191448Z * [new branch] gh/ezyang/3203/orig -> origin/gh/ezyang/3203/orig 2025-12-04T08:57:43.7192912Z * [new branch] gh/ezyang/3204/base -> origin/gh/ezyang/3204/base 2025-12-04T08:57:43.7193999Z * [new branch] gh/ezyang/3204/head -> origin/gh/ezyang/3204/head 2025-12-04T08:57:43.7195088Z * [new branch] gh/ezyang/3204/orig -> origin/gh/ezyang/3204/orig 2025-12-04T08:57:43.7196562Z * [new branch] gh/ezyang/3205/base -> origin/gh/ezyang/3205/base 2025-12-04T08:57:43.7197668Z * [new branch] gh/ezyang/3205/head -> origin/gh/ezyang/3205/head 2025-12-04T08:57:43.7198718Z * [new branch] gh/ezyang/3205/orig -> origin/gh/ezyang/3205/orig 2025-12-04T08:57:43.7200210Z * [new branch] gh/ezyang/3206/base -> origin/gh/ezyang/3206/base 2025-12-04T08:57:43.7201275Z * [new branch] gh/ezyang/3206/head -> origin/gh/ezyang/3206/head 2025-12-04T08:57:43.7202382Z * [new branch] gh/ezyang/3206/orig -> origin/gh/ezyang/3206/orig 2025-12-04T08:57:43.7203833Z * [new branch] gh/ezyang/3207/base -> origin/gh/ezyang/3207/base 2025-12-04T08:57:43.7204944Z * [new branch] gh/ezyang/3207/head -> origin/gh/ezyang/3207/head 2025-12-04T08:57:43.7206170Z * [new branch] gh/ezyang/3207/orig -> origin/gh/ezyang/3207/orig 2025-12-04T08:57:43.7207598Z * [new branch] gh/ezyang/3208/base -> origin/gh/ezyang/3208/base 2025-12-04T08:57:43.7208770Z * [new branch] gh/ezyang/3208/head -> origin/gh/ezyang/3208/head 2025-12-04T08:57:43.7209869Z * [new branch] gh/ezyang/3208/orig -> origin/gh/ezyang/3208/orig 2025-12-04T08:57:43.7211363Z * [new branch] gh/ezyang/3209/base -> origin/gh/ezyang/3209/base 2025-12-04T08:57:43.7212413Z * [new branch] gh/ezyang/3209/head -> origin/gh/ezyang/3209/head 2025-12-04T08:57:43.7213530Z * [new branch] gh/ezyang/3209/orig -> origin/gh/ezyang/3209/orig 2025-12-04T08:57:43.7215272Z * [new branch] gh/fadara01/3/base -> origin/gh/fadara01/3/base 2025-12-04T08:57:43.7216386Z * [new branch] gh/fadara01/3/head -> origin/gh/fadara01/3/head 2025-12-04T08:57:43.7217960Z * [new branch] gh/fadara01/3/orig -> origin/gh/fadara01/3/orig 2025-12-04T08:57:43.7219487Z * [new branch] gh/fadara01/5/base -> origin/gh/fadara01/5/base 2025-12-04T08:57:43.7220740Z * [new branch] gh/fadara01/5/head -> origin/gh/fadara01/5/head 2025-12-04T08:57:43.7222144Z * [new branch] gh/fadara01/5/orig -> origin/gh/fadara01/5/orig 2025-12-04T08:57:43.7223589Z * [new branch] gh/fadara01/6/base -> origin/gh/fadara01/6/base 2025-12-04T08:57:43.7224725Z * [new branch] gh/fadara01/6/head -> origin/gh/fadara01/6/head 2025-12-04T08:57:43.7225844Z * [new branch] gh/fadara01/6/orig -> origin/gh/fadara01/6/orig 2025-12-04T08:57:43.7227332Z * [new branch] gh/fadara01/7/base -> origin/gh/fadara01/7/base 2025-12-04T08:57:43.7228559Z * [new branch] gh/fadara01/7/head -> origin/gh/fadara01/7/head 2025-12-04T08:57:43.7229673Z * [new branch] gh/fadara01/7/orig -> origin/gh/fadara01/7/orig 2025-12-04T08:57:43.7231186Z * [new branch] gh/fadara01/8/base -> origin/gh/fadara01/8/base 2025-12-04T08:57:43.7232317Z * [new branch] gh/fadara01/8/head -> origin/gh/fadara01/8/head 2025-12-04T08:57:43.7233523Z * [new branch] gh/fadara01/8/orig -> origin/gh/fadara01/8/orig 2025-12-04T08:57:43.7234976Z * [new branch] gh/fadara01/9/base -> origin/gh/fadara01/9/base 2025-12-04T08:57:43.7236171Z * [new branch] gh/fadara01/9/head -> origin/gh/fadara01/9/head 2025-12-04T08:57:43.7237345Z * [new branch] gh/fadara01/9/orig -> origin/gh/fadara01/9/orig 2025-12-04T08:57:43.7239521Z * [new branch] gh/fduwjj/182/base -> origin/gh/fduwjj/182/base 
2025-12-04T08:57:43.7240615Z * [new branch] gh/fduwjj/182/head -> origin/gh/fduwjj/182/head 2025-12-04T08:57:43.7241677Z * [new branch] gh/fduwjj/182/orig -> origin/gh/fduwjj/182/orig 2025-12-04T08:57:43.7243151Z * [new branch] gh/fduwjj/211/base -> origin/gh/fduwjj/211/base 2025-12-04T08:57:43.7244289Z * [new branch] gh/fduwjj/211/head -> origin/gh/fduwjj/211/head 2025-12-04T08:57:43.7245391Z * [new branch] gh/fduwjj/211/orig -> origin/gh/fduwjj/211/orig 2025-12-04T08:57:43.7246858Z * [new branch] gh/fduwjj/212/base -> origin/gh/fduwjj/212/base 2025-12-04T08:57:43.7247958Z * [new branch] gh/fduwjj/212/head -> origin/gh/fduwjj/212/head 2025-12-04T08:57:43.7249110Z * [new branch] gh/fduwjj/212/orig -> origin/gh/fduwjj/212/orig 2025-12-04T08:57:43.7250645Z * [new branch] gh/fduwjj/213/base -> origin/gh/fduwjj/213/base 2025-12-04T08:57:43.7251740Z * [new branch] gh/fduwjj/213/head -> origin/gh/fduwjj/213/head 2025-12-04T08:57:43.7252796Z * [new branch] gh/fduwjj/213/orig -> origin/gh/fduwjj/213/orig 2025-12-04T08:57:43.7254399Z * [new branch] gh/fduwjj/226/base -> origin/gh/fduwjj/226/base 2025-12-04T08:57:43.7255443Z * [new branch] gh/fduwjj/226/head -> origin/gh/fduwjj/226/head 2025-12-04T08:57:43.7256522Z * [new branch] gh/fduwjj/226/orig -> origin/gh/fduwjj/226/orig 2025-12-04T08:57:43.7258452Z * [new branch] gh/fduwjj/229/base -> origin/gh/fduwjj/229/base 2025-12-04T08:57:43.7259485Z * [new branch] gh/fduwjj/229/head -> origin/gh/fduwjj/229/head 2025-12-04T08:57:43.7260573Z * [new branch] gh/fduwjj/229/orig -> origin/gh/fduwjj/229/orig 2025-12-04T08:57:43.7262098Z * [new branch] gh/fduwjj/233/base -> origin/gh/fduwjj/233/base 2025-12-04T08:57:43.7263290Z * [new branch] gh/fduwjj/233/head -> origin/gh/fduwjj/233/head 2025-12-04T08:57:43.7264395Z * [new branch] gh/fduwjj/233/orig -> origin/gh/fduwjj/233/orig 2025-12-04T08:57:43.7265992Z * [new branch] gh/fduwjj/234/base -> origin/gh/fduwjj/234/base 2025-12-04T08:57:43.7267110Z * [new branch] gh/fduwjj/234/head -> origin/gh/fduwjj/234/head 2025-12-04T08:57:43.7268220Z * [new branch] gh/fduwjj/234/orig -> origin/gh/fduwjj/234/orig 2025-12-04T08:57:43.7269784Z * [new branch] gh/fduwjj/235/base -> origin/gh/fduwjj/235/base 2025-12-04T08:57:43.7270917Z * [new branch] gh/fduwjj/235/head -> origin/gh/fduwjj/235/head 2025-12-04T08:57:43.7272043Z * [new branch] gh/fduwjj/235/orig -> origin/gh/fduwjj/235/orig 2025-12-04T08:57:43.7273411Z * [new branch] gh/fduwjj/236/base -> origin/gh/fduwjj/236/base 2025-12-04T08:57:43.7274609Z * [new branch] gh/fduwjj/236/head -> origin/gh/fduwjj/236/head 2025-12-04T08:57:43.7275638Z * [new branch] gh/fduwjj/236/orig -> origin/gh/fduwjj/236/orig 2025-12-04T08:57:43.7276927Z * [new branch] gh/fduwjj/237/base -> origin/gh/fduwjj/237/base 2025-12-04T08:57:43.7278005Z * [new branch] gh/fduwjj/237/head -> origin/gh/fduwjj/237/head 2025-12-04T08:57:43.7279081Z * [new branch] gh/fduwjj/237/orig -> origin/gh/fduwjj/237/orig 2025-12-04T08:57:43.7280648Z * [new branch] gh/fduwjj/238/base -> origin/gh/fduwjj/238/base 2025-12-04T08:57:43.7281801Z * [new branch] gh/fduwjj/238/head -> origin/gh/fduwjj/238/head 2025-12-04T08:57:43.7282981Z * [new branch] gh/fduwjj/238/orig -> origin/gh/fduwjj/238/orig 2025-12-04T08:57:43.7285056Z * [new branch] gh/fduwjj/239/base -> origin/gh/fduwjj/239/base 2025-12-04T08:57:43.7286298Z * [new branch] gh/fduwjj/239/head -> origin/gh/fduwjj/239/head 2025-12-04T08:57:43.7287392Z * [new branch] gh/fduwjj/239/orig -> origin/gh/fduwjj/239/orig 2025-12-04T08:57:43.7289138Z * [new branch] 
gh/fegin/332/base -> origin/gh/fegin/332/base 2025-12-04T08:57:43.7290239Z * [new branch] gh/fegin/332/head -> origin/gh/fegin/332/head 2025-12-04T08:57:43.7291370Z * [new branch] gh/fegin/332/orig -> origin/gh/fegin/332/orig 2025-12-04T08:57:43.7292836Z * [new branch] gh/fegin/333/base -> origin/gh/fegin/333/base 2025-12-04T08:57:43.7293973Z * [new branch] gh/fegin/333/head -> origin/gh/fegin/333/head 2025-12-04T08:57:43.7295123Z * [new branch] gh/fegin/333/orig -> origin/gh/fegin/333/orig 2025-12-04T08:57:43.7296894Z * [new branch] gh/fegin/334/base -> origin/gh/fegin/334/base 2025-12-04T08:57:43.7298087Z * [new branch] gh/fegin/334/head -> origin/gh/fegin/334/head 2025-12-04T08:57:43.7299376Z * [new branch] gh/fegin/334/orig -> origin/gh/fegin/334/orig 2025-12-04T08:57:43.7300863Z * [new branch] gh/fegin/335/base -> origin/gh/fegin/335/base 2025-12-04T08:57:43.7302013Z * [new branch] gh/fegin/335/head -> origin/gh/fegin/335/head 2025-12-04T08:57:43.7303148Z * [new branch] gh/fegin/335/orig -> origin/gh/fegin/335/orig 2025-12-04T08:57:43.7304866Z * [new branch] gh/fffrog/160/base -> origin/gh/fffrog/160/base 2025-12-04T08:57:43.7305988Z * [new branch] gh/fffrog/160/head -> origin/gh/fffrog/160/head 2025-12-04T08:57:43.7307438Z * [new branch] gh/fffrog/177/base -> origin/gh/fffrog/177/base 2025-12-04T08:57:43.7308673Z * [new branch] gh/fffrog/177/head -> origin/gh/fffrog/177/head 2025-12-04T08:57:43.7309795Z * [new branch] gh/fffrog/177/orig -> origin/gh/fffrog/177/orig 2025-12-04T08:57:43.7311308Z * [new branch] gh/fffrog/178/base -> origin/gh/fffrog/178/base 2025-12-04T08:57:43.7312438Z * [new branch] gh/fffrog/178/head -> origin/gh/fffrog/178/head 2025-12-04T08:57:43.7313553Z * [new branch] gh/fffrog/178/orig -> origin/gh/fffrog/178/orig 2025-12-04T08:57:43.7315013Z * [new branch] gh/fffrog/181/base -> origin/gh/fffrog/181/base 2025-12-04T08:57:43.7316064Z * [new branch] gh/fffrog/181/head -> origin/gh/fffrog/181/head 2025-12-04T08:57:43.7317254Z * [new branch] gh/fffrog/181/orig -> origin/gh/fffrog/181/orig 2025-12-04T08:57:43.7318656Z * [new branch] gh/fffrog/183/base -> origin/gh/fffrog/183/base 2025-12-04T08:57:43.7319702Z * [new branch] gh/fffrog/183/head -> origin/gh/fffrog/183/head 2025-12-04T08:57:43.7321001Z * [new branch] gh/fffrog/183/orig -> origin/gh/fffrog/183/orig 2025-12-04T08:57:43.7323513Z * [new branch] gh/fxdawnn/10/base -> origin/gh/fxdawnn/10/base 2025-12-04T08:57:43.7324604Z * [new branch] gh/fxdawnn/10/head -> origin/gh/fxdawnn/10/head 2025-12-04T08:57:43.7325827Z * [new branch] gh/fxdawnn/10/orig -> origin/gh/fxdawnn/10/orig 2025-12-04T08:57:43.7327498Z * [new branch] gh/fxdawnn/11/base -> origin/gh/fxdawnn/11/base 2025-12-04T08:57:43.7328560Z * [new branch] gh/fxdawnn/11/head -> origin/gh/fxdawnn/11/head 2025-12-04T08:57:43.7329919Z * [new branch] gh/fxdawnn/11/orig -> origin/gh/fxdawnn/11/orig 2025-12-04T08:57:43.7331272Z * [new branch] gh/fxdawnn/12/base -> origin/gh/fxdawnn/12/base 2025-12-04T08:57:43.7332509Z * [new branch] gh/fxdawnn/12/head -> origin/gh/fxdawnn/12/head 2025-12-04T08:57:43.7334151Z * [new branch] gh/fxdawnn/12/orig -> origin/gh/fxdawnn/12/orig 2025-12-04T08:57:43.7335610Z * [new branch] gh/fxdawnn/13/base -> origin/gh/fxdawnn/13/base 2025-12-04T08:57:43.7337197Z * [new branch] gh/fxdawnn/13/head -> origin/gh/fxdawnn/13/head 2025-12-04T08:57:43.7338657Z * [new branch] gh/fxdawnn/13/orig -> origin/gh/fxdawnn/13/orig 2025-12-04T08:57:43.7340252Z * [new branch] gh/fxdawnn/14/base -> origin/gh/fxdawnn/14/base 2025-12-04T08:57:43.7341345Z * 
[new branch] gh/fxdawnn/14/head -> origin/gh/fxdawnn/14/head 2025-12-04T08:57:43.7342537Z * [new branch] gh/fxdawnn/14/orig -> origin/gh/fxdawnn/14/orig 2025-12-04T08:57:43.7344002Z * [new branch] gh/fxdawnn/15/base -> origin/gh/fxdawnn/15/base 2025-12-04T08:57:43.7345146Z * [new branch] gh/fxdawnn/15/head -> origin/gh/fxdawnn/15/head 2025-12-04T08:57:43.7346258Z * [new branch] gh/fxdawnn/15/orig -> origin/gh/fxdawnn/15/orig 2025-12-04T08:57:43.7347754Z * [new branch] gh/fxdawnn/6/base -> origin/gh/fxdawnn/6/base 2025-12-04T08:57:43.7348982Z * [new branch] gh/fxdawnn/6/head -> origin/gh/fxdawnn/6/head 2025-12-04T08:57:43.7350130Z * [new branch] gh/fxdawnn/6/orig -> origin/gh/fxdawnn/6/orig 2025-12-04T08:57:43.7351737Z * [new branch] gh/fxdawnn/7/base -> origin/gh/fxdawnn/7/base 2025-12-04T08:57:43.7352895Z * [new branch] gh/fxdawnn/7/head -> origin/gh/fxdawnn/7/head 2025-12-04T08:57:43.7353958Z * [new branch] gh/fxdawnn/7/orig -> origin/gh/fxdawnn/7/orig 2025-12-04T08:57:43.7355442Z * [new branch] gh/fxdawnn/9/base -> origin/gh/fxdawnn/9/base 2025-12-04T08:57:43.7356489Z * [new branch] gh/fxdawnn/9/head -> origin/gh/fxdawnn/9/head 2025-12-04T08:57:43.7357634Z * [new branch] gh/fxdawnn/9/orig -> origin/gh/fxdawnn/9/orig 2025-12-04T08:57:43.7359378Z * [new branch] gh/galv/1/base -> origin/gh/galv/1/base 2025-12-04T08:57:43.7360452Z * [new branch] gh/galv/1/head -> origin/gh/galv/1/head 2025-12-04T08:57:43.7361553Z * [new branch] gh/galv/1/orig -> origin/gh/galv/1/orig 2025-12-04T08:57:43.7363055Z * [new branch] gh/galv/2/base -> origin/gh/galv/2/base 2025-12-04T08:57:43.7364129Z * [new branch] gh/galv/2/head -> origin/gh/galv/2/head 2025-12-04T08:57:43.7365286Z * [new branch] gh/galv/2/orig -> origin/gh/galv/2/orig 2025-12-04T08:57:43.7366721Z * [new branch] gh/galv/3/base -> origin/gh/galv/3/base 2025-12-04T08:57:43.7367832Z * [new branch] gh/galv/3/head -> origin/gh/galv/3/head 2025-12-04T08:57:43.7369191Z * [new branch] gh/galv/3/orig -> origin/gh/galv/3/orig 2025-12-04T08:57:43.7371000Z * [new branch] gh/guangyey/134/base -> origin/gh/guangyey/134/base 2025-12-04T08:57:43.7372116Z * [new branch] gh/guangyey/134/head -> origin/gh/guangyey/134/head 2025-12-04T08:57:43.7373212Z * [new branch] gh/guangyey/134/orig -> origin/gh/guangyey/134/orig 2025-12-04T08:57:43.7374693Z * [new branch] gh/guangyey/163/base -> origin/gh/guangyey/163/base 2025-12-04T08:57:43.7375811Z * [new branch] gh/guangyey/163/head -> origin/gh/guangyey/163/head 2025-12-04T08:57:43.7377269Z * [new branch] gh/guangyey/163/orig -> origin/gh/guangyey/163/orig 2025-12-04T08:57:43.7378763Z * [new branch] gh/guangyey/168/base -> origin/gh/guangyey/168/base 2025-12-04T08:57:43.7380296Z * [new branch] gh/guangyey/168/head -> origin/gh/guangyey/168/head 2025-12-04T08:57:43.7381489Z * [new branch] gh/guangyey/168/orig -> origin/gh/guangyey/168/orig 2025-12-04T08:57:43.7382984Z * [new branch] gh/guangyey/169/base -> origin/gh/guangyey/169/base 2025-12-04T08:57:43.7384115Z * [new branch] gh/guangyey/169/head -> origin/gh/guangyey/169/head 2025-12-04T08:57:43.7385255Z * [new branch] gh/guangyey/169/orig -> origin/gh/guangyey/169/orig 2025-12-04T08:57:43.7386866Z * [new branch] gh/guangyey/170/base -> origin/gh/guangyey/170/base 2025-12-04T08:57:43.7387989Z * [new branch] gh/guangyey/170/head -> origin/gh/guangyey/170/head 2025-12-04T08:57:43.7389222Z * [new branch] gh/guangyey/170/orig -> origin/gh/guangyey/170/orig 2025-12-04T08:57:43.7390684Z * [new branch] gh/guangyey/171/base -> origin/gh/guangyey/171/base 
2025-12-04T08:57:43.7391743Z * [new branch] gh/guangyey/171/head -> origin/gh/guangyey/171/head 2025-12-04T08:57:43.7392846Z * [new branch] gh/guangyey/171/orig -> origin/gh/guangyey/171/orig 2025-12-04T08:57:43.7394821Z * [new branch] gh/guangyey/178/base -> origin/gh/guangyey/178/base 2025-12-04T08:57:43.7396002Z * [new branch] gh/guangyey/178/head -> origin/gh/guangyey/178/head 2025-12-04T08:57:43.7397045Z * [new branch] gh/guangyey/178/orig -> origin/gh/guangyey/178/orig 2025-12-04T08:57:43.7398461Z * [new branch] gh/guangyey/182/base -> origin/gh/guangyey/182/base 2025-12-04T08:57:43.7399567Z * [new branch] gh/guangyey/182/head -> origin/gh/guangyey/182/head 2025-12-04T08:57:43.7400641Z * [new branch] gh/guangyey/182/orig -> origin/gh/guangyey/182/orig 2025-12-04T08:57:43.7402201Z * [new branch] gh/guangyey/183/base -> origin/gh/guangyey/183/base 2025-12-04T08:57:43.7403309Z * [new branch] gh/guangyey/183/head -> origin/gh/guangyey/183/head 2025-12-04T08:57:43.7404908Z * [new branch] gh/guangyey/183/orig -> origin/gh/guangyey/183/orig 2025-12-04T08:57:43.7406399Z * [new branch] gh/guangyey/185/base -> origin/gh/guangyey/185/base 2025-12-04T08:57:43.7407495Z * [new branch] gh/guangyey/185/head -> origin/gh/guangyey/185/head 2025-12-04T08:57:43.7408601Z * [new branch] gh/guangyey/185/orig -> origin/gh/guangyey/185/orig 2025-12-04T08:57:43.7410030Z * [new branch] gh/guangyey/186/base -> origin/gh/guangyey/186/base 2025-12-04T08:57:43.7411237Z * [new branch] gh/guangyey/186/head -> origin/gh/guangyey/186/head 2025-12-04T08:57:43.7412304Z * [new branch] gh/guangyey/186/orig -> origin/gh/guangyey/186/orig 2025-12-04T08:57:43.7413727Z * [new branch] gh/guangyey/187/base -> origin/gh/guangyey/187/base 2025-12-04T08:57:43.7414894Z * [new branch] gh/guangyey/187/head -> origin/gh/guangyey/187/head 2025-12-04T08:57:43.7415923Z * [new branch] gh/guangyey/187/orig -> origin/gh/guangyey/187/orig 2025-12-04T08:57:43.7417980Z * [new branch] gh/guangyey/188/base -> origin/gh/guangyey/188/base 2025-12-04T08:57:43.7419565Z * [new branch] gh/guangyey/188/head -> origin/gh/guangyey/188/head 2025-12-04T08:57:43.7420966Z * [new branch] gh/guangyey/188/orig -> origin/gh/guangyey/188/orig 2025-12-04T08:57:43.7425120Z * [new branch] gh/guangyey/190/base -> origin/gh/guangyey/190/base 2025-12-04T08:57:43.7427280Z * [new branch] gh/guangyey/190/head -> origin/gh/guangyey/190/head 2025-12-04T08:57:43.7428023Z * [new branch] gh/guangyey/190/orig -> origin/gh/guangyey/190/orig 2025-12-04T08:57:43.7429557Z * [new branch] gh/guangyey/208/base -> origin/gh/guangyey/208/base 2025-12-04T08:57:43.7430570Z * [new branch] gh/guangyey/208/head -> origin/gh/guangyey/208/head 2025-12-04T08:57:43.7431705Z * [new branch] gh/guangyey/208/orig -> origin/gh/guangyey/208/orig 2025-12-04T08:57:43.7433357Z * [new branch] gh/guangyey/228/base -> origin/gh/guangyey/228/base 2025-12-04T08:57:43.7434366Z * [new branch] gh/guangyey/228/head -> origin/gh/guangyey/228/head 2025-12-04T08:57:43.7435483Z * [new branch] gh/guangyey/228/orig -> origin/gh/guangyey/228/orig 2025-12-04T08:57:43.7437603Z * [new branch] gh/guangyey/230/base -> origin/gh/guangyey/230/base 2025-12-04T08:57:43.7438570Z * [new branch] gh/guangyey/230/head -> origin/gh/guangyey/230/head 2025-12-04T08:57:43.7440172Z * [new branch] gh/guangyey/230/orig -> origin/gh/guangyey/230/orig 2025-12-04T08:57:43.7441656Z * [new branch] gh/guangyey/231/base -> origin/gh/guangyey/231/base 2025-12-04T08:57:43.7442594Z * [new branch] gh/guangyey/231/head -> origin/gh/guangyey/231/head 
2025-12-04T08:57:43.7443719Z * [new branch] gh/guangyey/231/orig -> origin/gh/guangyey/231/orig 2025-12-04T08:57:43.7445301Z * [new branch] gh/guangyey/232/base -> origin/gh/guangyey/232/base 2025-12-04T08:57:43.7446293Z * [new branch] gh/guangyey/232/head -> origin/gh/guangyey/232/head 2025-12-04T08:57:43.7447400Z * [new branch] gh/guangyey/232/orig -> origin/gh/guangyey/232/orig 2025-12-04T08:57:43.7448959Z * [new branch] gh/guangyey/233/base -> origin/gh/guangyey/233/base 2025-12-04T08:57:43.7449973Z * [new branch] gh/guangyey/233/head -> origin/gh/guangyey/233/head 2025-12-04T08:57:43.7451052Z * [new branch] gh/guangyey/233/orig -> origin/gh/guangyey/233/orig 2025-12-04T08:57:43.7452738Z * [new branch] gh/guangyey/234/base -> origin/gh/guangyey/234/base 2025-12-04T08:57:43.7453732Z * [new branch] gh/guangyey/234/head -> origin/gh/guangyey/234/head 2025-12-04T08:57:43.7454900Z * [new branch] gh/guangyey/234/orig -> origin/gh/guangyey/234/orig 2025-12-04T08:57:43.7456547Z * [new branch] gh/guangyey/235/base -> origin/gh/guangyey/235/base 2025-12-04T08:57:43.7457808Z * [new branch] gh/guangyey/235/head -> origin/gh/guangyey/235/head 2025-12-04T08:57:43.7458919Z * [new branch] gh/guangyey/235/orig -> origin/gh/guangyey/235/orig 2025-12-04T08:57:43.7460528Z * [new branch] gh/guangyey/236/base -> origin/gh/guangyey/236/base 2025-12-04T08:57:43.7461536Z * [new branch] gh/guangyey/236/head -> origin/gh/guangyey/236/head 2025-12-04T08:57:43.7462748Z * [new branch] gh/guangyey/236/orig -> origin/gh/guangyey/236/orig 2025-12-04T08:57:43.7464948Z * [new branch] gh/guangyey/237/base -> origin/gh/guangyey/237/base 2025-12-04T08:57:43.7465572Z * [new branch] gh/guangyey/237/head -> origin/gh/guangyey/237/head 2025-12-04T08:57:43.7466590Z * [new branch] gh/guangyey/237/orig -> origin/gh/guangyey/237/orig 2025-12-04T08:57:43.7468253Z * [new branch] gh/guangyey/238/base -> origin/gh/guangyey/238/base 2025-12-04T08:57:43.7469397Z * [new branch] gh/guangyey/238/head -> origin/gh/guangyey/238/head 2025-12-04T08:57:43.7470972Z * [new branch] gh/guangyey/239/base -> origin/gh/guangyey/239/base 2025-12-04T08:57:43.7471917Z * [new branch] gh/guangyey/239/head -> origin/gh/guangyey/239/head 2025-12-04T08:57:43.7473020Z * [new branch] gh/guangyey/239/orig -> origin/gh/guangyey/239/orig 2025-12-04T08:57:43.7474674Z * [new branch] gh/guangyey/240/base -> origin/gh/guangyey/240/base 2025-12-04T08:57:43.7475665Z * [new branch] gh/guangyey/240/head -> origin/gh/guangyey/240/head 2025-12-04T08:57:43.7476730Z * [new branch] gh/guangyey/240/orig -> origin/gh/guangyey/240/orig 2025-12-04T08:57:43.7478272Z * [new branch] gh/guangyey/241/base -> origin/gh/guangyey/241/base 2025-12-04T08:57:43.7479250Z * [new branch] gh/guangyey/241/head -> origin/gh/guangyey/241/head 2025-12-04T08:57:43.7480372Z * [new branch] gh/guangyey/241/orig -> origin/gh/guangyey/241/orig 2025-12-04T08:57:43.7482374Z * [new branch] gh/guangyey/242/base -> origin/gh/guangyey/242/base 2025-12-04T08:57:43.7483438Z * [new branch] gh/guangyey/242/head -> origin/gh/guangyey/242/head 2025-12-04T08:57:43.7484556Z * [new branch] gh/guangyey/242/orig -> origin/gh/guangyey/242/orig 2025-12-04T08:57:43.7486146Z * [new branch] gh/guangyey/243/base -> origin/gh/guangyey/243/base 2025-12-04T08:57:43.7487348Z * [new branch] gh/guangyey/243/head -> origin/gh/guangyey/243/head 2025-12-04T08:57:43.7488460Z * [new branch] gh/guangyey/243/orig -> origin/gh/guangyey/243/orig 2025-12-04T08:57:43.7490094Z * [new branch] gh/guangyey/244/base -> origin/gh/guangyey/244/base 
2025-12-04T08:57:43.7491123Z * [new branch] gh/guangyey/244/head -> origin/gh/guangyey/244/head 2025-12-04T08:57:43.7492278Z * [new branch] gh/guangyey/244/orig -> origin/gh/guangyey/244/orig 2025-12-04T08:57:43.7494379Z * [new branch] gh/guangyey/245/base -> origin/gh/guangyey/245/base 2025-12-04T08:57:43.7495429Z * [new branch] gh/guangyey/245/head -> origin/gh/guangyey/245/head 2025-12-04T08:57:43.7496634Z * [new branch] gh/guangyey/245/orig -> origin/gh/guangyey/245/orig 2025-12-04T08:57:43.7498530Z * [new branch] gh/guangyey/246/base -> origin/gh/guangyey/246/base 2025-12-04T08:57:43.7499599Z * [new branch] gh/guangyey/246/head -> origin/gh/guangyey/246/head 2025-12-04T08:57:43.7500703Z * [new branch] gh/guangyey/246/orig -> origin/gh/guangyey/246/orig 2025-12-04T08:57:43.7502395Z * [new branch] gh/guangyey/247/base -> origin/gh/guangyey/247/base 2025-12-04T08:57:43.7503404Z * [new branch] gh/guangyey/247/head -> origin/gh/guangyey/247/head 2025-12-04T08:57:43.7504533Z * [new branch] gh/guangyey/247/orig -> origin/gh/guangyey/247/orig 2025-12-04T08:57:43.7506207Z * [new branch] gh/guangyey/248/base -> origin/gh/guangyey/248/base 2025-12-04T08:57:43.7507186Z * [new branch] gh/guangyey/248/head -> origin/gh/guangyey/248/head 2025-12-04T08:57:43.7508307Z * [new branch] gh/guangyey/248/orig -> origin/gh/guangyey/248/orig 2025-12-04T08:57:43.7510098Z * [new branch] gh/guangyey/249/base -> origin/gh/guangyey/249/base 2025-12-04T08:57:43.7510988Z * [new branch] gh/guangyey/249/head -> origin/gh/guangyey/249/head 2025-12-04T08:57:43.7512116Z * [new branch] gh/guangyey/249/orig -> origin/gh/guangyey/249/orig 2025-12-04T08:57:43.7513683Z * [new branch] gh/guangyey/250/base -> origin/gh/guangyey/250/base 2025-12-04T08:57:43.7514833Z * [new branch] gh/guangyey/250/head -> origin/gh/guangyey/250/head 2025-12-04T08:57:43.7515929Z * [new branch] gh/guangyey/250/orig -> origin/gh/guangyey/250/orig 2025-12-04T08:57:43.7517412Z * [new branch] gh/guangyey/251/base -> origin/gh/guangyey/251/base 2025-12-04T08:57:43.7518451Z * [new branch] gh/guangyey/251/head -> origin/gh/guangyey/251/head 2025-12-04T08:57:43.7519581Z * [new branch] gh/guangyey/251/orig -> origin/gh/guangyey/251/orig 2025-12-04T08:57:43.7521476Z * [new branch] gh/guangyey/252/base -> origin/gh/guangyey/252/base 2025-12-04T08:57:43.7522579Z * [new branch] gh/guangyey/252/head -> origin/gh/guangyey/252/head 2025-12-04T08:57:43.7523737Z * [new branch] gh/guangyey/252/orig -> origin/gh/guangyey/252/orig 2025-12-04T08:57:43.7525339Z * [new branch] gh/guangyey/253/base -> origin/gh/guangyey/253/base 2025-12-04T08:57:43.7526377Z * [new branch] gh/guangyey/253/head -> origin/gh/guangyey/253/head 2025-12-04T08:57:43.7527484Z * [new branch] gh/guangyey/253/orig -> origin/gh/guangyey/253/orig 2025-12-04T08:57:43.7529102Z * [new branch] gh/guangyey/254/base -> origin/gh/guangyey/254/base 2025-12-04T08:57:43.7530209Z * [new branch] gh/guangyey/254/head -> origin/gh/guangyey/254/head 2025-12-04T08:57:43.7531350Z * [new branch] gh/guangyey/254/orig -> origin/gh/guangyey/254/orig 2025-12-04T08:57:43.7532952Z * [new branch] gh/guangyey/255/base -> origin/gh/guangyey/255/base 2025-12-04T08:57:43.7534029Z * [new branch] gh/guangyey/255/head -> origin/gh/guangyey/255/head 2025-12-04T08:57:43.7535182Z * [new branch] gh/guangyey/255/orig -> origin/gh/guangyey/255/orig 2025-12-04T08:57:43.7537542Z * [new branch] gh/guilhermeleobas/107/base -> origin/gh/guilhermeleobas/107/base 2025-12-04T08:57:43.7538619Z * [new branch] gh/guilhermeleobas/107/head -> 
origin/gh/guilhermeleobas/107/head 2025-12-04T08:57:43.7539764Z * [new branch] gh/guilhermeleobas/107/orig -> origin/gh/guilhermeleobas/107/orig 2025-12-04T08:57:43.7541048Z * [new branch] gh/guilhermeleobas/108/base -> origin/gh/guilhermeleobas/108/base 2025-12-04T08:57:43.7542095Z * [new branch] gh/guilhermeleobas/108/head -> origin/gh/guilhermeleobas/108/head 2025-12-04T08:57:43.7543350Z * [new branch] gh/guilhermeleobas/108/orig -> origin/gh/guilhermeleobas/108/orig 2025-12-04T08:57:43.7544865Z * [new branch] gh/guilhermeleobas/150/base -> origin/gh/guilhermeleobas/150/base 2025-12-04T08:57:43.7545967Z * [new branch] gh/guilhermeleobas/150/head -> origin/gh/guilhermeleobas/150/head 2025-12-04T08:57:43.7549168Z * [new branch] gh/guilhermeleobas/150/orig -> origin/gh/guilhermeleobas/150/orig 2025-12-04T08:57:43.7550655Z * [new branch] gh/guilhermeleobas/168/base -> origin/gh/guilhermeleobas/168/base 2025-12-04T08:57:43.7551726Z * [new branch] gh/guilhermeleobas/168/head -> origin/gh/guilhermeleobas/168/head 2025-12-04T08:57:43.7552854Z * [new branch] gh/guilhermeleobas/168/orig -> origin/gh/guilhermeleobas/168/orig 2025-12-04T08:57:43.7554275Z * [new branch] gh/guilhermeleobas/169/base -> origin/gh/guilhermeleobas/169/base 2025-12-04T08:57:43.7555662Z * [new branch] gh/guilhermeleobas/169/head -> origin/gh/guilhermeleobas/169/head 2025-12-04T08:57:43.7556624Z * [new branch] gh/guilhermeleobas/169/orig -> origin/gh/guilhermeleobas/169/orig 2025-12-04T08:57:43.7558121Z * [new branch] gh/guilhermeleobas/170/base -> origin/gh/guilhermeleobas/170/base 2025-12-04T08:57:43.7559186Z * [new branch] gh/guilhermeleobas/170/head -> origin/gh/guilhermeleobas/170/head 2025-12-04T08:57:43.7561218Z * [new branch] gh/guilhermeleobas/170/orig -> origin/gh/guilhermeleobas/170/orig 2025-12-04T08:57:43.7562317Z * [new branch] gh/guilhermeleobas/171/base -> origin/gh/guilhermeleobas/171/base 2025-12-04T08:57:43.7563240Z * [new branch] gh/guilhermeleobas/171/head -> origin/gh/guilhermeleobas/171/head 2025-12-04T08:57:43.7564273Z * [new branch] gh/guilhermeleobas/171/orig -> origin/gh/guilhermeleobas/171/orig 2025-12-04T08:57:43.7565893Z * [new branch] gh/guilhermeleobas/173/base -> origin/gh/guilhermeleobas/173/base 2025-12-04T08:57:43.7566936Z * [new branch] gh/guilhermeleobas/173/head -> origin/gh/guilhermeleobas/173/head 2025-12-04T08:57:43.7567872Z * [new branch] gh/guilhermeleobas/173/orig -> origin/gh/guilhermeleobas/173/orig 2025-12-04T08:57:43.7569477Z * [new branch] gh/guilhermeleobas/193/base -> origin/gh/guilhermeleobas/193/base 2025-12-04T08:57:43.7570416Z * [new branch] gh/guilhermeleobas/193/head -> origin/gh/guilhermeleobas/193/head 2025-12-04T08:57:43.7571824Z * [new branch] gh/guilhermeleobas/193/orig -> origin/gh/guilhermeleobas/193/orig 2025-12-04T08:57:43.7573339Z * [new branch] gh/guilhermeleobas/204/base -> origin/gh/guilhermeleobas/204/base 2025-12-04T08:57:43.7574343Z * [new branch] gh/guilhermeleobas/204/head -> origin/gh/guilhermeleobas/204/head 2025-12-04T08:57:43.7575280Z * [new branch] gh/guilhermeleobas/204/orig -> origin/gh/guilhermeleobas/204/orig 2025-12-04T08:57:43.7577194Z * [new branch] gh/guilhermeleobas/211/base -> origin/gh/guilhermeleobas/211/base 2025-12-04T08:57:43.7578362Z * [new branch] gh/guilhermeleobas/211/head -> origin/gh/guilhermeleobas/211/head 2025-12-04T08:57:43.7579440Z * [new branch] gh/guilhermeleobas/211/orig -> origin/gh/guilhermeleobas/211/orig 2025-12-04T08:57:43.7581050Z * [new branch] gh/guilhermeleobas/226/base -> origin/gh/guilhermeleobas/226/base 
2025-12-04T08:57:43.7582054Z * [new branch] gh/guilhermeleobas/226/head -> origin/gh/guilhermeleobas/226/head 2025-12-04T08:57:43.7583193Z * [new branch] gh/guilhermeleobas/226/orig -> origin/gh/guilhermeleobas/226/orig 2025-12-04T08:57:43.7584754Z * [new branch] gh/guilhermeleobas/236/base -> origin/gh/guilhermeleobas/236/base 2025-12-04T08:57:43.7585761Z * [new branch] gh/guilhermeleobas/236/head -> origin/gh/guilhermeleobas/236/head 2025-12-04T08:57:43.7586883Z * [new branch] gh/guilhermeleobas/236/orig -> origin/gh/guilhermeleobas/236/orig 2025-12-04T08:57:43.7588392Z * [new branch] gh/guilhermeleobas/247/base -> origin/gh/guilhermeleobas/247/base 2025-12-04T08:57:43.7589550Z * [new branch] gh/guilhermeleobas/247/head -> origin/gh/guilhermeleobas/247/head 2025-12-04T08:57:43.7592796Z * [new branch] gh/guilhermeleobas/247/orig -> origin/gh/guilhermeleobas/247/orig 2025-12-04T08:57:43.7593943Z * [new branch] gh/guilhermeleobas/248/base -> origin/gh/guilhermeleobas/248/base 2025-12-04T08:57:43.7594475Z * [new branch] gh/guilhermeleobas/248/head -> origin/gh/guilhermeleobas/248/head 2025-12-04T08:57:43.7595019Z * [new branch] gh/guilhermeleobas/248/orig -> origin/gh/guilhermeleobas/248/orig 2025-12-04T08:57:43.7595848Z * [new branch] gh/guilhermeleobas/250/base -> origin/gh/guilhermeleobas/250/base 2025-12-04T08:57:43.7597109Z * [new branch] gh/guilhermeleobas/250/head -> origin/gh/guilhermeleobas/250/head 2025-12-04T08:57:43.7598104Z * [new branch] gh/guilhermeleobas/250/orig -> origin/gh/guilhermeleobas/250/orig 2025-12-04T08:57:43.7600051Z * [new branch] gh/guilhermeleobas/253/base -> origin/gh/guilhermeleobas/253/base 2025-12-04T08:57:43.7600994Z * [new branch] gh/guilhermeleobas/253/head -> origin/gh/guilhermeleobas/253/head 2025-12-04T08:57:43.7602136Z * [new branch] gh/guilhermeleobas/253/orig -> origin/gh/guilhermeleobas/253/orig 2025-12-04T08:57:43.7603765Z * [new branch] gh/guilhermeleobas/254/base -> origin/gh/guilhermeleobas/254/base 2025-12-04T08:57:43.7604729Z * [new branch] gh/guilhermeleobas/254/head -> origin/gh/guilhermeleobas/254/head 2025-12-04T08:57:43.7605824Z * [new branch] gh/guilhermeleobas/254/orig -> origin/gh/guilhermeleobas/254/orig 2025-12-04T08:57:43.7607541Z * [new branch] gh/guilhermeleobas/255/base -> origin/gh/guilhermeleobas/255/base 2025-12-04T08:57:43.7608474Z * [new branch] gh/guilhermeleobas/255/head -> origin/gh/guilhermeleobas/255/head 2025-12-04T08:57:43.7609552Z * [new branch] gh/guilhermeleobas/255/orig -> origin/gh/guilhermeleobas/255/orig 2025-12-04T08:57:43.7611179Z * [new branch] gh/guilhermeleobas/256/base -> origin/gh/guilhermeleobas/256/base 2025-12-04T08:57:43.7612197Z * [new branch] gh/guilhermeleobas/256/head -> origin/gh/guilhermeleobas/256/head 2025-12-04T08:57:43.7613304Z * [new branch] gh/guilhermeleobas/256/orig -> origin/gh/guilhermeleobas/256/orig 2025-12-04T08:57:43.7614887Z * [new branch] gh/guilhermeleobas/257/base -> origin/gh/guilhermeleobas/257/base 2025-12-04T08:57:43.7616072Z * [new branch] gh/guilhermeleobas/257/head -> origin/gh/guilhermeleobas/257/head 2025-12-04T08:57:43.7617560Z * [new branch] gh/guilhermeleobas/257/orig -> origin/gh/guilhermeleobas/257/orig 2025-12-04T08:57:43.7619413Z * [new branch] gh/guilhermeleobas/258/base -> origin/gh/guilhermeleobas/258/base 2025-12-04T08:57:43.7620412Z * [new branch] gh/guilhermeleobas/258/head -> origin/gh/guilhermeleobas/258/head 2025-12-04T08:57:43.7621859Z * [new branch] gh/guilhermeleobas/258/orig -> origin/gh/guilhermeleobas/258/orig 2025-12-04T08:57:43.7623834Z * 
[new branch] gh/guilhermeleobas/259/base -> origin/gh/guilhermeleobas/259/base 2025-12-04T08:57:43.7624628Z * [new branch] gh/guilhermeleobas/259/head -> origin/gh/guilhermeleobas/259/head 2025-12-04T08:57:43.7625866Z * [new branch] gh/guilhermeleobas/259/orig -> origin/gh/guilhermeleobas/259/orig 2025-12-04T08:57:43.7627462Z * [new branch] gh/guilhermeleobas/260/base -> origin/gh/guilhermeleobas/260/base 2025-12-04T08:57:43.7628541Z * [new branch] gh/guilhermeleobas/260/head -> origin/gh/guilhermeleobas/260/head 2025-12-04T08:57:43.7629677Z * [new branch] gh/guilhermeleobas/260/orig -> origin/gh/guilhermeleobas/260/orig 2025-12-04T08:57:43.7631238Z * [new branch] gh/guilhermeleobas/261/base -> origin/gh/guilhermeleobas/261/base 2025-12-04T08:57:43.7632262Z * [new branch] gh/guilhermeleobas/261/head -> origin/gh/guilhermeleobas/261/head 2025-12-04T08:57:43.7634098Z * [new branch] gh/guilhermeleobas/261/orig -> origin/gh/guilhermeleobas/261/orig 2025-12-04T08:57:43.7635611Z * [new branch] gh/guilhermeleobas/262/base -> origin/gh/guilhermeleobas/262/base 2025-12-04T08:57:43.7636734Z * [new branch] gh/guilhermeleobas/262/head -> origin/gh/guilhermeleobas/262/head 2025-12-04T08:57:43.7637791Z * [new branch] gh/guilhermeleobas/262/orig -> origin/gh/guilhermeleobas/262/orig 2025-12-04T08:57:43.7639474Z * [new branch] gh/guilhermeleobas/263/base -> origin/gh/guilhermeleobas/263/base 2025-12-04T08:57:43.7640715Z * [new branch] gh/guilhermeleobas/263/head -> origin/gh/guilhermeleobas/263/head 2025-12-04T08:57:43.7641597Z * [new branch] gh/guilhermeleobas/263/orig -> origin/gh/guilhermeleobas/263/orig 2025-12-04T08:57:43.7643167Z * [new branch] gh/guilhermeleobas/264/base -> origin/gh/guilhermeleobas/264/base 2025-12-04T08:57:43.7644246Z * [new branch] gh/guilhermeleobas/264/head -> origin/gh/guilhermeleobas/264/head 2025-12-04T08:57:43.7645265Z * [new branch] gh/guilhermeleobas/264/orig -> origin/gh/guilhermeleobas/264/orig 2025-12-04T08:57:43.7646902Z * [new branch] gh/guilhermeleobas/265/base -> origin/gh/guilhermeleobas/265/base 2025-12-04T08:57:43.7647884Z * [new branch] gh/guilhermeleobas/265/head -> origin/gh/guilhermeleobas/265/head 2025-12-04T08:57:43.7648970Z * [new branch] gh/guilhermeleobas/265/orig -> origin/gh/guilhermeleobas/265/orig 2025-12-04T08:57:43.7650621Z * [new branch] gh/guilhermeleobas/266/base -> origin/gh/guilhermeleobas/266/base 2025-12-04T08:57:43.7651553Z * [new branch] gh/guilhermeleobas/266/head -> origin/gh/guilhermeleobas/266/head 2025-12-04T08:57:43.7652673Z * [new branch] gh/guilhermeleobas/266/orig -> origin/gh/guilhermeleobas/266/orig 2025-12-04T08:57:43.7654420Z * [new branch] gh/guilhermeleobas/267/base -> origin/gh/guilhermeleobas/267/base 2025-12-04T08:57:43.7655784Z * [new branch] gh/guilhermeleobas/267/head -> origin/gh/guilhermeleobas/267/head 2025-12-04T08:57:43.7656817Z * [new branch] gh/guilhermeleobas/267/orig -> origin/gh/guilhermeleobas/267/orig 2025-12-04T08:57:43.7658951Z * [new branch] gh/hameerabbasi/1/base -> origin/gh/hameerabbasi/1/base 2025-12-04T08:57:43.7659877Z * [new branch] gh/hameerabbasi/1/head -> origin/gh/hameerabbasi/1/head 2025-12-04T08:57:43.7661388Z * [new branch] gh/hameerabbasi/2/base -> origin/gh/hameerabbasi/2/base 2025-12-04T08:57:43.7662369Z * [new branch] gh/hameerabbasi/2/head -> origin/gh/hameerabbasi/2/head 2025-12-04T08:57:43.7663409Z * [new branch] gh/hameerabbasi/2/orig -> origin/gh/hameerabbasi/2/orig 2025-12-04T08:57:43.7664829Z * [new branch] gh/hameerabbasi/3/base -> origin/gh/hameerabbasi/3/base 
2025-12-04T08:57:43.7665946Z * [new branch] gh/hameerabbasi/3/head -> origin/gh/hameerabbasi/3/head 2025-12-04T08:57:43.7667717Z * [new branch] gh/hameerabbasi/3/orig -> origin/gh/hameerabbasi/3/orig 2025-12-04T08:57:43.7669350Z * [new branch] gh/hameerabbasi/4/base -> origin/gh/hameerabbasi/4/base 2025-12-04T08:57:43.7670538Z * [new branch] gh/hameerabbasi/4/head -> origin/gh/hameerabbasi/4/head 2025-12-04T08:57:43.7671422Z * [new branch] gh/hameerabbasi/4/orig -> origin/gh/hameerabbasi/4/orig 2025-12-04T08:57:43.7673556Z * [new branch] gh/huydhn/1/next -> origin/gh/huydhn/1/next 2025-12-04T08:57:43.7674924Z * [new branch] gh/huydhn/2/next -> origin/gh/huydhn/2/next 2025-12-04T08:57:43.7676329Z * [new branch] gh/huydhn/3/next -> origin/gh/huydhn/3/next 2025-12-04T08:57:43.7677766Z * [new branch] gh/huydhn/4/next -> origin/gh/huydhn/4/next 2025-12-04T08:57:43.7679176Z * [new branch] gh/huydhn/5/next -> origin/gh/huydhn/5/next 2025-12-04T08:57:43.7680578Z * [new branch] gh/huydhn/6/next -> origin/gh/huydhn/6/next 2025-12-04T08:57:43.7682411Z * [new branch] gh/int3/97/base -> origin/gh/int3/97/base 2025-12-04T08:57:43.7683489Z * [new branch] gh/int3/97/head -> origin/gh/int3/97/head 2025-12-04T08:57:43.7685288Z * [new branch] gh/isuruf/101/base -> origin/gh/isuruf/101/base 2025-12-04T08:57:43.7686477Z * [new branch] gh/isuruf/101/head -> origin/gh/isuruf/101/head 2025-12-04T08:57:43.7687969Z * [new branch] gh/isuruf/146/base -> origin/gh/isuruf/146/base 2025-12-04T08:57:43.7689053Z * [new branch] gh/isuruf/146/head -> origin/gh/isuruf/146/head 2025-12-04T08:57:43.7690130Z * [new branch] gh/isuruf/146/orig -> origin/gh/isuruf/146/orig 2025-12-04T08:57:43.7691586Z * [new branch] gh/isuruf/158/base -> origin/gh/isuruf/158/base 2025-12-04T08:57:43.7692620Z * [new branch] gh/isuruf/158/head -> origin/gh/isuruf/158/head 2025-12-04T08:57:43.7694080Z * [new branch] gh/isuruf/159/base -> origin/gh/isuruf/159/base 2025-12-04T08:57:43.7695169Z * [new branch] gh/isuruf/159/head -> origin/gh/isuruf/159/head 2025-12-04T08:57:43.7696869Z * [new branch] gh/isuruf/160/base -> origin/gh/isuruf/160/base 2025-12-04T08:57:43.7698066Z * [new branch] gh/isuruf/160/head -> origin/gh/isuruf/160/head 2025-12-04T08:57:43.7699184Z * [new branch] gh/isuruf/160/orig -> origin/gh/isuruf/160/orig 2025-12-04T08:57:43.7701213Z * [new branch] gh/isuruf/81/base -> origin/gh/isuruf/81/base 2025-12-04T08:57:43.7702328Z * [new branch] gh/isuruf/81/head -> origin/gh/isuruf/81/head 2025-12-04T08:57:43.7703433Z * [new branch] gh/isuruf/81/orig -> origin/gh/isuruf/81/orig 2025-12-04T08:57:43.7705221Z * [new branch] gh/jamesjwu/176/base -> origin/gh/jamesjwu/176/base 2025-12-04T08:57:43.7706366Z * [new branch] gh/jamesjwu/176/head -> origin/gh/jamesjwu/176/head 2025-12-04T08:57:43.7707489Z * [new branch] gh/jamesjwu/176/orig -> origin/gh/jamesjwu/176/orig 2025-12-04T08:57:43.7709154Z * [new branch] gh/jamesjwu/187/base -> origin/gh/jamesjwu/187/base 2025-12-04T08:57:43.7710295Z * [new branch] gh/jamesjwu/187/head -> origin/gh/jamesjwu/187/head 2025-12-04T08:57:43.7711399Z * [new branch] gh/jamesjwu/187/orig -> origin/gh/jamesjwu/187/orig 2025-12-04T08:57:43.7712838Z * [new branch] gh/jamesjwu/196/base -> origin/gh/jamesjwu/196/base 2025-12-04T08:57:43.7713914Z * [new branch] gh/jamesjwu/196/head -> origin/gh/jamesjwu/196/head 2025-12-04T08:57:43.7714992Z * [new branch] gh/jamesjwu/196/orig -> origin/gh/jamesjwu/196/orig 2025-12-04T08:57:43.7716426Z * [new branch] gh/jamesjwu/198/base -> origin/gh/jamesjwu/198/base 
2025-12-04T08:57:43.7717468Z * [new branch] gh/jamesjwu/198/head -> origin/gh/jamesjwu/198/head 2025-12-04T08:57:43.7718559Z * [new branch] gh/jamesjwu/198/orig -> origin/gh/jamesjwu/198/orig 2025-12-04T08:57:43.7719980Z * [new branch] gh/jamesjwu/207/base -> origin/gh/jamesjwu/207/base 2025-12-04T08:57:43.7721675Z * [new branch] gh/jamesjwu/207/head -> origin/gh/jamesjwu/207/head 2025-12-04T08:57:43.7722815Z * [new branch] gh/jamesjwu/207/orig -> origin/gh/jamesjwu/207/orig 2025-12-04T08:57:43.7724562Z * [new branch] gh/jamesjwu/208/base -> origin/gh/jamesjwu/208/base 2025-12-04T08:57:43.7725692Z * [new branch] gh/jamesjwu/208/head -> origin/gh/jamesjwu/208/head 2025-12-04T08:57:43.7726800Z * [new branch] gh/jamesjwu/208/orig -> origin/gh/jamesjwu/208/orig 2025-12-04T08:57:43.7728364Z * [new branch] gh/jamesjwu/52/base -> origin/gh/jamesjwu/52/base 2025-12-04T08:57:43.7729488Z * [new branch] gh/jamesjwu/52/head -> origin/gh/jamesjwu/52/head 2025-12-04T08:57:43.7730864Z * [new branch] gh/jamesjwu/53/base -> origin/gh/jamesjwu/53/base 2025-12-04T08:57:43.7731919Z * [new branch] gh/jamesjwu/53/head -> origin/gh/jamesjwu/53/head 2025-12-04T08:57:43.7733367Z * [new branch] gh/jamesjwu/54/base -> origin/gh/jamesjwu/54/base 2025-12-04T08:57:43.7734340Z * [new branch] gh/jamesjwu/54/head -> origin/gh/jamesjwu/54/head 2025-12-04T08:57:43.7735681Z * [new branch] gh/jamesjwu/55/base -> origin/gh/jamesjwu/55/base 2025-12-04T08:57:43.7736949Z * [new branch] gh/jamesjwu/55/head -> origin/gh/jamesjwu/55/head 2025-12-04T08:57:43.7738474Z * [new branch] gh/jamesjwu/56/base -> origin/gh/jamesjwu/56/base 2025-12-04T08:57:43.7739549Z * [new branch] gh/jamesjwu/56/head -> origin/gh/jamesjwu/56/head 2025-12-04T08:57:43.7740966Z * [new branch] gh/jamesjwu/57/base -> origin/gh/jamesjwu/57/base 2025-12-04T08:57:43.7742023Z * [new branch] gh/jamesjwu/57/head -> origin/gh/jamesjwu/57/head 2025-12-04T08:57:43.7743364Z * [new branch] gh/jamesjwu/58/base -> origin/gh/jamesjwu/58/base 2025-12-04T08:57:43.7744451Z * [new branch] gh/jamesjwu/58/head -> origin/gh/jamesjwu/58/head 2025-12-04T08:57:43.7745770Z * [new branch] gh/jamesjwu/59/base -> origin/gh/jamesjwu/59/base 2025-12-04T08:57:43.7746803Z * [new branch] gh/jamesjwu/59/head -> origin/gh/jamesjwu/59/head 2025-12-04T08:57:43.7748668Z * [new branch] gh/jamesjwu/60/base -> origin/gh/jamesjwu/60/base 2025-12-04T08:57:43.7749900Z * [new branch] gh/jamesjwu/60/head -> origin/gh/jamesjwu/60/head 2025-12-04T08:57:43.7751198Z * [new branch] gh/jamesjwu/61/base -> origin/gh/jamesjwu/61/base 2025-12-04T08:57:43.7752354Z * [new branch] gh/jamesjwu/61/head -> origin/gh/jamesjwu/61/head 2025-12-04T08:57:43.7753747Z * [new branch] gh/jamesjwu/62/base -> origin/gh/jamesjwu/62/base 2025-12-04T08:57:43.7754783Z * [new branch] gh/jamesjwu/62/head -> origin/gh/jamesjwu/62/head 2025-12-04T08:57:43.7756041Z * [new branch] gh/jamesjwu/63/base -> origin/gh/jamesjwu/63/base 2025-12-04T08:57:43.7757124Z * [new branch] gh/jamesjwu/63/head -> origin/gh/jamesjwu/63/head 2025-12-04T08:57:43.7759041Z * [new branch] gh/jamesjwu/64/base -> origin/gh/jamesjwu/64/base 2025-12-04T08:57:43.7760122Z * [new branch] gh/jamesjwu/64/head -> origin/gh/jamesjwu/64/head 2025-12-04T08:57:43.7761915Z * [new branch] gh/jamesjwu/65/base -> origin/gh/jamesjwu/65/base 2025-12-04T08:57:43.7762926Z * [new branch] gh/jamesjwu/65/head -> origin/gh/jamesjwu/65/head 2025-12-04T08:57:43.7764790Z * [new branch] gh/janeyx99/165/base -> origin/gh/janeyx99/165/base 2025-12-04T08:57:43.7766017Z * [new branch] 
gh/janeyx99/165/head -> origin/gh/janeyx99/165/head 2025-12-04T08:57:43.7767124Z * [new branch] gh/janeyx99/165/orig -> origin/gh/janeyx99/165/orig 2025-12-04T08:57:43.7768442Z * [new branch] gh/janeyx99/201/base -> origin/gh/janeyx99/201/base 2025-12-04T08:57:43.7769534Z * [new branch] gh/janeyx99/201/head -> origin/gh/janeyx99/201/head 2025-12-04T08:57:43.7770607Z * [new branch] gh/janeyx99/201/orig -> origin/gh/janeyx99/201/orig 2025-12-04T08:57:43.7772340Z * [new branch] gh/janeyx99/225/base -> origin/gh/janeyx99/225/base 2025-12-04T08:57:43.7773459Z * [new branch] gh/janeyx99/225/head -> origin/gh/janeyx99/225/head 2025-12-04T08:57:43.7774553Z * [new branch] gh/janeyx99/225/orig -> origin/gh/janeyx99/225/orig 2025-12-04T08:57:43.7776012Z * [new branch] gh/janeyx99/299/base -> origin/gh/janeyx99/299/base 2025-12-04T08:57:43.7777494Z * [new branch] gh/janeyx99/299/head -> origin/gh/janeyx99/299/head 2025-12-04T08:57:43.7778716Z * [new branch] gh/janeyx99/299/orig -> origin/gh/janeyx99/299/orig 2025-12-04T08:57:43.7780560Z * [new branch] gh/janeyx99/302/base -> origin/gh/janeyx99/302/base 2025-12-04T08:57:43.7781840Z * [new branch] gh/janeyx99/302/head -> origin/gh/janeyx99/302/head 2025-12-04T08:57:43.7783170Z * [new branch] gh/janeyx99/303/base -> origin/gh/janeyx99/303/base 2025-12-04T08:57:43.7784278Z * [new branch] gh/janeyx99/303/head -> origin/gh/janeyx99/303/head 2025-12-04T08:57:43.7785735Z * [new branch] gh/janeyx99/305/base -> origin/gh/janeyx99/305/base 2025-12-04T08:57:43.7786888Z * [new branch] gh/janeyx99/305/head -> origin/gh/janeyx99/305/head 2025-12-04T08:57:43.7788203Z * [new branch] gh/janeyx99/306/base -> origin/gh/janeyx99/306/base 2025-12-04T08:57:43.7789516Z * [new branch] gh/janeyx99/306/head -> origin/gh/janeyx99/306/head 2025-12-04T08:57:43.7791081Z * [new branch] gh/janeyx99/314/base -> origin/gh/janeyx99/314/base 2025-12-04T08:57:43.7792138Z * [new branch] gh/janeyx99/314/head -> origin/gh/janeyx99/314/head 2025-12-04T08:57:43.7793224Z * [new branch] gh/janeyx99/314/orig -> origin/gh/janeyx99/314/orig 2025-12-04T08:57:43.7794853Z * [new branch] gh/janeyx99/315/base -> origin/gh/janeyx99/315/base 2025-12-04T08:57:43.7795903Z * [new branch] gh/janeyx99/315/head -> origin/gh/janeyx99/315/head 2025-12-04T08:57:43.7796871Z * [new branch] gh/janeyx99/315/orig -> origin/gh/janeyx99/315/orig 2025-12-04T08:57:43.7798362Z * [new branch] gh/janeyx99/316/base -> origin/gh/janeyx99/316/base 2025-12-04T08:57:43.7799485Z * [new branch] gh/janeyx99/316/head -> origin/gh/janeyx99/316/head 2025-12-04T08:57:43.7800585Z * [new branch] gh/janeyx99/316/orig -> origin/gh/janeyx99/316/orig 2025-12-04T08:57:43.7802200Z * [new branch] gh/janeyx99/317/base -> origin/gh/janeyx99/317/base 2025-12-04T08:57:43.7803294Z * [new branch] gh/janeyx99/317/head -> origin/gh/janeyx99/317/head 2025-12-04T08:57:43.7804357Z * [new branch] gh/janeyx99/317/orig -> origin/gh/janeyx99/317/orig 2025-12-04T08:57:43.7805874Z * [new branch] gh/janeyx99/325/base -> origin/gh/janeyx99/325/base 2025-12-04T08:57:43.7807005Z * [new branch] gh/janeyx99/325/head -> origin/gh/janeyx99/325/head 2025-12-04T08:57:43.7808120Z * [new branch] gh/janeyx99/325/orig -> origin/gh/janeyx99/325/orig 2025-12-04T08:57:43.7809604Z * [new branch] gh/janeyx99/327/base -> origin/gh/janeyx99/327/base 2025-12-04T08:57:43.7810815Z * [new branch] gh/janeyx99/327/head -> origin/gh/janeyx99/327/head 2025-12-04T08:57:43.7811900Z * [new branch] gh/janeyx99/327/orig -> origin/gh/janeyx99/327/orig 2025-12-04T08:57:43.7813376Z * [new branch] 
gh/janeyx99/328/base -> origin/gh/janeyx99/328/base 2025-12-04T08:57:43.7814501Z * [new branch] gh/janeyx99/328/head -> origin/gh/janeyx99/328/head 2025-12-04T08:57:43.7815657Z * [new branch] gh/janeyx99/328/orig -> origin/gh/janeyx99/328/orig 2025-12-04T08:57:43.7817322Z * [new branch] gh/janeyx99/329/base -> origin/gh/janeyx99/329/base 2025-12-04T08:57:43.7818494Z * [new branch] gh/janeyx99/329/head -> origin/gh/janeyx99/329/head 2025-12-04T08:57:43.7819767Z * [new branch] gh/janeyx99/329/orig -> origin/gh/janeyx99/329/orig 2025-12-04T08:57:43.7822571Z * [new branch] gh/janeyx99/330/base -> origin/gh/janeyx99/330/base 2025-12-04T08:57:43.7823804Z * [new branch] gh/janeyx99/330/head -> origin/gh/janeyx99/330/head 2025-12-04T08:57:43.7825064Z * [new branch] gh/janeyx99/330/orig -> origin/gh/janeyx99/330/orig 2025-12-04T08:57:43.7826492Z * [new branch] gh/janeyx99/331/base -> origin/gh/janeyx99/331/base 2025-12-04T08:57:43.7827706Z * [new branch] gh/janeyx99/331/head -> origin/gh/janeyx99/331/head 2025-12-04T08:57:43.7828819Z * [new branch] gh/janeyx99/331/orig -> origin/gh/janeyx99/331/orig 2025-12-04T08:57:43.7830505Z * [new branch] gh/janeyx99/332/base -> origin/gh/janeyx99/332/base 2025-12-04T08:57:43.7831496Z * [new branch] gh/janeyx99/332/head -> origin/gh/janeyx99/332/head 2025-12-04T08:57:43.7832631Z * [new branch] gh/janeyx99/332/orig -> origin/gh/janeyx99/332/orig 2025-12-04T08:57:43.7834116Z * [new branch] gh/janeyx99/333/base -> origin/gh/janeyx99/333/base 2025-12-04T08:57:43.7835190Z * [new branch] gh/janeyx99/333/head -> origin/gh/janeyx99/333/head 2025-12-04T08:57:43.7836276Z * [new branch] gh/janeyx99/333/orig -> origin/gh/janeyx99/333/orig 2025-12-04T08:57:43.7837827Z * [new branch] gh/janeyx99/88/base -> origin/gh/janeyx99/88/base 2025-12-04T08:57:43.7838929Z * [new branch] gh/janeyx99/88/head -> origin/gh/janeyx99/88/head 2025-12-04T08:57:43.7840012Z * [new branch] gh/janeyx99/88/orig -> origin/gh/janeyx99/88/orig 2025-12-04T08:57:43.7842081Z * [new branch] gh/jansel/360/base -> origin/gh/jansel/360/base 2025-12-04T08:57:43.7842911Z * [new branch] gh/jansel/360/head -> origin/gh/jansel/360/head 2025-12-04T08:57:43.7844354Z * [new branch] gh/jansel/451/base -> origin/gh/jansel/451/base 2025-12-04T08:57:43.7845443Z * [new branch] gh/jansel/451/head -> origin/gh/jansel/451/head 2025-12-04T08:57:43.7846597Z * [new branch] gh/jansel/451/orig -> origin/gh/jansel/451/orig 2025-12-04T08:57:43.7847978Z * [new branch] gh/jansel/462/base -> origin/gh/jansel/462/base 2025-12-04T08:57:43.7849028Z * [new branch] gh/jansel/462/head -> origin/gh/jansel/462/head 2025-12-04T08:57:43.7850109Z * [new branch] gh/jansel/462/orig -> origin/gh/jansel/462/orig 2025-12-04T08:57:43.7851580Z * [new branch] gh/jansel/533/base -> origin/gh/jansel/533/base 2025-12-04T08:57:43.7852658Z * [new branch] gh/jansel/533/head -> origin/gh/jansel/533/head 2025-12-04T08:57:43.7853731Z * [new branch] gh/jansel/533/orig -> origin/gh/jansel/533/orig 2025-12-04T08:57:43.7855128Z * [new branch] gh/jansel/552/base -> origin/gh/jansel/552/base 2025-12-04T08:57:43.7856420Z * [new branch] gh/jansel/552/head -> origin/gh/jansel/552/head 2025-12-04T08:57:43.7857830Z * [new branch] gh/jansel/552/orig -> origin/gh/jansel/552/orig 2025-12-04T08:57:43.7859328Z * [new branch] gh/jansel/553/base -> origin/gh/jansel/553/base 2025-12-04T08:57:43.7860451Z * [new branch] gh/jansel/553/head -> origin/gh/jansel/553/head 2025-12-04T08:57:43.7861669Z * [new branch] gh/jansel/553/orig -> origin/gh/jansel/553/orig 
2025-12-04T08:57:43.7863143Z * [new branch] gh/jansel/554/base -> origin/gh/jansel/554/base 2025-12-04T08:57:43.7864222Z * [new branch] gh/jansel/554/head -> origin/gh/jansel/554/head 2025-12-04T08:57:43.7865351Z * [new branch] gh/jansel/554/orig -> origin/gh/jansel/554/orig 2025-12-04T08:57:43.7866864Z * [new branch] gh/jansel/555/base -> origin/gh/jansel/555/base 2025-12-04T08:57:43.7867958Z * [new branch] gh/jansel/555/head -> origin/gh/jansel/555/head 2025-12-04T08:57:43.7869233Z * [new branch] gh/jansel/555/orig -> origin/gh/jansel/555/orig 2025-12-04T08:57:43.7870629Z * [new branch] gh/jansel/556/base -> origin/gh/jansel/556/base 2025-12-04T08:57:43.7871835Z * [new branch] gh/jansel/556/head -> origin/gh/jansel/556/head 2025-12-04T08:57:43.7872883Z * [new branch] gh/jansel/556/orig -> origin/gh/jansel/556/orig 2025-12-04T08:57:43.7874341Z * [new branch] gh/jansel/557/base -> origin/gh/jansel/557/base 2025-12-04T08:57:43.7875397Z * [new branch] gh/jansel/557/head -> origin/gh/jansel/557/head 2025-12-04T08:57:43.7876498Z * [new branch] gh/jansel/557/orig -> origin/gh/jansel/557/orig 2025-12-04T08:57:43.7878075Z * [new branch] gh/jansel/558/base -> origin/gh/jansel/558/base 2025-12-04T08:57:43.7879030Z * [new branch] gh/jansel/558/head -> origin/gh/jansel/558/head 2025-12-04T08:57:43.7880127Z * [new branch] gh/jansel/558/orig -> origin/gh/jansel/558/orig 2025-12-04T08:57:43.7881578Z * [new branch] gh/jansel/559/base -> origin/gh/jansel/559/base 2025-12-04T08:57:43.7882699Z * [new branch] gh/jansel/559/head -> origin/gh/jansel/559/head 2025-12-04T08:57:43.7883756Z * [new branch] gh/jansel/559/orig -> origin/gh/jansel/559/orig 2025-12-04T08:57:43.7885403Z * [new branch] gh/jansel/560/base -> origin/gh/jansel/560/base 2025-12-04T08:57:43.7886579Z * [new branch] gh/jansel/560/head -> origin/gh/jansel/560/head 2025-12-04T08:57:43.7887657Z * [new branch] gh/jansel/560/orig -> origin/gh/jansel/560/orig 2025-12-04T08:57:43.7889101Z * [new branch] gh/jansel/561/base -> origin/gh/jansel/561/base 2025-12-04T08:57:43.7890194Z * [new branch] gh/jansel/561/head -> origin/gh/jansel/561/head 2025-12-04T08:57:43.7891261Z * [new branch] gh/jansel/561/orig -> origin/gh/jansel/561/orig 2025-12-04T08:57:43.7892690Z * [new branch] gh/jansel/562/base -> origin/gh/jansel/562/base 2025-12-04T08:57:43.7893825Z * [new branch] gh/jansel/562/head -> origin/gh/jansel/562/head 2025-12-04T08:57:43.7894942Z * [new branch] gh/jansel/562/orig -> origin/gh/jansel/562/orig 2025-12-04T08:57:43.7896395Z * [new branch] gh/jansel/563/base -> origin/gh/jansel/563/base 2025-12-04T08:57:43.7897820Z * [new branch] gh/jansel/563/head -> origin/gh/jansel/563/head 2025-12-04T08:57:43.7898916Z * [new branch] gh/jansel/563/orig -> origin/gh/jansel/563/orig 2025-12-04T08:57:43.7900911Z * [new branch] gh/jansel/564/base -> origin/gh/jansel/564/base 2025-12-04T08:57:43.7902106Z * [new branch] gh/jansel/564/head -> origin/gh/jansel/564/head 2025-12-04T08:57:43.7903207Z * [new branch] gh/jansel/564/orig -> origin/gh/jansel/564/orig 2025-12-04T08:57:43.7904974Z * [new branch] gh/jansel/565/base -> origin/gh/jansel/565/base 2025-12-04T08:57:43.7905788Z * [new branch] gh/jansel/565/head -> origin/gh/jansel/565/head 2025-12-04T08:57:43.7906971Z * [new branch] gh/jansel/565/orig -> origin/gh/jansel/565/orig 2025-12-04T08:57:43.7908517Z * [new branch] gh/jansel/566/base -> origin/gh/jansel/566/base 2025-12-04T08:57:43.7909781Z * [new branch] gh/jansel/566/head -> origin/gh/jansel/566/head 2025-12-04T08:57:43.7910838Z * [new branch] 
gh/jansel/566/orig -> origin/gh/jansel/566/orig 2025-12-04T08:57:43.7912306Z * [new branch] gh/jansel/567/base -> origin/gh/jansel/567/base 2025-12-04T08:57:43.7913371Z * [new branch] gh/jansel/567/head -> origin/gh/jansel/567/head 2025-12-04T08:57:43.7914554Z * [new branch] gh/jansel/567/orig -> origin/gh/jansel/567/orig 2025-12-04T08:57:43.7916004Z * [new branch] gh/jansel/568/base -> origin/gh/jansel/568/base 2025-12-04T08:57:43.7917239Z * [new branch] gh/jansel/568/head -> origin/gh/jansel/568/head 2025-12-04T08:57:43.7918390Z * [new branch] gh/jansel/568/orig -> origin/gh/jansel/568/orig 2025-12-04T08:57:43.7919837Z * [new branch] gh/jansel/569/base -> origin/gh/jansel/569/base 2025-12-04T08:57:43.7921011Z * [new branch] gh/jansel/569/head -> origin/gh/jansel/569/head 2025-12-04T08:57:43.7925499Z * [new branch] gh/jansel/569/orig -> origin/gh/jansel/569/orig 2025-12-04T08:57:43.7927075Z * [new branch] gh/jansel/570/base -> origin/gh/jansel/570/base 2025-12-04T08:57:43.7928259Z * [new branch] gh/jansel/570/head -> origin/gh/jansel/570/head 2025-12-04T08:57:43.7929355Z * [new branch] gh/jansel/570/orig -> origin/gh/jansel/570/orig 2025-12-04T08:57:43.7930862Z * [new branch] gh/jansel/571/base -> origin/gh/jansel/571/base 2025-12-04T08:57:43.7931984Z * [new branch] gh/jansel/571/head -> origin/gh/jansel/571/head 2025-12-04T08:57:43.7933124Z * [new branch] gh/jansel/571/orig -> origin/gh/jansel/571/orig 2025-12-04T08:57:43.7934651Z * [new branch] gh/jansel/572/base -> origin/gh/jansel/572/base 2025-12-04T08:57:43.7935834Z * [new branch] gh/jansel/572/head -> origin/gh/jansel/572/head 2025-12-04T08:57:43.7937253Z * [new branch] gh/jansel/572/orig -> origin/gh/jansel/572/orig 2025-12-04T08:57:43.7938886Z * [new branch] gh/jansel/573/base -> origin/gh/jansel/573/base 2025-12-04T08:57:43.7940021Z * [new branch] gh/jansel/573/head -> origin/gh/jansel/573/head 2025-12-04T08:57:43.7941154Z * [new branch] gh/jansel/573/orig -> origin/gh/jansel/573/orig 2025-12-04T08:57:43.7942713Z * [new branch] gh/jansel/574/base -> origin/gh/jansel/574/base 2025-12-04T08:57:43.7943904Z * [new branch] gh/jansel/574/head -> origin/gh/jansel/574/head 2025-12-04T08:57:43.7945013Z * [new branch] gh/jansel/574/orig -> origin/gh/jansel/574/orig 2025-12-04T08:57:43.7946560Z * [new branch] gh/jansel/575/base -> origin/gh/jansel/575/base 2025-12-04T08:57:43.7947684Z * [new branch] gh/jansel/575/head -> origin/gh/jansel/575/head 2025-12-04T08:57:43.7948917Z * [new branch] gh/jansel/575/orig -> origin/gh/jansel/575/orig 2025-12-04T08:57:43.7950379Z * [new branch] gh/jansel/576/base -> origin/gh/jansel/576/base 2025-12-04T08:57:43.7951547Z * [new branch] gh/jansel/576/head -> origin/gh/jansel/576/head 2025-12-04T08:57:43.7952779Z * [new branch] gh/jansel/576/orig -> origin/gh/jansel/576/orig 2025-12-04T08:57:43.7955045Z * [new branch] gh/jbschlosser/247/base -> origin/gh/jbschlosser/247/base 2025-12-04T08:57:43.7956186Z * [new branch] gh/jbschlosser/247/head -> origin/gh/jbschlosser/247/head 2025-12-04T08:57:43.7957258Z * [new branch] gh/jbschlosser/247/orig -> origin/gh/jbschlosser/247/orig 2025-12-04T08:57:43.7958805Z * [new branch] gh/jbschlosser/250/base -> origin/gh/jbschlosser/250/base 2025-12-04T08:57:43.7959800Z * [new branch] gh/jbschlosser/250/head -> origin/gh/jbschlosser/250/head 2025-12-04T08:57:43.7960888Z * [new branch] gh/jbschlosser/250/orig -> origin/gh/jbschlosser/250/orig 2025-12-04T08:57:43.7962748Z * [new branch] gh/jerryzh168/1/base -> origin/gh/jerryzh168/1/base 2025-12-04T08:57:43.7963880Z * [new 
branch] gh/jerryzh168/1/head -> origin/gh/jerryzh168/1/head 2025-12-04T08:57:43.7964815Z * [new branch] gh/jerryzh168/1/orig -> origin/gh/jerryzh168/1/orig 2025-12-04T08:57:43.7966689Z * [new branch] gh/jiayisunx/59/base -> origin/gh/jiayisunx/59/base 2025-12-04T08:57:43.7967787Z * [new branch] gh/jiayisunx/59/head -> origin/gh/jiayisunx/59/head 2025-12-04T08:57:43.7968876Z * [new branch] gh/jiayisunx/59/orig -> origin/gh/jiayisunx/59/orig 2025-12-04T08:57:43.7970313Z * [new branch] gh/jiayisunx/61/base -> origin/gh/jiayisunx/61/base 2025-12-04T08:57:43.7971404Z * [new branch] gh/jiayisunx/61/head -> origin/gh/jiayisunx/61/head 2025-12-04T08:57:43.7972478Z * [new branch] gh/jiayisunx/61/orig -> origin/gh/jiayisunx/61/orig 2025-12-04T08:57:43.7973926Z * [new branch] gh/jiayisunx/68/base -> origin/gh/jiayisunx/68/base 2025-12-04T08:57:43.7974976Z * [new branch] gh/jiayisunx/68/head -> origin/gh/jiayisunx/68/head 2025-12-04T08:57:43.7976077Z * [new branch] gh/jiayisunx/68/orig -> origin/gh/jiayisunx/68/orig 2025-12-04T08:57:43.7977888Z * [new branch] gh/jiayisunx/77/base -> origin/gh/jiayisunx/77/base 2025-12-04T08:57:43.7979010Z * [new branch] gh/jiayisunx/77/head -> origin/gh/jiayisunx/77/head 2025-12-04T08:57:43.7980240Z * [new branch] gh/jiayisunx/77/orig -> origin/gh/jiayisunx/77/orig 2025-12-04T08:57:43.7981761Z * [new branch] gh/jiayisunx/78/base -> origin/gh/jiayisunx/78/base 2025-12-04T08:57:43.7982820Z * [new branch] gh/jiayisunx/78/head -> origin/gh/jiayisunx/78/head 2025-12-04T08:57:43.7984006Z * [new branch] gh/jiayisunx/78/orig -> origin/gh/jiayisunx/78/orig 2025-12-04T08:57:43.7985437Z * [new branch] gh/jiayisunx/79/base -> origin/gh/jiayisunx/79/base 2025-12-04T08:57:43.7986542Z * [new branch] gh/jiayisunx/79/head -> origin/gh/jiayisunx/79/head 2025-12-04T08:57:43.7987657Z * [new branch] gh/jiayisunx/79/orig -> origin/gh/jiayisunx/79/orig 2025-12-04T08:57:43.7989307Z * [new branch] gh/jiayisunx/82/base -> origin/gh/jiayisunx/82/base 2025-12-04T08:57:43.7990376Z * [new branch] gh/jiayisunx/82/head -> origin/gh/jiayisunx/82/head 2025-12-04T08:57:43.7991510Z * [new branch] gh/jiayisunx/82/orig -> origin/gh/jiayisunx/82/orig 2025-12-04T08:57:43.7992845Z * [new branch] gh/jiayisunx/83/base -> origin/gh/jiayisunx/83/base 2025-12-04T08:57:43.7993995Z * [new branch] gh/jiayisunx/83/head -> origin/gh/jiayisunx/83/head 2025-12-04T08:57:43.7995173Z * [new branch] gh/jiayisunx/83/orig -> origin/gh/jiayisunx/83/orig 2025-12-04T08:57:43.7996539Z * [new branch] gh/jiayisunx/84/base -> origin/gh/jiayisunx/84/base 2025-12-04T08:57:43.7998103Z * [new branch] gh/jiayisunx/84/head -> origin/gh/jiayisunx/84/head 2025-12-04T08:57:43.7999213Z * [new branch] gh/jiayisunx/84/orig -> origin/gh/jiayisunx/84/orig 2025-12-04T08:57:43.8000662Z * [new branch] gh/jiayisunx/85/base -> origin/gh/jiayisunx/85/base 2025-12-04T08:57:43.8001727Z * [new branch] gh/jiayisunx/85/head -> origin/gh/jiayisunx/85/head 2025-12-04T08:57:43.8002800Z * [new branch] gh/jiayisunx/85/orig -> origin/gh/jiayisunx/85/orig 2025-12-04T08:57:43.8004186Z * [new branch] gh/jiayisunx/86/base -> origin/gh/jiayisunx/86/base 2025-12-04T08:57:43.8005260Z * [new branch] gh/jiayisunx/86/head -> origin/gh/jiayisunx/86/head 2025-12-04T08:57:43.8006378Z * [new branch] gh/jiayisunx/86/orig -> origin/gh/jiayisunx/86/orig 2025-12-04T08:57:43.8008066Z * [new branch] gh/jiayisunx/87/base -> origin/gh/jiayisunx/87/base 2025-12-04T08:57:43.8008893Z * [new branch] gh/jiayisunx/87/head -> origin/gh/jiayisunx/87/head 2025-12-04T08:57:43.8010085Z * [new 
branch] gh/jiayisunx/87/orig -> origin/gh/jiayisunx/87/orig 2025-12-04T08:57:43.8011495Z * [new branch] gh/jiayisunx/88/base -> origin/gh/jiayisunx/88/base 2025-12-04T08:57:43.8012613Z * [new branch] gh/jiayisunx/88/head -> origin/gh/jiayisunx/88/head 2025-12-04T08:57:43.8013708Z * [new branch] gh/jiayisunx/88/orig -> origin/gh/jiayisunx/88/orig 2025-12-04T08:57:43.8015118Z * [new branch] gh/jiayisunx/89/base -> origin/gh/jiayisunx/89/base 2025-12-04T08:57:43.8016176Z * [new branch] gh/jiayisunx/89/head -> origin/gh/jiayisunx/89/head 2025-12-04T08:57:43.8017837Z * [new branch] gh/jiayisunx/89/orig -> origin/gh/jiayisunx/89/orig 2025-12-04T08:57:43.8019326Z * [new branch] gh/jiayisunx/90/base -> origin/gh/jiayisunx/90/base 2025-12-04T08:57:43.8020393Z * [new branch] gh/jiayisunx/90/head -> origin/gh/jiayisunx/90/head 2025-12-04T08:57:43.8021806Z * [new branch] gh/jiayisunx/90/orig -> origin/gh/jiayisunx/90/orig 2025-12-04T08:57:43.8023486Z * [new branch] gh/jjwu@meta.com/1/base -> origin/gh/jjwu@meta.com/1/base 2025-12-04T08:57:43.8024691Z * [new branch] gh/jjwu@meta.com/1/head -> origin/gh/jjwu@meta.com/1/head 2025-12-04T08:57:43.8026399Z * [new branch] gh/jturney/1/base -> origin/gh/jturney/1/base 2025-12-04T08:57:43.8027514Z * [new branch] gh/jturney/1/head -> origin/gh/jturney/1/head 2025-12-04T08:57:43.8028647Z * [new branch] gh/jturney/1/orig -> origin/gh/jturney/1/orig 2025-12-04T08:57:43.8030130Z * [new branch] gh/jturney/2/base -> origin/gh/jturney/2/base 2025-12-04T08:57:43.8031246Z * [new branch] gh/jturney/2/head -> origin/gh/jturney/2/head 2025-12-04T08:57:43.8032331Z * [new branch] gh/jturney/2/orig -> origin/gh/jturney/2/orig 2025-12-04T08:57:43.8034278Z * [new branch] gh/karthickai/10/base -> origin/gh/karthickai/10/base 2025-12-04T08:57:43.8035480Z * [new branch] gh/karthickai/10/head -> origin/gh/karthickai/10/head 2025-12-04T08:57:43.8036610Z * [new branch] gh/karthickai/10/orig -> origin/gh/karthickai/10/orig 2025-12-04T08:57:43.8038137Z * [new branch] gh/karthickai/11/base -> origin/gh/karthickai/11/base 2025-12-04T08:57:43.8039321Z * [new branch] gh/karthickai/11/head -> origin/gh/karthickai/11/head 2025-12-04T08:57:43.8040421Z * [new branch] gh/karthickai/11/orig -> origin/gh/karthickai/11/orig 2025-12-04T08:57:43.8042297Z * [new branch] gh/karthickai/12/base -> origin/gh/karthickai/12/base 2025-12-04T08:57:43.8043482Z * [new branch] gh/karthickai/12/head -> origin/gh/karthickai/12/head 2025-12-04T08:57:43.8044612Z * [new branch] gh/karthickai/12/orig -> origin/gh/karthickai/12/orig 2025-12-04T08:57:43.8046067Z * [new branch] gh/karthickai/13/base -> origin/gh/karthickai/13/base 2025-12-04T08:57:43.8047252Z * [new branch] gh/karthickai/13/head -> origin/gh/karthickai/13/head 2025-12-04T08:57:43.8048317Z * [new branch] gh/karthickai/13/orig -> origin/gh/karthickai/13/orig 2025-12-04T08:57:43.8049997Z * [new branch] gh/karthickai/14/base -> origin/gh/karthickai/14/base 2025-12-04T08:57:43.8051166Z * [new branch] gh/karthickai/14/head -> origin/gh/karthickai/14/head 2025-12-04T08:57:43.8052278Z * [new branch] gh/karthickai/14/orig -> origin/gh/karthickai/14/orig 2025-12-04T08:57:43.8054635Z * [new branch] gh/karthickai/15/base -> origin/gh/karthickai/15/base 2025-12-04T08:57:43.8055685Z * [new branch] gh/karthickai/15/head -> origin/gh/karthickai/15/head 2025-12-04T08:57:43.8057082Z * [new branch] gh/karthickai/15/orig -> origin/gh/karthickai/15/orig 2025-12-04T08:57:43.8058602Z * [new branch] gh/karthickai/16/base -> origin/gh/karthickai/16/base 
2025-12-04T08:57:43.8059779Z * [new branch] gh/karthickai/16/head -> origin/gh/karthickai/16/head 2025-12-04T08:57:43.8060913Z * [new branch] gh/karthickai/16/orig -> origin/gh/karthickai/16/orig 2025-12-04T08:57:43.8062347Z * [new branch] gh/karthickai/17/base -> origin/gh/karthickai/17/base 2025-12-04T08:57:43.8063374Z * [new branch] gh/karthickai/17/head -> origin/gh/karthickai/17/head 2025-12-04T08:57:43.8064529Z * [new branch] gh/karthickai/17/orig -> origin/gh/karthickai/17/orig 2025-12-04T08:57:43.8066163Z * [new branch] gh/karthickai/18/base -> origin/gh/karthickai/18/base 2025-12-04T08:57:43.8067507Z * [new branch] gh/karthickai/18/head -> origin/gh/karthickai/18/head 2025-12-04T08:57:43.8068907Z * [new branch] gh/karthickai/18/orig -> origin/gh/karthickai/18/orig 2025-12-04T08:57:43.8071008Z * [new branch] gh/karthickai/19/base -> origin/gh/karthickai/19/base 2025-12-04T08:57:43.8072150Z * [new branch] gh/karthickai/19/head -> origin/gh/karthickai/19/head 2025-12-04T08:57:43.8073353Z * [new branch] gh/karthickai/19/orig -> origin/gh/karthickai/19/orig 2025-12-04T08:57:43.8075570Z * [new branch] gh/karthickai/20/base -> origin/gh/karthickai/20/base 2025-12-04T08:57:43.8077494Z * [new branch] gh/karthickai/20/head -> origin/gh/karthickai/20/head 2025-12-04T08:57:43.8078622Z * [new branch] gh/karthickai/20/orig -> origin/gh/karthickai/20/orig 2025-12-04T08:57:43.8080173Z * [new branch] gh/karthickai/21/base -> origin/gh/karthickai/21/base 2025-12-04T08:57:43.8081480Z * [new branch] gh/karthickai/21/head -> origin/gh/karthickai/21/head 2025-12-04T08:57:43.8082648Z * [new branch] gh/karthickai/21/orig -> origin/gh/karthickai/21/orig 2025-12-04T08:57:43.8084271Z * [new branch] gh/karthickai/22/base -> origin/gh/karthickai/22/base 2025-12-04T08:57:43.8085289Z * [new branch] gh/karthickai/22/head -> origin/gh/karthickai/22/head 2025-12-04T08:57:43.8086373Z * [new branch] gh/karthickai/22/orig -> origin/gh/karthickai/22/orig 2025-12-04T08:57:43.8088231Z * [new branch] gh/karthickai/23/base -> origin/gh/karthickai/23/base 2025-12-04T08:57:43.8089510Z * [new branch] gh/karthickai/23/head -> origin/gh/karthickai/23/head 2025-12-04T08:57:43.8090596Z * [new branch] gh/karthickai/23/orig -> origin/gh/karthickai/23/orig 2025-12-04T08:57:43.8092040Z * [new branch] gh/karthickai/24/base -> origin/gh/karthickai/24/base 2025-12-04T08:57:43.8093147Z * [new branch] gh/karthickai/24/head -> origin/gh/karthickai/24/head 2025-12-04T08:57:43.8094256Z * [new branch] gh/karthickai/24/orig -> origin/gh/karthickai/24/orig 2025-12-04T08:57:43.8096373Z * [new branch] gh/karthickai/25/base -> origin/gh/karthickai/25/base 2025-12-04T08:57:43.8097946Z * [new branch] gh/karthickai/25/head -> origin/gh/karthickai/25/head 2025-12-04T08:57:43.8099097Z * [new branch] gh/karthickai/25/orig -> origin/gh/karthickai/25/orig 2025-12-04T08:57:43.8100523Z * [new branch] gh/karthickai/26/base -> origin/gh/karthickai/26/base 2025-12-04T08:57:43.8101754Z * [new branch] gh/karthickai/26/head -> origin/gh/karthickai/26/head 2025-12-04T08:57:43.8102968Z * [new branch] gh/karthickai/26/orig -> origin/gh/karthickai/26/orig 2025-12-04T08:57:43.8106148Z * [new branch] gh/karthickai/6/base -> origin/gh/karthickai/6/base 2025-12-04T08:57:43.8107922Z * [new branch] gh/karthickai/6/head -> origin/gh/karthickai/6/head 2025-12-04T08:57:43.8109202Z * [new branch] gh/karthickai/6/orig -> origin/gh/karthickai/6/orig 2025-12-04T08:57:43.8110986Z * [new branch] gh/krocki/1/base -> origin/gh/krocki/1/base 2025-12-04T08:57:43.8112060Z * [new 
branch] gh/krocki/1/head -> origin/gh/krocki/1/head 2025-12-04T08:57:43.8113147Z * [new branch] gh/krocki/1/orig -> origin/gh/krocki/1/orig 2025-12-04T08:57:43.8114624Z * [new branch] gh/krocki/2/base -> origin/gh/krocki/2/base 2025-12-04T08:57:43.8115692Z * [new branch] gh/krocki/2/head -> origin/gh/krocki/2/head 2025-12-04T08:57:43.8116784Z * [new branch] gh/krocki/2/orig -> origin/gh/krocki/2/orig 2025-12-04T08:57:43.8118488Z * [new branch] gh/kurtamohler/60/base -> origin/gh/kurtamohler/60/base 2025-12-04T08:57:43.8119566Z * [new branch] gh/kurtamohler/60/head -> origin/gh/kurtamohler/60/head 2025-12-04T08:57:43.8120878Z * [new branch] gh/kurtamohler/60/orig -> origin/gh/kurtamohler/60/orig 2025-12-04T08:57:43.8122691Z * [new branch] gh/kurtamohler/61/base -> origin/gh/kurtamohler/61/base 2025-12-04T08:57:43.8123820Z * [new branch] gh/kurtamohler/61/head -> origin/gh/kurtamohler/61/head 2025-12-04T08:57:43.8124994Z * [new branch] gh/kurtamohler/61/orig -> origin/gh/kurtamohler/61/orig 2025-12-04T08:57:43.8126501Z * [new branch] gh/kurtamohler/62/base -> origin/gh/kurtamohler/62/base 2025-12-04T08:57:43.8127613Z * [new branch] gh/kurtamohler/62/head -> origin/gh/kurtamohler/62/head 2025-12-04T08:57:43.8128735Z * [new branch] gh/kurtamohler/62/orig -> origin/gh/kurtamohler/62/orig 2025-12-04T08:57:43.8130248Z * [new branch] gh/kurtamohler/63/base -> origin/gh/kurtamohler/63/base 2025-12-04T08:57:43.8131389Z * [new branch] gh/kurtamohler/63/head -> origin/gh/kurtamohler/63/head 2025-12-04T08:57:43.8132545Z * [new branch] gh/kurtamohler/63/orig -> origin/gh/kurtamohler/63/orig 2025-12-04T08:57:43.8134089Z * [new branch] gh/kurtamohler/64/base -> origin/gh/kurtamohler/64/base 2025-12-04T08:57:43.8135205Z * [new branch] gh/kurtamohler/64/head -> origin/gh/kurtamohler/64/head 2025-12-04T08:57:43.8136419Z * [new branch] gh/kurtamohler/64/orig -> origin/gh/kurtamohler/64/orig 2025-12-04T08:57:43.8138250Z * [new branch] gh/kurtamohler/65/base -> origin/gh/kurtamohler/65/base 2025-12-04T08:57:43.8139345Z * [new branch] gh/kurtamohler/65/head -> origin/gh/kurtamohler/65/head 2025-12-04T08:57:43.8140447Z * [new branch] gh/kurtamohler/65/orig -> origin/gh/kurtamohler/65/orig 2025-12-04T08:57:43.8141885Z * [new branch] gh/kurtamohler/66/base -> origin/gh/kurtamohler/66/base 2025-12-04T08:57:43.8142991Z * [new branch] gh/kurtamohler/66/head -> origin/gh/kurtamohler/66/head 2025-12-04T08:57:43.8144091Z * [new branch] gh/kurtamohler/66/orig -> origin/gh/kurtamohler/66/orig 2025-12-04T08:57:43.8145555Z * [new branch] gh/kurtamohler/67/base -> origin/gh/kurtamohler/67/base 2025-12-04T08:57:43.8146650Z * [new branch] gh/kurtamohler/67/head -> origin/gh/kurtamohler/67/head 2025-12-04T08:57:43.8147772Z * [new branch] gh/kurtamohler/67/orig -> origin/gh/kurtamohler/67/orig 2025-12-04T08:57:43.8149766Z * [new branch] gh/kwen2501/130/base -> origin/gh/kwen2501/130/base 2025-12-04T08:57:43.8151249Z * [new branch] gh/kwen2501/130/head -> origin/gh/kwen2501/130/head 2025-12-04T08:57:43.8152326Z * [new branch] gh/kwen2501/130/orig -> origin/gh/kwen2501/130/orig 2025-12-04T08:57:43.8153788Z * [new branch] gh/kwen2501/170/base -> origin/gh/kwen2501/170/base 2025-12-04T08:57:43.8154853Z * [new branch] gh/kwen2501/170/head -> origin/gh/kwen2501/170/head 2025-12-04T08:57:43.8156394Z * [new branch] gh/kwen2501/187/base -> origin/gh/kwen2501/187/base 2025-12-04T08:57:43.8157642Z * [new branch] gh/kwen2501/187/head -> origin/gh/kwen2501/187/head 2025-12-04T08:57:43.8158812Z * [new branch] gh/kwen2501/187/orig -> 
origin/gh/kwen2501/187/orig 2025-12-04T08:57:43.8160247Z * [new branch] gh/kwen2501/188/base -> origin/gh/kwen2501/188/base 2025-12-04T08:57:43.8161315Z * [new branch] gh/kwen2501/188/head -> origin/gh/kwen2501/188/head 2025-12-04T08:57:43.8162460Z * [new branch] gh/kwen2501/188/orig -> origin/gh/kwen2501/188/orig 2025-12-04T08:57:43.8163886Z * [new branch] gh/kwen2501/211/base -> origin/gh/kwen2501/211/base 2025-12-04T08:57:43.8164966Z * [new branch] gh/kwen2501/211/head -> origin/gh/kwen2501/211/head 2025-12-04T08:57:43.8166482Z * [new branch] gh/kwen2501/224/base -> origin/gh/kwen2501/224/base 2025-12-04T08:57:43.8167992Z * [new branch] gh/kwen2501/224/head -> origin/gh/kwen2501/224/head 2025-12-04T08:57:43.8169099Z * [new branch] gh/kwen2501/224/orig -> origin/gh/kwen2501/224/orig 2025-12-04T08:57:43.8170505Z * [new branch] gh/kwen2501/228/base -> origin/gh/kwen2501/228/base 2025-12-04T08:57:43.8171587Z * [new branch] gh/kwen2501/228/head -> origin/gh/kwen2501/228/head 2025-12-04T08:57:43.8172667Z * [new branch] gh/kwen2501/228/orig -> origin/gh/kwen2501/228/orig 2025-12-04T08:57:43.8174774Z * [new branch] gh/kwen2501/234/base -> origin/gh/kwen2501/234/base 2025-12-04T08:57:43.8175869Z * [new branch] gh/kwen2501/234/head -> origin/gh/kwen2501/234/head 2025-12-04T08:57:43.8177299Z * [new branch] gh/kwen2501/234/orig -> origin/gh/kwen2501/234/orig 2025-12-04T08:57:43.8178773Z * [new branch] gh/kwen2501/235/base -> origin/gh/kwen2501/235/base 2025-12-04T08:57:43.8179869Z * [new branch] gh/kwen2501/235/head -> origin/gh/kwen2501/235/head 2025-12-04T08:57:43.8181001Z * [new branch] gh/kwen2501/235/orig -> origin/gh/kwen2501/235/orig 2025-12-04T08:57:43.8182563Z * [new branch] gh/kwen2501/236/base -> origin/gh/kwen2501/236/base 2025-12-04T08:57:43.8183671Z * [new branch] gh/kwen2501/236/head -> origin/gh/kwen2501/236/head 2025-12-04T08:57:43.8184841Z * [new branch] gh/kwen2501/236/orig -> origin/gh/kwen2501/236/orig 2025-12-04T08:57:43.8186255Z * [new branch] gh/kwen2501/237/base -> origin/gh/kwen2501/237/base 2025-12-04T08:57:43.8187363Z * [new branch] gh/kwen2501/237/head -> origin/gh/kwen2501/237/head 2025-12-04T08:57:43.8188459Z * [new branch] gh/kwen2501/237/orig -> origin/gh/kwen2501/237/orig 2025-12-04T08:57:43.8190005Z * [new branch] gh/kwen2501/238/base -> origin/gh/kwen2501/238/base 2025-12-04T08:57:43.8191068Z * [new branch] gh/kwen2501/238/head -> origin/gh/kwen2501/238/head 2025-12-04T08:57:43.8192225Z * [new branch] gh/kwen2501/238/orig -> origin/gh/kwen2501/238/orig 2025-12-04T08:57:43.8193657Z * [new branch] gh/kwen2501/240/base -> origin/gh/kwen2501/240/base 2025-12-04T08:57:43.8194729Z * [new branch] gh/kwen2501/240/head -> origin/gh/kwen2501/240/head 2025-12-04T08:57:43.8195871Z * [new branch] gh/kwen2501/240/orig -> origin/gh/kwen2501/240/orig 2025-12-04T08:57:43.8197337Z * [new branch] gh/kwen2501/241/base -> origin/gh/kwen2501/241/base 2025-12-04T08:57:43.8198448Z * [new branch] gh/kwen2501/241/head -> origin/gh/kwen2501/241/head 2025-12-04T08:57:43.8199491Z * [new branch] gh/kwen2501/241/orig -> origin/gh/kwen2501/241/orig 2025-12-04T08:57:43.8200988Z * [new branch] gh/kwen2501/247/base -> origin/gh/kwen2501/247/base 2025-12-04T08:57:43.8202070Z * [new branch] gh/kwen2501/247/head -> origin/gh/kwen2501/247/head 2025-12-04T08:57:43.8203198Z * [new branch] gh/kwen2501/247/orig -> origin/gh/kwen2501/247/orig 2025-12-04T08:57:43.8204556Z * [new branch] gh/kwen2501/252/base -> origin/gh/kwen2501/252/base 2025-12-04T08:57:43.8205603Z * [new branch] gh/kwen2501/252/head -> 
origin/gh/kwen2501/252/head 2025-12-04T08:57:43.8206695Z * [new branch] gh/kwen2501/252/orig -> origin/gh/kwen2501/252/orig 2025-12-04T08:57:43.8208687Z * [new branch] gh/kwen2501/259/base -> origin/gh/kwen2501/259/base 2025-12-04T08:57:43.8209878Z * [new branch] gh/kwen2501/259/head -> origin/gh/kwen2501/259/head 2025-12-04T08:57:43.8210995Z * [new branch] gh/kwen2501/259/orig -> origin/gh/kwen2501/259/orig 2025-12-04T08:57:43.8212609Z * [new branch] gh/kwen2501/260/base -> origin/gh/kwen2501/260/base 2025-12-04T08:57:43.8213806Z * [new branch] gh/kwen2501/260/head -> origin/gh/kwen2501/260/head 2025-12-04T08:57:43.8214905Z * [new branch] gh/kwen2501/260/orig -> origin/gh/kwen2501/260/orig 2025-12-04T08:57:43.8216448Z * [new branch] gh/kwen2501/268/base -> origin/gh/kwen2501/268/base 2025-12-04T08:57:43.8217846Z * [new branch] gh/kwen2501/268/head -> origin/gh/kwen2501/268/head 2025-12-04T08:57:43.8218928Z * [new branch] gh/kwen2501/268/orig -> origin/gh/kwen2501/268/orig 2025-12-04T08:57:43.8220464Z * [new branch] gh/kwen2501/269/base -> origin/gh/kwen2501/269/base 2025-12-04T08:57:43.8222001Z * [new branch] gh/kwen2501/269/head -> origin/gh/kwen2501/269/head 2025-12-04T08:57:43.8223326Z * [new branch] gh/kwen2501/269/orig -> origin/gh/kwen2501/269/orig 2025-12-04T08:57:43.8224891Z * [new branch] gh/kwen2501/270/base -> origin/gh/kwen2501/270/base 2025-12-04T08:57:43.8226129Z * [new branch] gh/kwen2501/270/head -> origin/gh/kwen2501/270/head 2025-12-04T08:57:43.8227277Z * [new branch] gh/kwen2501/270/orig -> origin/gh/kwen2501/270/orig 2025-12-04T08:57:43.8229018Z * [new branch] gh/kwen2501/271/base -> origin/gh/kwen2501/271/base 2025-12-04T08:57:43.8230180Z * [new branch] gh/kwen2501/271/head -> origin/gh/kwen2501/271/head 2025-12-04T08:57:43.8231315Z * [new branch] gh/kwen2501/271/orig -> origin/gh/kwen2501/271/orig 2025-12-04T08:57:43.8233027Z * [new branch] gh/kwen2501/274/base -> origin/gh/kwen2501/274/base 2025-12-04T08:57:43.8234281Z * [new branch] gh/kwen2501/274/head -> origin/gh/kwen2501/274/head 2025-12-04T08:57:43.8235389Z * [new branch] gh/kwen2501/274/orig -> origin/gh/kwen2501/274/orig 2025-12-04T08:57:43.8236987Z * [new branch] gh/kwen2501/275/base -> origin/gh/kwen2501/275/base 2025-12-04T08:57:43.8238351Z * [new branch] gh/kwen2501/275/head -> origin/gh/kwen2501/275/head 2025-12-04T08:57:43.8239476Z * [new branch] gh/kwen2501/275/orig -> origin/gh/kwen2501/275/orig 2025-12-04T08:57:43.8240915Z * [new branch] gh/kwen2501/276/base -> origin/gh/kwen2501/276/base 2025-12-04T08:57:43.8242150Z * [new branch] gh/kwen2501/276/head -> origin/gh/kwen2501/276/head 2025-12-04T08:57:43.8243052Z * [new branch] gh/kwen2501/276/orig -> origin/gh/kwen2501/276/orig 2025-12-04T08:57:43.8244710Z * [new branch] gh/kwen2501/277/base -> origin/gh/kwen2501/277/base 2025-12-04T08:57:43.8245765Z * [new branch] gh/kwen2501/277/head -> origin/gh/kwen2501/277/head 2025-12-04T08:57:43.8246867Z * [new branch] gh/kwen2501/277/orig -> origin/gh/kwen2501/277/orig 2025-12-04T08:57:43.8248413Z * [new branch] gh/kwen2501/278/base -> origin/gh/kwen2501/278/base 2025-12-04T08:57:43.8249479Z * [new branch] gh/kwen2501/278/head -> origin/gh/kwen2501/278/head 2025-12-04T08:57:43.8250564Z * [new branch] gh/kwen2501/278/orig -> origin/gh/kwen2501/278/orig 2025-12-04T08:57:43.8252105Z * [new branch] gh/kwen2501/279/base -> origin/gh/kwen2501/279/base 2025-12-04T08:57:43.8253299Z * [new branch] gh/kwen2501/279/head -> origin/gh/kwen2501/279/head 2025-12-04T08:57:43.8254510Z * [new branch] gh/kwen2501/279/orig -> 
origin/gh/kwen2501/279/orig 2025-12-04T08:57:43.8255984Z * [new branch] gh/kwen2501/280/base -> origin/gh/kwen2501/280/base 2025-12-04T08:57:43.8257473Z * [new branch] gh/kwen2501/280/head -> origin/gh/kwen2501/280/head 2025-12-04T08:57:43.8258620Z * [new branch] gh/kwen2501/280/orig -> origin/gh/kwen2501/280/orig 2025-12-04T08:57:43.8260269Z * [new branch] gh/kwen2501/281/base -> origin/gh/kwen2501/281/base 2025-12-04T08:57:43.8261357Z * [new branch] gh/kwen2501/281/head -> origin/gh/kwen2501/281/head 2025-12-04T08:57:43.8262526Z * [new branch] gh/kwen2501/281/orig -> origin/gh/kwen2501/281/orig 2025-12-04T08:57:43.8264125Z * [new branch] gh/kwen2501/282/base -> origin/gh/kwen2501/282/base 2025-12-04T08:57:43.8265292Z * [new branch] gh/kwen2501/282/head -> origin/gh/kwen2501/282/head 2025-12-04T08:57:43.8266454Z * [new branch] gh/kwen2501/282/orig -> origin/gh/kwen2501/282/orig 2025-12-04T08:57:43.8267970Z * [new branch] gh/kwen2501/283/base -> origin/gh/kwen2501/283/base 2025-12-04T08:57:43.8269260Z * [new branch] gh/kwen2501/283/head -> origin/gh/kwen2501/283/head 2025-12-04T08:57:43.8270387Z * [new branch] gh/kwen2501/283/orig -> origin/gh/kwen2501/283/orig 2025-12-04T08:57:43.8271891Z * [new branch] gh/kwen2501/284/base -> origin/gh/kwen2501/284/base 2025-12-04T08:57:43.8273079Z * [new branch] gh/kwen2501/284/head -> origin/gh/kwen2501/284/head 2025-12-04T08:57:43.8274220Z * [new branch] gh/kwen2501/284/orig -> origin/gh/kwen2501/284/orig 2025-12-04T08:57:43.8275899Z * [new branch] gh/kwen2501/285/base -> origin/gh/kwen2501/285/base 2025-12-04T08:57:43.8276959Z * [new branch] gh/kwen2501/285/head -> origin/gh/kwen2501/285/head 2025-12-04T08:57:43.8278046Z * [new branch] gh/kwen2501/285/orig -> origin/gh/kwen2501/285/orig 2025-12-04T08:57:43.8279502Z * [new branch] gh/kwen2501/286/base -> origin/gh/kwen2501/286/base 2025-12-04T08:57:43.8280633Z * [new branch] gh/kwen2501/286/head -> origin/gh/kwen2501/286/head 2025-12-04T08:57:43.8281742Z * [new branch] gh/kwen2501/286/orig -> origin/gh/kwen2501/286/orig 2025-12-04T08:57:43.8283064Z * [new branch] gh/kwen2501/287/base -> origin/gh/kwen2501/287/base 2025-12-04T08:57:43.8284183Z * [new branch] gh/kwen2501/287/head -> origin/gh/kwen2501/287/head 2025-12-04T08:57:43.8285270Z * [new branch] gh/kwen2501/287/orig -> origin/gh/kwen2501/287/orig 2025-12-04T08:57:43.8286873Z * [new branch] gh/kwen2501/288/base -> origin/gh/kwen2501/288/base 2025-12-04T08:57:43.8287880Z * [new branch] gh/kwen2501/288/head -> origin/gh/kwen2501/288/head 2025-12-04T08:57:43.8289643Z * [new branch] gh/kwen2501/288/orig -> origin/gh/kwen2501/288/orig 2025-12-04T08:57:43.8291982Z * [new branch] gh/laithsakka/251/base -> origin/gh/laithsakka/251/base 2025-12-04T08:57:43.8293067Z * [new branch] gh/laithsakka/251/head -> origin/gh/laithsakka/251/head 2025-12-04T08:57:43.8294159Z * [new branch] gh/laithsakka/251/orig -> origin/gh/laithsakka/251/orig 2025-12-04T08:57:43.8295588Z * [new branch] gh/laithsakka/276/base -> origin/gh/laithsakka/276/base 2025-12-04T08:57:43.8296850Z * [new branch] gh/laithsakka/276/head -> origin/gh/laithsakka/276/head 2025-12-04T08:57:43.8298092Z * [new branch] gh/laithsakka/276/orig -> origin/gh/laithsakka/276/orig 2025-12-04T08:57:43.8299753Z * [new branch] gh/laithsakka/28/base -> origin/gh/laithsakka/28/base 2025-12-04T08:57:43.8301147Z * [new branch] gh/laithsakka/29/base -> origin/gh/laithsakka/29/base 2025-12-04T08:57:43.8302908Z * [new branch] gh/laithsakka/30/base -> origin/gh/laithsakka/30/base 2025-12-04T08:57:43.8304055Z * [new 
branch] gh/laithsakka/30/head -> origin/gh/laithsakka/30/head 2025-12-04T08:57:43.8305474Z * [new branch] gh/laithsakka/31/base -> origin/gh/laithsakka/31/base 2025-12-04T08:57:43.8306999Z * [new branch] gh/laithsakka/31/head -> origin/gh/laithsakka/31/head 2025-12-04T08:57:43.8308753Z * [new branch] gh/laithsakka/313/base -> origin/gh/laithsakka/313/base 2025-12-04T08:57:43.8309810Z * [new branch] gh/laithsakka/313/head -> origin/gh/laithsakka/313/head 2025-12-04T08:57:43.8310953Z * [new branch] gh/laithsakka/313/orig -> origin/gh/laithsakka/313/orig 2025-12-04T08:57:43.8312688Z * [new branch] gh/laithsakka/316/base -> origin/gh/laithsakka/316/base 2025-12-04T08:57:43.8313677Z * [new branch] gh/laithsakka/316/head -> origin/gh/laithsakka/316/head 2025-12-04T08:57:43.8314784Z * [new branch] gh/laithsakka/316/orig -> origin/gh/laithsakka/316/orig 2025-12-04T08:57:43.8316271Z * [new branch] gh/laithsakka/317/base -> origin/gh/laithsakka/317/base 2025-12-04T08:57:43.8317291Z * [new branch] gh/laithsakka/317/head -> origin/gh/laithsakka/317/head 2025-12-04T08:57:43.8318301Z * [new branch] gh/laithsakka/317/orig -> origin/gh/laithsakka/317/orig 2025-12-04T08:57:43.8319905Z * [new branch] gh/laithsakka/319/base -> origin/gh/laithsakka/319/base 2025-12-04T08:57:43.8321334Z * [new branch] gh/laithsakka/319/head -> origin/gh/laithsakka/319/head 2025-12-04T08:57:43.8325078Z * [new branch] gh/laithsakka/319/orig -> origin/gh/laithsakka/319/orig 2025-12-04T08:57:43.8326409Z * [new branch] gh/laithsakka/32/base -> origin/gh/laithsakka/32/base 2025-12-04T08:57:43.8327460Z * [new branch] gh/laithsakka/32/head -> origin/gh/laithsakka/32/head 2025-12-04T08:57:43.8329114Z * [new branch] gh/laithsakka/320/base -> origin/gh/laithsakka/320/base 2025-12-04T08:57:43.8330164Z * [new branch] gh/laithsakka/320/head -> origin/gh/laithsakka/320/head 2025-12-04T08:57:43.8331744Z * [new branch] gh/laithsakka/320/orig -> origin/gh/laithsakka/320/orig 2025-12-04T08:57:43.8333226Z * [new branch] gh/laithsakka/321/base -> origin/gh/laithsakka/321/base 2025-12-04T08:57:43.8334463Z * [new branch] gh/laithsakka/321/head -> origin/gh/laithsakka/321/head 2025-12-04T08:57:43.8335635Z * [new branch] gh/laithsakka/321/orig -> origin/gh/laithsakka/321/orig 2025-12-04T08:57:43.8337573Z * [new branch] gh/laithsakka/322/base -> origin/gh/laithsakka/322/base 2025-12-04T08:57:43.8339299Z * [new branch] gh/laithsakka/322/head -> origin/gh/laithsakka/322/head 2025-12-04T08:57:43.8340473Z * [new branch] gh/laithsakka/322/orig -> origin/gh/laithsakka/322/orig 2025-12-04T08:57:43.8342055Z * [new branch] gh/laithsakka/323/base -> origin/gh/laithsakka/323/base 2025-12-04T08:57:43.8343307Z * [new branch] gh/laithsakka/323/head -> origin/gh/laithsakka/323/head 2025-12-04T08:57:43.8344467Z * [new branch] gh/laithsakka/323/orig -> origin/gh/laithsakka/323/orig 2025-12-04T08:57:43.8346117Z * [new branch] gh/laithsakka/324/base -> origin/gh/laithsakka/324/base 2025-12-04T08:57:43.8347155Z * [new branch] gh/laithsakka/324/head -> origin/gh/laithsakka/324/head 2025-12-04T08:57:43.8348244Z * [new branch] gh/laithsakka/324/orig -> origin/gh/laithsakka/324/orig 2025-12-04T08:57:43.8349881Z * [new branch] gh/laithsakka/325/base -> origin/gh/laithsakka/325/base 2025-12-04T08:57:43.8350980Z * [new branch] gh/laithsakka/325/head -> origin/gh/laithsakka/325/head 2025-12-04T08:57:43.8352047Z * [new branch] gh/laithsakka/325/orig -> origin/gh/laithsakka/325/orig 2025-12-04T08:57:43.8353804Z * [new branch] gh/laithsakka/326/base -> origin/gh/laithsakka/326/base 
2025-12-04T08:57:43.8355158Z * [new branch] gh/laithsakka/326/head -> origin/gh/laithsakka/326/head 2025-12-04T08:57:43.8356287Z * [new branch] gh/laithsakka/326/orig -> origin/gh/laithsakka/326/orig 2025-12-04T08:57:43.8357806Z * [new branch] gh/laithsakka/327/base -> origin/gh/laithsakka/327/base 2025-12-04T08:57:43.8358971Z * [new branch] gh/laithsakka/327/head -> origin/gh/laithsakka/327/head 2025-12-04T08:57:43.8360265Z * [new branch] gh/laithsakka/327/orig -> origin/gh/laithsakka/327/orig 2025-12-04T08:57:43.8361807Z * [new branch] gh/laithsakka/328/base -> origin/gh/laithsakka/328/base 2025-12-04T08:57:43.8362874Z * [new branch] gh/laithsakka/328/head -> origin/gh/laithsakka/328/head 2025-12-04T08:57:43.8363928Z * [new branch] gh/laithsakka/328/orig -> origin/gh/laithsakka/328/orig 2025-12-04T08:57:43.8366095Z * [new branch] gh/liangel/4/base -> origin/gh/liangel/4/base 2025-12-04T08:57:43.8367214Z * [new branch] gh/liangel/4/head -> origin/gh/liangel/4/head 2025-12-04T08:57:43.8368289Z * [new branch] gh/liangel/4/orig -> origin/gh/liangel/4/orig 2025-12-04T08:57:43.8372078Z * [new branch] gh/lucaskabela/1/base -> origin/gh/lucaskabela/1/base 2025-12-04T08:57:43.8373166Z * [new branch] gh/lucaskabela/1/head -> origin/gh/lucaskabela/1/head 2025-12-04T08:57:43.8374846Z * [new branch] gh/lw/4/base -> origin/gh/lw/4/base 2025-12-04T08:57:43.8375901Z * [new branch] gh/lw/4/head -> origin/gh/lw/4/head 2025-12-04T08:57:43.8377343Z * [new branch] gh/lw/4/orig -> origin/gh/lw/4/orig 2025-12-04T08:57:43.8378866Z * [new branch] gh/lw/5/base -> origin/gh/lw/5/base 2025-12-04T08:57:43.8380013Z * [new branch] gh/lw/5/head -> origin/gh/lw/5/head 2025-12-04T08:57:43.8381113Z * [new branch] gh/lw/5/orig -> origin/gh/lw/5/orig 2025-12-04T08:57:43.8382567Z * [new branch] gh/lw/6/base -> origin/gh/lw/6/base 2025-12-04T08:57:43.8383668Z * [new branch] gh/lw/6/head -> origin/gh/lw/6/head 2025-12-04T08:57:43.8384831Z * [new branch] gh/lw/6/orig -> origin/gh/lw/6/orig 2025-12-04T08:57:43.8386739Z * [new branch] gh/malfet/14/base -> origin/gh/malfet/14/base 2025-12-04T08:57:43.8388189Z * [new branch] gh/malfet/417/base -> origin/gh/malfet/417/base 2025-12-04T08:57:43.8389364Z * [new branch] gh/malfet/417/head -> origin/gh/malfet/417/head 2025-12-04T08:57:43.8390485Z * [new branch] gh/malfet/417/orig -> origin/gh/malfet/417/orig 2025-12-04T08:57:43.8391882Z * [new branch] gh/malfet/506/base -> origin/gh/malfet/506/base 2025-12-04T08:57:43.8393030Z * [new branch] gh/malfet/506/head -> origin/gh/malfet/506/head 2025-12-04T08:57:43.8394111Z * [new branch] gh/malfet/506/orig -> origin/gh/malfet/506/orig 2025-12-04T08:57:43.8395664Z * [new branch] gh/malfet/517/base -> origin/gh/malfet/517/base 2025-12-04T08:57:43.8396764Z * [new branch] gh/malfet/517/head -> origin/gh/malfet/517/head 2025-12-04T08:57:43.8398199Z * [new branch] gh/malfet/528/base -> origin/gh/malfet/528/base 2025-12-04T08:57:43.8399398Z * [new branch] gh/malfet/528/head -> origin/gh/malfet/528/head 2025-12-04T08:57:43.8400467Z * [new branch] gh/malfet/528/orig -> origin/gh/malfet/528/orig 2025-12-04T08:57:43.8401952Z * [new branch] gh/malfet/537/base -> origin/gh/malfet/537/base 2025-12-04T08:57:43.8403015Z * [new branch] gh/malfet/537/head -> origin/gh/malfet/537/head 2025-12-04T08:57:43.8404089Z * [new branch] gh/malfet/537/orig -> origin/gh/malfet/537/orig 2025-12-04T08:57:43.8405517Z * [new branch] gh/malfet/546/base -> origin/gh/malfet/546/base 2025-12-04T08:57:43.8406579Z * [new branch] gh/malfet/546/head -> origin/gh/malfet/546/head 
2025-12-04T08:57:43.8407661Z * [new branch] gh/malfet/546/orig -> origin/gh/malfet/546/orig 2025-12-04T08:57:43.8409146Z * [new branch] gh/malfet/565/base -> origin/gh/malfet/565/base 2025-12-04T08:57:43.8410109Z * [new branch] gh/malfet/565/head -> origin/gh/malfet/565/head 2025-12-04T08:57:43.8411218Z * [new branch] gh/malfet/565/orig -> origin/gh/malfet/565/orig 2025-12-04T08:57:43.8412634Z * [new branch] gh/malfet/575/base -> origin/gh/malfet/575/base 2025-12-04T08:57:43.8413835Z * [new branch] gh/malfet/575/head -> origin/gh/malfet/575/head 2025-12-04T08:57:43.8414922Z * [new branch] gh/malfet/575/orig -> origin/gh/malfet/575/orig 2025-12-04T08:57:43.8416444Z * [new branch] gh/malfet/580/base -> origin/gh/malfet/580/base 2025-12-04T08:57:43.8417848Z * [new branch] gh/malfet/580/head -> origin/gh/malfet/580/head 2025-12-04T08:57:43.8418965Z * [new branch] gh/malfet/580/orig -> origin/gh/malfet/580/orig 2025-12-04T08:57:43.8420462Z * [new branch] gh/malfet/581/base -> origin/gh/malfet/581/base 2025-12-04T08:57:43.8421824Z * [new branch] gh/malfet/581/head -> origin/gh/malfet/581/head 2025-12-04T08:57:43.8422947Z * [new branch] gh/malfet/581/orig -> origin/gh/malfet/581/orig 2025-12-04T08:57:43.8424352Z * [new branch] gh/malfet/583/base -> origin/gh/malfet/583/base 2025-12-04T08:57:43.8425499Z * [new branch] gh/malfet/583/head -> origin/gh/malfet/583/head 2025-12-04T08:57:43.8426608Z * [new branch] gh/malfet/583/orig -> origin/gh/malfet/583/orig 2025-12-04T08:57:43.8428225Z * [new branch] gh/malfet/586/base -> origin/gh/malfet/586/base 2025-12-04T08:57:43.8429512Z * [new branch] gh/malfet/586/head -> origin/gh/malfet/586/head 2025-12-04T08:57:43.8430738Z * [new branch] gh/malfet/586/orig -> origin/gh/malfet/586/orig 2025-12-04T08:57:43.8432082Z * [new branch] gh/malfet/587/base -> origin/gh/malfet/587/base 2025-12-04T08:57:43.8433252Z * [new branch] gh/malfet/587/head -> origin/gh/malfet/587/head 2025-12-04T08:57:43.8434321Z * [new branch] gh/malfet/587/orig -> origin/gh/malfet/587/orig 2025-12-04T08:57:43.8435725Z * [new branch] gh/malfet/588/base -> origin/gh/malfet/588/base 2025-12-04T08:57:43.8436791Z * [new branch] gh/malfet/588/head -> origin/gh/malfet/588/head 2025-12-04T08:57:43.8438040Z * [new branch] gh/malfet/588/orig -> origin/gh/malfet/588/orig 2025-12-04T08:57:43.8439515Z * [new branch] gh/malfet/589/base -> origin/gh/malfet/589/base 2025-12-04T08:57:43.8440527Z * [new branch] gh/malfet/589/head -> origin/gh/malfet/589/head 2025-12-04T08:57:43.8441607Z * [new branch] gh/malfet/589/orig -> origin/gh/malfet/589/orig 2025-12-04T08:57:43.8443015Z * [new branch] gh/malfet/590/base -> origin/gh/malfet/590/base 2025-12-04T08:57:43.8444196Z * [new branch] gh/malfet/590/head -> origin/gh/malfet/590/head 2025-12-04T08:57:43.8445266Z * [new branch] gh/malfet/590/orig -> origin/gh/malfet/590/orig 2025-12-04T08:57:43.8447548Z * [new branch] gh/malfet/591/base -> origin/gh/malfet/591/base 2025-12-04T08:57:43.8448645Z * [new branch] gh/malfet/591/head -> origin/gh/malfet/591/head 2025-12-04T08:57:43.8449836Z * [new branch] gh/malfet/591/orig -> origin/gh/malfet/591/orig 2025-12-04T08:57:43.8451285Z * [new branch] gh/malfet/592/base -> origin/gh/malfet/592/base 2025-12-04T08:57:43.8452411Z * [new branch] gh/malfet/592/head -> origin/gh/malfet/592/head 2025-12-04T08:57:43.8453534Z * [new branch] gh/malfet/592/orig -> origin/gh/malfet/592/orig 2025-12-04T08:57:43.8454973Z * [new branch] gh/malfet/593/base -> origin/gh/malfet/593/base 2025-12-04T08:57:43.8456037Z * [new branch] 
gh/malfet/593/head -> origin/gh/malfet/593/head 2025-12-04T08:57:43.8457477Z * [new branch] gh/malfet/593/orig -> origin/gh/malfet/593/orig 2025-12-04T08:57:43.8459018Z * [new branch] gh/malfet/594/base -> origin/gh/malfet/594/base 2025-12-04T08:57:43.8460222Z * [new branch] gh/malfet/594/head -> origin/gh/malfet/594/head 2025-12-04T08:57:43.8461344Z * [new branch] gh/malfet/594/orig -> origin/gh/malfet/594/orig 2025-12-04T08:57:43.8462798Z * [new branch] gh/malfet/595/base -> origin/gh/malfet/595/base 2025-12-04T08:57:43.8463867Z * [new branch] gh/malfet/595/head -> origin/gh/malfet/595/head 2025-12-04T08:57:43.8465110Z * [new branch] gh/malfet/595/orig -> origin/gh/malfet/595/orig 2025-12-04T08:57:43.8466583Z * [new branch] gh/malfet/596/base -> origin/gh/malfet/596/base 2025-12-04T08:57:43.8467756Z * [new branch] gh/malfet/596/head -> origin/gh/malfet/596/head 2025-12-04T08:57:43.8468868Z * [new branch] gh/malfet/596/orig -> origin/gh/malfet/596/orig 2025-12-04T08:57:43.8470458Z * [new branch] gh/malfet/597/base -> origin/gh/malfet/597/base 2025-12-04T08:57:43.8471515Z * [new branch] gh/malfet/597/head -> origin/gh/malfet/597/head 2025-12-04T08:57:43.8472576Z * [new branch] gh/malfet/597/orig -> origin/gh/malfet/597/orig 2025-12-04T08:57:43.8474034Z * [new branch] gh/malfet/598/base -> origin/gh/malfet/598/base 2025-12-04T08:57:43.8475233Z * [new branch] gh/malfet/598/head -> origin/gh/malfet/598/head 2025-12-04T08:57:43.8476430Z * [new branch] gh/malfet/598/orig -> origin/gh/malfet/598/orig 2025-12-04T08:57:43.8477792Z * [new branch] gh/malfet/599/base -> origin/gh/malfet/599/base 2025-12-04T08:57:43.8478857Z * [new branch] gh/malfet/599/head -> origin/gh/malfet/599/head 2025-12-04T08:57:43.8479964Z * [new branch] gh/malfet/599/orig -> origin/gh/malfet/599/orig 2025-12-04T08:57:43.8481398Z * [new branch] gh/malfet/600/base -> origin/gh/malfet/600/base 2025-12-04T08:57:43.8482460Z * [new branch] gh/malfet/600/head -> origin/gh/malfet/600/head 2025-12-04T08:57:43.8501151Z * [new branch] gh/malfet/600/orig -> origin/gh/malfet/600/orig 2025-12-04T08:57:43.8501640Z * [new branch] gh/malfet/601/base -> origin/gh/malfet/601/base 2025-12-04T08:57:43.8501899Z * [new branch] gh/malfet/601/head -> origin/gh/malfet/601/head 2025-12-04T08:57:43.8502152Z * [new branch] gh/malfet/601/orig -> origin/gh/malfet/601/orig 2025-12-04T08:57:43.8502393Z * [new branch] gh/malfet/602/base -> origin/gh/malfet/602/base 2025-12-04T08:57:43.8502642Z * [new branch] gh/malfet/602/head -> origin/gh/malfet/602/head 2025-12-04T08:57:43.8502877Z * [new branch] gh/malfet/602/orig -> origin/gh/malfet/602/orig 2025-12-04T08:57:43.8503125Z * [new branch] gh/malfet/603/base -> origin/gh/malfet/603/base 2025-12-04T08:57:43.8503358Z * [new branch] gh/malfet/603/head -> origin/gh/malfet/603/head 2025-12-04T08:57:43.8503593Z * [new branch] gh/malfet/603/orig -> origin/gh/malfet/603/orig 2025-12-04T08:57:43.8503845Z * [new branch] gh/malfet/604/base -> origin/gh/malfet/604/base 2025-12-04T08:57:43.8504084Z * [new branch] gh/malfet/604/head -> origin/gh/malfet/604/head 2025-12-04T08:57:43.8504339Z * [new branch] gh/malfet/604/orig -> origin/gh/malfet/604/orig 2025-12-04T08:57:43.8504579Z * [new branch] gh/malfet/605/base -> origin/gh/malfet/605/base 2025-12-04T08:57:43.8504814Z * [new branch] gh/malfet/605/head -> origin/gh/malfet/605/head 2025-12-04T08:57:43.8505067Z * [new branch] gh/malfet/605/orig -> origin/gh/malfet/605/orig 2025-12-04T08:57:43.8505302Z * [new branch] gh/malfet/606/base -> origin/gh/malfet/606/base 
2025-12-04T08:57:43.8505781Z * [new branch] gh/malfet/606/head -> origin/gh/malfet/606/head 2025-12-04T08:57:43.8507001Z * [new branch] gh/malfet/606/orig -> origin/gh/malfet/606/orig 2025-12-04T08:57:43.8508484Z * [new branch] gh/malfet/607/base -> origin/gh/malfet/607/base 2025-12-04T08:57:43.8509715Z * [new branch] gh/malfet/607/head -> origin/gh/malfet/607/head 2025-12-04T08:57:43.8510879Z * [new branch] gh/malfet/607/orig -> origin/gh/malfet/607/orig 2025-12-04T08:57:43.8512344Z * [new branch] gh/malfet/608/base -> origin/gh/malfet/608/base 2025-12-04T08:57:43.8513422Z * [new branch] gh/malfet/608/head -> origin/gh/malfet/608/head 2025-12-04T08:57:43.8514546Z * [new branch] gh/malfet/608/orig -> origin/gh/malfet/608/orig 2025-12-04T08:57:43.8516050Z * [new branch] gh/malfet/609/base -> origin/gh/malfet/609/base 2025-12-04T08:57:43.8517113Z * [new branch] gh/malfet/609/head -> origin/gh/malfet/609/head 2025-12-04T08:57:43.8518224Z * [new branch] gh/malfet/609/orig -> origin/gh/malfet/609/orig 2025-12-04T08:57:43.8519708Z * [new branch] gh/malfet/610/base -> origin/gh/malfet/610/base 2025-12-04T08:57:43.8522013Z * [new branch] gh/malfet/610/head -> origin/gh/malfet/610/head 2025-12-04T08:57:43.8523033Z * [new branch] gh/malfet/610/orig -> origin/gh/malfet/610/orig 2025-12-04T08:57:43.8524581Z * [new branch] gh/malfet/611/base -> origin/gh/malfet/611/base 2025-12-04T08:57:43.8525721Z * [new branch] gh/malfet/611/head -> origin/gh/malfet/611/head 2025-12-04T08:57:43.8526846Z * [new branch] gh/malfet/611/orig -> origin/gh/malfet/611/orig 2025-12-04T08:57:43.8528229Z * [new branch] gh/malfet/612/base -> origin/gh/malfet/612/base 2025-12-04T08:57:43.8529760Z * [new branch] gh/malfet/612/head -> origin/gh/malfet/612/head 2025-12-04T08:57:43.8530979Z * [new branch] gh/malfet/612/orig -> origin/gh/malfet/612/orig 2025-12-04T08:57:43.8532513Z * [new branch] gh/malfet/64/base -> origin/gh/malfet/64/base 2025-12-04T08:57:43.8533749Z * [new branch] gh/malfet/64/head -> origin/gh/malfet/64/head 2025-12-04T08:57:43.8535511Z * [new branch] gh/manuelcandales/11/base -> origin/gh/manuelcandales/11/base 2025-12-04T08:57:43.8537025Z * [new branch] gh/manuelcandales/11/head -> origin/gh/manuelcandales/11/head 2025-12-04T08:57:43.8538247Z * [new branch] gh/manuelcandales/11/orig -> origin/gh/manuelcandales/11/orig 2025-12-04T08:57:43.8541153Z * [new branch] gh/markkm/1/base -> origin/gh/markkm/1/base 2025-12-04T08:57:43.8543146Z * [new branch] gh/masnesral/1/base -> origin/gh/masnesral/1/base 2025-12-04T08:57:43.8544260Z * [new branch] gh/masnesral/1/head -> origin/gh/masnesral/1/head 2025-12-04T08:57:43.8545366Z * [new branch] gh/masnesral/1/orig -> origin/gh/masnesral/1/orig 2025-12-04T08:57:43.8547602Z * [new branch] gh/mhorowitz/0/base -> origin/gh/mhorowitz/0/base 2025-12-04T08:57:43.8548850Z * [new branch] gh/mhorowitz/0/head -> origin/gh/mhorowitz/0/head 2025-12-04T08:57:43.8550160Z * [new branch] gh/mhorowitz/1/base -> origin/gh/mhorowitz/1/base 2025-12-04T08:57:43.8551339Z * [new branch] gh/mhorowitz/1/head -> origin/gh/mhorowitz/1/head 2025-12-04T08:57:43.8552657Z * [new branch] gh/mhorowitz/2/base -> origin/gh/mhorowitz/2/base 2025-12-04T08:57:43.8553759Z * [new branch] gh/mhorowitz/2/head -> origin/gh/mhorowitz/2/head 2025-12-04T08:57:43.8555074Z * [new branch] gh/mhorowitz/3/base -> origin/gh/mhorowitz/3/base 2025-12-04T08:57:43.8556111Z * [new branch] gh/mhorowitz/3/head -> origin/gh/mhorowitz/3/head 2025-12-04T08:57:43.8557410Z * [new branch] gh/mhorowitz/4/base -> origin/gh/mhorowitz/4/base 
2025-12-04T08:57:43.8558489Z * [new branch] gh/mhorowitz/4/head -> origin/gh/mhorowitz/4/head 2025-12-04T08:57:43.8559764Z * [new branch] gh/mhorowitz/5/base -> origin/gh/mhorowitz/5/base 2025-12-04T08:57:43.8560773Z * [new branch] gh/mhorowitz/5/head -> origin/gh/mhorowitz/5/head 2025-12-04T08:57:43.8562261Z * [new branch] gh/mhorowitz/6/base -> origin/gh/mhorowitz/6/base 2025-12-04T08:57:43.8563286Z * [new branch] gh/mhorowitz/6/head -> origin/gh/mhorowitz/6/head 2025-12-04T08:57:43.8565234Z * [new branch] gh/mikaylagawarecki/234/base -> origin/gh/mikaylagawarecki/234/base 2025-12-04T08:57:43.8566351Z * [new branch] gh/mikaylagawarecki/234/head -> origin/gh/mikaylagawarecki/234/head 2025-12-04T08:57:43.8568230Z * [new branch] gh/mikaylagawarecki/235/base -> origin/gh/mikaylagawarecki/235/base 2025-12-04T08:57:43.8569251Z * [new branch] gh/mikaylagawarecki/235/head -> origin/gh/mikaylagawarecki/235/head 2025-12-04T08:57:43.8570731Z * [new branch] gh/mikaylagawarecki/236/base -> origin/gh/mikaylagawarecki/236/base 2025-12-04T08:57:43.8571666Z * [new branch] gh/mikaylagawarecki/236/head -> origin/gh/mikaylagawarecki/236/head 2025-12-04T08:57:43.8573522Z * [new branch] gh/mikaylagawarecki/237/base -> origin/gh/mikaylagawarecki/237/base 2025-12-04T08:57:43.8574601Z * [new branch] gh/mikaylagawarecki/237/head -> origin/gh/mikaylagawarecki/237/head 2025-12-04T08:57:43.8575997Z * [new branch] gh/mikaylagawarecki/238/base -> origin/gh/mikaylagawarecki/238/base 2025-12-04T08:57:43.8577522Z * [new branch] gh/mikaylagawarecki/238/head -> origin/gh/mikaylagawarecki/238/head 2025-12-04T08:57:43.8579081Z * [new branch] gh/mikaylagawarecki/336/base -> origin/gh/mikaylagawarecki/336/base 2025-12-04T08:57:43.8580272Z * [new branch] gh/mikaylagawarecki/336/head -> origin/gh/mikaylagawarecki/336/head 2025-12-04T08:57:43.8581376Z * [new branch] gh/mikaylagawarecki/336/orig -> origin/gh/mikaylagawarecki/336/orig 2025-12-04T08:57:43.8583102Z * [new branch] gh/mikaylagawarecki/341/base -> origin/gh/mikaylagawarecki/341/base 2025-12-04T08:57:43.8584193Z * [new branch] gh/mikaylagawarecki/341/head -> origin/gh/mikaylagawarecki/341/head 2025-12-04T08:57:43.8585372Z * [new branch] gh/mikaylagawarecki/341/orig -> origin/gh/mikaylagawarecki/341/orig 2025-12-04T08:57:43.8587509Z * [new branch] gh/mikaylagawarecki/342/base -> origin/gh/mikaylagawarecki/342/base 2025-12-04T08:57:43.8588608Z * [new branch] gh/mikaylagawarecki/342/head -> origin/gh/mikaylagawarecki/342/head 2025-12-04T08:57:43.8589843Z * [new branch] gh/mikaylagawarecki/342/orig -> origin/gh/mikaylagawarecki/342/orig 2025-12-04T08:57:43.8591374Z * [new branch] gh/mikaylagawarecki/345/base -> origin/gh/mikaylagawarecki/345/base 2025-12-04T08:57:43.8592430Z * [new branch] gh/mikaylagawarecki/345/head -> origin/gh/mikaylagawarecki/345/head 2025-12-04T08:57:43.8593515Z * [new branch] gh/mikaylagawarecki/345/orig -> origin/gh/mikaylagawarecki/345/orig 2025-12-04T08:57:43.8595207Z * [new branch] gh/mikaylagawarecki/346/base -> origin/gh/mikaylagawarecki/346/base 2025-12-04T08:57:43.8596286Z * [new branch] gh/mikaylagawarecki/346/head -> origin/gh/mikaylagawarecki/346/head 2025-12-04T08:57:43.8597388Z * [new branch] gh/mikaylagawarecki/346/orig -> origin/gh/mikaylagawarecki/346/orig 2025-12-04T08:57:43.8598926Z * [new branch] gh/mikaylagawarecki/347/base -> origin/gh/mikaylagawarecki/347/base 2025-12-04T08:57:43.8599927Z * [new branch] gh/mikaylagawarecki/347/head -> origin/gh/mikaylagawarecki/347/head 2025-12-04T08:57:43.8601017Z * [new branch] 
gh/mikaylagawarecki/347/orig -> origin/gh/mikaylagawarecki/347/orig 2025-12-04T08:57:43.8602573Z * [new branch] gh/mikaylagawarecki/350/base -> origin/gh/mikaylagawarecki/350/base 2025-12-04T08:57:43.8603670Z * [new branch] gh/mikaylagawarecki/350/head -> origin/gh/mikaylagawarecki/350/head 2025-12-04T08:57:43.8604789Z * [new branch] gh/mikaylagawarecki/350/orig -> origin/gh/mikaylagawarecki/350/orig 2025-12-04T08:57:43.8606678Z * [new branch] gh/mikaylagawarecki/351/base -> origin/gh/mikaylagawarecki/351/base 2025-12-04T08:57:43.8607866Z * [new branch] gh/mikaylagawarecki/351/head -> origin/gh/mikaylagawarecki/351/head 2025-12-04T08:57:43.8608987Z * [new branch] gh/mikaylagawarecki/351/orig -> origin/gh/mikaylagawarecki/351/orig 2025-12-04T08:57:43.8610725Z * [new branch] gh/mikaylagawarecki/352/base -> origin/gh/mikaylagawarecki/352/base 2025-12-04T08:57:43.8611958Z * [new branch] gh/mikaylagawarecki/352/head -> origin/gh/mikaylagawarecki/352/head 2025-12-04T08:57:43.8613102Z * [new branch] gh/mikaylagawarecki/352/orig -> origin/gh/mikaylagawarecki/352/orig 2025-12-04T08:57:43.8614896Z * [new branch] gh/mikaylagawarecki/353/base -> origin/gh/mikaylagawarecki/353/base 2025-12-04T08:57:43.8616392Z * [new branch] gh/mikaylagawarecki/353/head -> origin/gh/mikaylagawarecki/353/head 2025-12-04T08:57:43.8617759Z * [new branch] gh/mikaylagawarecki/353/orig -> origin/gh/mikaylagawarecki/353/orig 2025-12-04T08:57:43.8619076Z * [new branch] gh/mikaylagawarecki/354/base -> origin/gh/mikaylagawarecki/354/base 2025-12-04T08:57:43.8620187Z * [new branch] gh/mikaylagawarecki/354/head -> origin/gh/mikaylagawarecki/354/head 2025-12-04T08:57:43.8624571Z * [new branch] gh/mikaylagawarecki/354/orig -> origin/gh/mikaylagawarecki/354/orig 2025-12-04T08:57:43.8626654Z * [new branch] gh/mikaylagawarecki/356/base -> origin/gh/mikaylagawarecki/356/base 2025-12-04T08:57:43.8627903Z * [new branch] gh/mikaylagawarecki/356/head -> origin/gh/mikaylagawarecki/356/head 2025-12-04T08:57:43.8629105Z * [new branch] gh/mikaylagawarecki/356/orig -> origin/gh/mikaylagawarecki/356/orig 2025-12-04T08:57:43.8630678Z * [new branch] gh/mikaylagawarecki/357/base -> origin/gh/mikaylagawarecki/357/base 2025-12-04T08:57:43.8631784Z * [new branch] gh/mikaylagawarecki/357/head -> origin/gh/mikaylagawarecki/357/head 2025-12-04T08:57:43.8633422Z * [new branch] gh/mikaylagawarecki/357/orig -> origin/gh/mikaylagawarecki/357/orig 2025-12-04T08:57:43.8635170Z * [new branch] gh/mikaylagawarecki/359/base -> origin/gh/mikaylagawarecki/359/base 2025-12-04T08:57:43.8636408Z * [new branch] gh/mikaylagawarecki/359/head -> origin/gh/mikaylagawarecki/359/head 2025-12-04T08:57:43.8637531Z * [new branch] gh/mikaylagawarecki/359/orig -> origin/gh/mikaylagawarecki/359/orig 2025-12-04T08:57:43.8638997Z * [new branch] gh/mikaylagawarecki/360/base -> origin/gh/mikaylagawarecki/360/base 2025-12-04T08:57:43.8640127Z * [new branch] gh/mikaylagawarecki/360/head -> origin/gh/mikaylagawarecki/360/head 2025-12-04T08:57:43.8641230Z * [new branch] gh/mikaylagawarecki/360/orig -> origin/gh/mikaylagawarecki/360/orig 2025-12-04T08:57:43.8642825Z * [new branch] gh/mikaylagawarecki/361/base -> origin/gh/mikaylagawarecki/361/base 2025-12-04T08:57:43.8643952Z * [new branch] gh/mikaylagawarecki/361/head -> origin/gh/mikaylagawarecki/361/head 2025-12-04T08:57:43.8645047Z * [new branch] gh/mikaylagawarecki/361/orig -> origin/gh/mikaylagawarecki/361/orig 2025-12-04T08:57:43.8646728Z * [new branch] gh/mikaylagawarecki/362/base -> origin/gh/mikaylagawarecki/362/base 
2025-12-04T08:57:43.8647994Z * [new branch] gh/mikaylagawarecki/362/head -> origin/gh/mikaylagawarecki/362/head 2025-12-04T08:57:43.8649585Z * [new branch] gh/mikaylagawarecki/362/orig -> origin/gh/mikaylagawarecki/362/orig 2025-12-04T08:57:43.8651447Z * [new branch] gh/mikaylagawarecki/363/base -> origin/gh/mikaylagawarecki/363/base 2025-12-04T08:57:43.8652640Z * [new branch] gh/mikaylagawarecki/363/head -> origin/gh/mikaylagawarecki/363/head 2025-12-04T08:57:43.8653873Z * [new branch] gh/mikaylagawarecki/363/orig -> origin/gh/mikaylagawarecki/363/orig 2025-12-04T08:57:43.8655868Z * [new branch] gh/mikaylagawarecki/364/base -> origin/gh/mikaylagawarecki/364/base 2025-12-04T08:57:43.8657282Z * [new branch] gh/mikaylagawarecki/364/head -> origin/gh/mikaylagawarecki/364/head 2025-12-04T08:57:43.8658445Z * [new branch] gh/mikaylagawarecki/364/orig -> origin/gh/mikaylagawarecki/364/orig 2025-12-04T08:57:43.8660228Z * [new branch] gh/mikaylagawarecki/365/base -> origin/gh/mikaylagawarecki/365/base 2025-12-04T08:57:43.8661374Z * [new branch] gh/mikaylagawarecki/365/head -> origin/gh/mikaylagawarecki/365/head 2025-12-04T08:57:43.8662829Z * [new branch] gh/mikaylagawarecki/365/orig -> origin/gh/mikaylagawarecki/365/orig 2025-12-04T08:57:43.8664843Z * [new branch] gh/mikaylagawarecki/366/base -> origin/gh/mikaylagawarecki/366/base 2025-12-04T08:57:43.8665857Z * [new branch] gh/mikaylagawarecki/366/head -> origin/gh/mikaylagawarecki/366/head 2025-12-04T08:57:43.8667059Z * [new branch] gh/mikaylagawarecki/366/orig -> origin/gh/mikaylagawarecki/366/orig 2025-12-04T08:57:43.8668573Z * [new branch] gh/mikaylagawarecki/367/base -> origin/gh/mikaylagawarecki/367/base 2025-12-04T08:57:43.8669780Z * [new branch] gh/mikaylagawarecki/367/head -> origin/gh/mikaylagawarecki/367/head 2025-12-04T08:57:43.8670901Z * [new branch] gh/mikaylagawarecki/367/orig -> origin/gh/mikaylagawarecki/367/orig 2025-12-04T08:57:43.8672472Z * [new branch] gh/mikaylagawarecki/368/base -> origin/gh/mikaylagawarecki/368/base 2025-12-04T08:57:43.8673560Z * [new branch] gh/mikaylagawarecki/368/head -> origin/gh/mikaylagawarecki/368/head 2025-12-04T08:57:43.8674658Z * [new branch] gh/mikaylagawarecki/368/orig -> origin/gh/mikaylagawarecki/368/orig 2025-12-04T08:57:43.8676146Z * [new branch] gh/mikaylagawarecki/369/base -> origin/gh/mikaylagawarecki/369/base 2025-12-04T08:57:43.8677270Z * [new branch] gh/mikaylagawarecki/369/head -> origin/gh/mikaylagawarecki/369/head 2025-12-04T08:57:43.8678376Z * [new branch] gh/mikaylagawarecki/369/orig -> origin/gh/mikaylagawarecki/369/orig 2025-12-04T08:57:43.8680065Z * [new branch] gh/mikaylagawarecki/370/base -> origin/gh/mikaylagawarecki/370/base 2025-12-04T08:57:43.8681180Z * [new branch] gh/mikaylagawarecki/370/head -> origin/gh/mikaylagawarecki/370/head 2025-12-04T08:57:43.8682236Z * [new branch] gh/mikaylagawarecki/370/orig -> origin/gh/mikaylagawarecki/370/orig 2025-12-04T08:57:43.8683796Z * [new branch] gh/mikaylagawarecki/371/base -> origin/gh/mikaylagawarecki/371/base 2025-12-04T08:57:43.8684870Z * [new branch] gh/mikaylagawarecki/371/head -> origin/gh/mikaylagawarecki/371/head 2025-12-04T08:57:43.8685897Z * [new branch] gh/mikaylagawarecki/371/orig -> origin/gh/mikaylagawarecki/371/orig 2025-12-04T08:57:43.8687396Z * [new branch] gh/mikaylagawarecki/372/base -> origin/gh/mikaylagawarecki/372/base 2025-12-04T08:57:43.8688530Z * [new branch] gh/mikaylagawarecki/372/head -> origin/gh/mikaylagawarecki/372/head 2025-12-04T08:57:43.8689619Z * [new branch] gh/mikaylagawarecki/372/orig -> 
origin/gh/mikaylagawarecki/372/orig 2025-12-04T08:57:43.8691069Z * [new branch] gh/mikaylagawarecki/373/base -> origin/gh/mikaylagawarecki/373/base 2025-12-04T08:57:43.8692218Z * [new branch] gh/mikaylagawarecki/373/head -> origin/gh/mikaylagawarecki/373/head 2025-12-04T08:57:43.8693323Z * [new branch] gh/mikaylagawarecki/373/orig -> origin/gh/mikaylagawarecki/373/orig 2025-12-04T08:57:43.8694921Z * [new branch] gh/mikaylagawarecki/374/base -> origin/gh/mikaylagawarecki/374/base 2025-12-04T08:57:43.8696033Z * [new branch] gh/mikaylagawarecki/374/head -> origin/gh/mikaylagawarecki/374/head 2025-12-04T08:57:43.8697446Z * [new branch] gh/mikaylagawarecki/374/orig -> origin/gh/mikaylagawarecki/374/orig 2025-12-04T08:57:43.8699019Z * [new branch] gh/mikaylagawarecki/375/base -> origin/gh/mikaylagawarecki/375/base 2025-12-04T08:57:43.8700190Z * [new branch] gh/mikaylagawarecki/375/head -> origin/gh/mikaylagawarecki/375/head 2025-12-04T08:57:43.8701460Z * [new branch] gh/mikaylagawarecki/375/orig -> origin/gh/mikaylagawarecki/375/orig 2025-12-04T08:57:43.8703058Z * [new branch] gh/mikaylagawarecki/376/base -> origin/gh/mikaylagawarecki/376/base 2025-12-04T08:57:43.8704252Z * [new branch] gh/mikaylagawarecki/376/head -> origin/gh/mikaylagawarecki/376/head 2025-12-04T08:57:43.8705490Z * [new branch] gh/mikaylagawarecki/376/orig -> origin/gh/mikaylagawarecki/376/orig 2025-12-04T08:57:43.8706962Z * [new branch] gh/mikaylagawarecki/377/base -> origin/gh/mikaylagawarecki/377/base 2025-12-04T08:57:43.8708198Z * [new branch] gh/mikaylagawarecki/377/head -> origin/gh/mikaylagawarecki/377/head 2025-12-04T08:57:43.8709814Z * [new branch] gh/mikaylagawarecki/377/orig -> origin/gh/mikaylagawarecki/377/orig 2025-12-04T08:57:43.8711415Z * [new branch] gh/mikaylagawarecki/378/base -> origin/gh/mikaylagawarecki/378/base 2025-12-04T08:57:43.8713046Z * [new branch] gh/mikaylagawarecki/378/head -> origin/gh/mikaylagawarecki/378/head 2025-12-04T08:57:43.8714186Z * [new branch] gh/mikaylagawarecki/378/orig -> origin/gh/mikaylagawarecki/378/orig 2025-12-04T08:57:43.8715717Z * [new branch] gh/mikaylagawarecki/379/base -> origin/gh/mikaylagawarecki/379/base 2025-12-04T08:57:43.8716802Z * [new branch] gh/mikaylagawarecki/379/head -> origin/gh/mikaylagawarecki/379/head 2025-12-04T08:57:43.8717884Z * [new branch] gh/mikaylagawarecki/379/orig -> origin/gh/mikaylagawarecki/379/orig 2025-12-04T08:57:43.8719247Z * [new branch] gh/mikaylagawarecki/380/base -> origin/gh/mikaylagawarecki/380/base 2025-12-04T08:57:43.8720918Z * [new branch] gh/mikaylagawarecki/380/head -> origin/gh/mikaylagawarecki/380/head 2025-12-04T08:57:43.8722425Z * [new branch] gh/mikaylagawarecki/380/orig -> origin/gh/mikaylagawarecki/380/orig 2025-12-04T08:57:43.8723770Z * [new branch] gh/mikaylagawarecki/381/base -> origin/gh/mikaylagawarecki/381/base 2025-12-04T08:57:43.8724895Z * [new branch] gh/mikaylagawarecki/381/head -> origin/gh/mikaylagawarecki/381/head 2025-12-04T08:57:43.8725983Z * [new branch] gh/mikaylagawarecki/381/orig -> origin/gh/mikaylagawarecki/381/orig 2025-12-04T08:57:43.8727501Z * [new branch] gh/mikaylagawarecki/382/base -> origin/gh/mikaylagawarecki/382/base 2025-12-04T08:57:43.8728594Z * [new branch] gh/mikaylagawarecki/382/head -> origin/gh/mikaylagawarecki/382/head 2025-12-04T08:57:43.8729764Z * [new branch] gh/mikaylagawarecki/382/orig -> origin/gh/mikaylagawarecki/382/orig 2025-12-04T08:57:43.8731392Z * [new branch] gh/mikaylagawarecki/383/base -> origin/gh/mikaylagawarecki/383/base 2025-12-04T08:57:43.8732521Z * [new branch] 
gh/mikaylagawarecki/383/head -> origin/gh/mikaylagawarecki/383/head 2025-12-04T08:57:43.8733816Z * [new branch] gh/mikaylagawarecki/383/orig -> origin/gh/mikaylagawarecki/383/orig 2025-12-04T08:57:43.8735320Z * [new branch] gh/mikaylagawarecki/384/base -> origin/gh/mikaylagawarecki/384/base 2025-12-04T08:57:43.8736482Z * [new branch] gh/mikaylagawarecki/384/head -> origin/gh/mikaylagawarecki/384/head 2025-12-04T08:57:43.8737890Z * [new branch] gh/mikaylagawarecki/384/orig -> origin/gh/mikaylagawarecki/384/orig 2025-12-04T08:57:43.8739442Z * [new branch] gh/mikaylagawarecki/385/base -> origin/gh/mikaylagawarecki/385/base 2025-12-04T08:57:43.8740629Z * [new branch] gh/mikaylagawarecki/385/head -> origin/gh/mikaylagawarecki/385/head 2025-12-04T08:57:43.8741791Z * [new branch] gh/mikaylagawarecki/385/orig -> origin/gh/mikaylagawarecki/385/orig 2025-12-04T08:57:43.8743560Z * [new branch] gh/mikaylagawarecki/386/base -> origin/gh/mikaylagawarecki/386/base 2025-12-04T08:57:43.8744632Z * [new branch] gh/mikaylagawarecki/386/head -> origin/gh/mikaylagawarecki/386/head 2025-12-04T08:57:43.8745817Z * [new branch] gh/mikaylagawarecki/386/orig -> origin/gh/mikaylagawarecki/386/orig 2025-12-04T08:57:43.8747344Z * [new branch] gh/mikaylagawarecki/387/base -> origin/gh/mikaylagawarecki/387/base 2025-12-04T08:57:43.8748589Z * [new branch] gh/mikaylagawarecki/387/head -> origin/gh/mikaylagawarecki/387/head 2025-12-04T08:57:43.8749765Z * [new branch] gh/mikaylagawarecki/387/orig -> origin/gh/mikaylagawarecki/387/orig 2025-12-04T08:57:43.8751088Z * [new branch] gh/mikaylagawarecki/388/base -> origin/gh/mikaylagawarecki/388/base 2025-12-04T08:57:43.8752189Z * [new branch] gh/mikaylagawarecki/388/head -> origin/gh/mikaylagawarecki/388/head 2025-12-04T08:57:43.8753282Z * [new branch] gh/mikaylagawarecki/388/orig -> origin/gh/mikaylagawarecki/388/orig 2025-12-04T08:57:43.8755313Z * [new branch] gh/mikaylagawarecki/389/base -> origin/gh/mikaylagawarecki/389/base 2025-12-04T08:57:43.8756370Z * [new branch] gh/mikaylagawarecki/389/head -> origin/gh/mikaylagawarecki/389/head 2025-12-04T08:57:43.8757442Z * [new branch] gh/mikaylagawarecki/389/orig -> origin/gh/mikaylagawarecki/389/orig 2025-12-04T08:57:43.8759142Z * [new branch] gh/mikaylagawarecki/390/base -> origin/gh/mikaylagawarecki/390/base 2025-12-04T08:57:43.8760154Z * [new branch] gh/mikaylagawarecki/390/head -> origin/gh/mikaylagawarecki/390/head 2025-12-04T08:57:43.8761220Z * [new branch] gh/mikaylagawarecki/390/orig -> origin/gh/mikaylagawarecki/390/orig 2025-12-04T08:57:43.8762810Z * [new branch] gh/mikaylagawarecki/391/base -> origin/gh/mikaylagawarecki/391/base 2025-12-04T08:57:43.8763982Z * [new branch] gh/mikaylagawarecki/391/head -> origin/gh/mikaylagawarecki/391/head 2025-12-04T08:57:43.8765054Z * [new branch] gh/mikaylagawarecki/391/orig -> origin/gh/mikaylagawarecki/391/orig 2025-12-04T08:57:43.8766779Z * [new branch] gh/mikaylagawarecki/392/base -> origin/gh/mikaylagawarecki/392/base 2025-12-04T08:57:43.8767892Z * [new branch] gh/mikaylagawarecki/392/head -> origin/gh/mikaylagawarecki/392/head 2025-12-04T08:57:43.8769532Z * [new branch] gh/mikaylagawarecki/392/orig -> origin/gh/mikaylagawarecki/392/orig 2025-12-04T08:57:43.8771299Z * [new branch] gh/mlazos/41/base -> origin/gh/mlazos/41/base 2025-12-04T08:57:43.8772379Z * [new branch] gh/mlazos/41/head -> origin/gh/mlazos/41/head 2025-12-04T08:57:43.8773568Z * [new branch] gh/mlazos/41/orig -> origin/gh/mlazos/41/orig 2025-12-04T08:57:43.8775054Z * [new branch] gh/mlazos/42/base -> 
origin/gh/mlazos/42/base 2025-12-04T08:57:43.8776060Z * [new branch] gh/mlazos/42/head -> origin/gh/mlazos/42/head 2025-12-04T08:57:43.8777608Z * [new branch] gh/mlazos/42/orig -> origin/gh/mlazos/42/orig 2025-12-04T08:57:43.8778863Z * [new branch] gh/mlazos/43/base -> origin/gh/mlazos/43/base 2025-12-04T08:57:43.8780027Z * [new branch] gh/mlazos/43/head -> origin/gh/mlazos/43/head 2025-12-04T08:57:43.8781168Z * [new branch] gh/mlazos/43/orig -> origin/gh/mlazos/43/orig 2025-12-04T08:57:43.8782526Z * [new branch] gh/mlazos/44/base -> origin/gh/mlazos/44/base 2025-12-04T08:57:43.8783595Z * [new branch] gh/mlazos/44/head -> origin/gh/mlazos/44/head 2025-12-04T08:57:43.8784690Z * [new branch] gh/mlazos/44/orig -> origin/gh/mlazos/44/orig 2025-12-04T08:57:43.8786153Z * [new branch] gh/mlazos/47/base -> origin/gh/mlazos/47/base 2025-12-04T08:57:43.8787220Z * [new branch] gh/mlazos/47/head -> origin/gh/mlazos/47/head 2025-12-04T08:57:43.8788490Z * [new branch] gh/mlazos/47/orig -> origin/gh/mlazos/47/orig 2025-12-04T08:57:43.8789919Z * [new branch] gh/mlazos/48/base -> origin/gh/mlazos/48/base 2025-12-04T08:57:43.8790983Z * [new branch] gh/mlazos/48/head -> origin/gh/mlazos/48/head 2025-12-04T08:57:43.8792122Z * [new branch] gh/mlazos/48/orig -> origin/gh/mlazos/48/orig 2025-12-04T08:57:43.8793444Z * [new branch] gh/mlazos/49/base -> origin/gh/mlazos/49/base 2025-12-04T08:57:43.8794524Z * [new branch] gh/mlazos/49/head -> origin/gh/mlazos/49/head 2025-12-04T08:57:43.8795557Z * [new branch] gh/mlazos/49/orig -> origin/gh/mlazos/49/orig 2025-12-04T08:57:43.8797026Z * [new branch] gh/mlazos/50/base -> origin/gh/mlazos/50/base 2025-12-04T08:57:43.8798137Z * [new branch] gh/mlazos/50/head -> origin/gh/mlazos/50/head 2025-12-04T08:57:43.8799151Z * [new branch] gh/mlazos/50/orig -> origin/gh/mlazos/50/orig 2025-12-04T08:57:43.8800480Z * [new branch] gh/mlazos/51/base -> origin/gh/mlazos/51/base 2025-12-04T08:57:43.8801559Z * [new branch] gh/mlazos/51/head -> origin/gh/mlazos/51/head 2025-12-04T08:57:43.8802959Z * [new branch] gh/mlazos/51/orig -> origin/gh/mlazos/51/orig 2025-12-04T08:57:43.8804266Z * [new branch] gh/mlazos/52/base -> origin/gh/mlazos/52/base 2025-12-04T08:57:43.8805315Z * [new branch] gh/mlazos/52/head -> origin/gh/mlazos/52/head 2025-12-04T08:57:43.8806385Z * [new branch] gh/mlazos/52/orig -> origin/gh/mlazos/52/orig 2025-12-04T08:57:43.8807850Z * [new branch] gh/mlazos/53/base -> origin/gh/mlazos/53/base 2025-12-04T08:57:43.8809003Z * [new branch] gh/mlazos/53/head -> origin/gh/mlazos/53/head 2025-12-04T08:57:43.8810071Z * [new branch] gh/mlazos/53/orig -> origin/gh/mlazos/53/orig 2025-12-04T08:57:43.8811470Z * [new branch] gh/mlazos/54/base -> origin/gh/mlazos/54/base 2025-12-04T08:57:43.8812540Z * [new branch] gh/mlazos/54/head -> origin/gh/mlazos/54/head 2025-12-04T08:57:43.8813644Z * [new branch] gh/mlazos/54/orig -> origin/gh/mlazos/54/orig 2025-12-04T08:57:43.8814965Z * [new branch] gh/mlazos/55/base -> origin/gh/mlazos/55/base 2025-12-04T08:57:43.8816034Z * [new branch] gh/mlazos/55/head -> origin/gh/mlazos/55/head 2025-12-04T08:57:43.8818151Z * [new branch] gh/mlazos/55/orig -> origin/gh/mlazos/55/orig 2025-12-04T08:57:43.8819617Z * [new branch] gh/mlazos/56/base -> origin/gh/mlazos/56/base 2025-12-04T08:57:43.8821460Z * [new branch] gh/mlazos/56/head -> origin/gh/mlazos/56/head 2025-12-04T08:57:43.8822604Z * [new branch] gh/mlazos/56/orig -> origin/gh/mlazos/56/orig 2025-12-04T08:57:43.8824110Z * [new branch] gh/mlazos/57/base -> origin/gh/mlazos/57/base 
2025-12-04T08:57:43.8825174Z * [new branch] gh/mlazos/57/head -> origin/gh/mlazos/57/head 2025-12-04T08:57:43.8826269Z * [new branch] gh/mlazos/57/orig -> origin/gh/mlazos/57/orig 2025-12-04T08:57:43.8827739Z * [new branch] gh/mlazos/58/base -> origin/gh/mlazos/58/base 2025-12-04T08:57:43.8828899Z * [new branch] gh/mlazos/58/head -> origin/gh/mlazos/58/head 2025-12-04T08:57:43.8830091Z * [new branch] gh/mlazos/58/orig -> origin/gh/mlazos/58/orig 2025-12-04T08:57:43.8831501Z * [new branch] gh/mlazos/59/base -> origin/gh/mlazos/59/base 2025-12-04T08:57:43.8832648Z * [new branch] gh/mlazos/59/head -> origin/gh/mlazos/59/head 2025-12-04T08:57:43.8834039Z * [new branch] gh/mlazos/59/orig -> origin/gh/mlazos/59/orig 2025-12-04T08:57:43.8835526Z * [new branch] gh/mlazos/60/base -> origin/gh/mlazos/60/base 2025-12-04T08:57:43.8836613Z * [new branch] gh/mlazos/60/head -> origin/gh/mlazos/60/head 2025-12-04T08:57:43.8837801Z * [new branch] gh/mlazos/60/orig -> origin/gh/mlazos/60/orig 2025-12-04T08:57:43.8839951Z * [new branch] gh/mlazos/61/base -> origin/gh/mlazos/61/base 2025-12-04T08:57:43.8841089Z * [new branch] gh/mlazos/61/head -> origin/gh/mlazos/61/head 2025-12-04T08:57:43.8842189Z * [new branch] gh/mlazos/61/orig -> origin/gh/mlazos/61/orig 2025-12-04T08:57:43.8843669Z * [new branch] gh/mlazos/62/base -> origin/gh/mlazos/62/base 2025-12-04T08:57:43.8844731Z * [new branch] gh/mlazos/62/head -> origin/gh/mlazos/62/head 2025-12-04T08:57:43.8845871Z * [new branch] gh/mlazos/62/orig -> origin/gh/mlazos/62/orig 2025-12-04T08:57:43.8847402Z * [new branch] gh/mlazos/63/base -> origin/gh/mlazos/63/base 2025-12-04T08:57:43.8848553Z * [new branch] gh/mlazos/63/head -> origin/gh/mlazos/63/head 2025-12-04T08:57:43.8849804Z * [new branch] gh/mlazos/63/orig -> origin/gh/mlazos/63/orig 2025-12-04T08:57:43.8851177Z * [new branch] gh/mlazos/64/base -> origin/gh/mlazos/64/base 2025-12-04T08:57:43.8852263Z * [new branch] gh/mlazos/64/head -> origin/gh/mlazos/64/head 2025-12-04T08:57:43.8853314Z * [new branch] gh/mlazos/64/orig -> origin/gh/mlazos/64/orig 2025-12-04T08:57:43.8854776Z * [new branch] gh/mlazos/65/base -> origin/gh/mlazos/65/base 2025-12-04T08:57:43.8855843Z * [new branch] gh/mlazos/65/head -> origin/gh/mlazos/65/head 2025-12-04T08:57:43.8857272Z * [new branch] gh/mlazos/65/orig -> origin/gh/mlazos/65/orig 2025-12-04T08:57:43.8858763Z * [new branch] gh/mlazos/66/base -> origin/gh/mlazos/66/base 2025-12-04T08:57:43.8859867Z * [new branch] gh/mlazos/66/head -> origin/gh/mlazos/66/head 2025-12-04T08:57:43.8861039Z * [new branch] gh/mlazos/66/orig -> origin/gh/mlazos/66/orig 2025-12-04T08:57:43.8862543Z * [new branch] gh/mlazos/67/base -> origin/gh/mlazos/67/base 2025-12-04T08:57:43.8863649Z * [new branch] gh/mlazos/67/head -> origin/gh/mlazos/67/head 2025-12-04T08:57:43.8864863Z * [new branch] gh/mlazos/67/orig -> origin/gh/mlazos/67/orig 2025-12-04T08:57:43.8866349Z * [new branch] gh/mlazos/68/base -> origin/gh/mlazos/68/base 2025-12-04T08:57:43.8867452Z * [new branch] gh/mlazos/68/head -> origin/gh/mlazos/68/head 2025-12-04T08:57:43.8868554Z * [new branch] gh/mlazos/68/orig -> origin/gh/mlazos/68/orig 2025-12-04T08:57:43.8870176Z * [new branch] gh/mlazos/69/base -> origin/gh/mlazos/69/base 2025-12-04T08:57:43.8873176Z * [new branch] gh/mlazos/69/head -> origin/gh/mlazos/69/head 2025-12-04T08:57:43.8873420Z * [new branch] gh/mlazos/69/orig -> origin/gh/mlazos/69/orig 2025-12-04T08:57:43.8874691Z * [new branch] gh/mlazos/70/base -> origin/gh/mlazos/70/base 2025-12-04T08:57:43.8875257Z * [new branch] 
gh/mlazos/70/head -> origin/gh/mlazos/70/head 2025-12-04T08:57:43.8876440Z * [new branch] gh/mlazos/70/orig -> origin/gh/mlazos/70/orig 2025-12-04T08:57:43.8877966Z * [new branch] gh/mlazos/71/base -> origin/gh/mlazos/71/base 2025-12-04T08:57:43.8879088Z * [new branch] gh/mlazos/71/head -> origin/gh/mlazos/71/head 2025-12-04T08:57:43.8880271Z * [new branch] gh/mlazos/71/orig -> origin/gh/mlazos/71/orig 2025-12-04T08:57:43.8881715Z * [new branch] gh/mlazos/72/base -> origin/gh/mlazos/72/base 2025-12-04T08:57:43.8882786Z * [new branch] gh/mlazos/72/head -> origin/gh/mlazos/72/head 2025-12-04T08:57:43.8883998Z * [new branch] gh/mlazos/72/orig -> origin/gh/mlazos/72/orig 2025-12-04T08:57:43.8885435Z * [new branch] gh/mlazos/73/base -> origin/gh/mlazos/73/base 2025-12-04T08:57:43.8886552Z * [new branch] gh/mlazos/73/head -> origin/gh/mlazos/73/head 2025-12-04T08:57:43.8887714Z * [new branch] gh/mlazos/73/orig -> origin/gh/mlazos/73/orig 2025-12-04T08:57:43.8889418Z * [new branch] gh/mrmiywj/1/base -> origin/gh/mrmiywj/1/base 2025-12-04T08:57:43.8890603Z * [new branch] gh/mrmiywj/1/head -> origin/gh/mrmiywj/1/head 2025-12-04T08:57:43.8892405Z * [new branch] gh/muchulee8/73/base -> origin/gh/muchulee8/73/base 2025-12-04T08:57:43.8893744Z * [new branch] gh/muchulee8/73/head -> origin/gh/muchulee8/73/head 2025-12-04T08:57:43.8894930Z * [new branch] gh/muchulee8/73/orig -> origin/gh/muchulee8/73/orig 2025-12-04T08:57:43.8897031Z * [new branch] gh/naveenthangudu/1/base -> origin/gh/naveenthangudu/1/base 2025-12-04T08:57:43.8898281Z * [new branch] gh/naveenthangudu/1/head -> origin/gh/naveenthangudu/1/head 2025-12-04T08:57:43.8899555Z * [new branch] gh/naveenthangudu/1/orig -> origin/gh/naveenthangudu/1/orig 2025-12-04T08:57:43.8901266Z * [new branch] gh/naveenthangudu/2/base -> origin/gh/naveenthangudu/2/base 2025-12-04T08:57:43.8902397Z * [new branch] gh/naveenthangudu/2/head -> origin/gh/naveenthangudu/2/head 2025-12-04T08:57:43.8903562Z * [new branch] gh/naveenthangudu/2/orig -> origin/gh/naveenthangudu/2/orig 2025-12-04T08:57:43.8904990Z * [new branch] gh/naveenthangudu/3/base -> origin/gh/naveenthangudu/3/base 2025-12-04T08:57:43.8906100Z * [new branch] gh/naveenthangudu/3/head -> origin/gh/naveenthangudu/3/head 2025-12-04T08:57:43.8907304Z * [new branch] gh/naveenthangudu/3/orig -> origin/gh/naveenthangudu/3/orig 2025-12-04T08:57:43.8908965Z * [new branch] gh/naveenthangudu/4/base -> origin/gh/naveenthangudu/4/base 2025-12-04T08:57:43.8910059Z * [new branch] gh/naveenthangudu/4/head -> origin/gh/naveenthangudu/4/head 2025-12-04T08:57:43.8911246Z * [new branch] gh/naveenthangudu/4/orig -> origin/gh/naveenthangudu/4/orig 2025-12-04T08:57:43.8912672Z * [new branch] gh/naveenthangudu/5/base -> origin/gh/naveenthangudu/5/base 2025-12-04T08:57:43.8913756Z * [new branch] gh/naveenthangudu/5/head -> origin/gh/naveenthangudu/5/head 2025-12-04T08:57:43.8915165Z * [new branch] gh/naveenthangudu/5/orig -> origin/gh/naveenthangudu/5/orig 2025-12-04T08:57:43.8916631Z * [new branch] gh/naveenthangudu/6/base -> origin/gh/naveenthangudu/6/base 2025-12-04T08:57:43.8917688Z * [new branch] gh/naveenthangudu/6/head -> origin/gh/naveenthangudu/6/head 2025-12-04T08:57:43.8918719Z * [new branch] gh/naveenthangudu/6/orig -> origin/gh/naveenthangudu/6/orig 2025-12-04T08:57:43.8920105Z * [new branch] gh/naveenthangudu/7/base -> origin/gh/naveenthangudu/7/base 2025-12-04T08:57:43.8921539Z * [new branch] gh/naveenthangudu/7/head -> origin/gh/naveenthangudu/7/head 2025-12-04T08:57:43.8922715Z * [new branch] 
gh/naveenthangudu/7/orig -> origin/gh/naveenthangudu/7/orig 2025-12-04T08:57:43.8924182Z * [new branch] gh/naveenthangudu/8/base -> origin/gh/naveenthangudu/8/base 2025-12-04T08:57:43.8925325Z * [new branch] gh/naveenthangudu/8/head -> origin/gh/naveenthangudu/8/head 2025-12-04T08:57:43.8926528Z * [new branch] gh/naveenthangudu/8/orig -> origin/gh/naveenthangudu/8/orig 2025-12-04T08:57:43.8927999Z * [new branch] gh/naveenthangudu/9/base -> origin/gh/naveenthangudu/9/base 2025-12-04T08:57:43.8929110Z * [new branch] gh/naveenthangudu/9/head -> origin/gh/naveenthangudu/9/head 2025-12-04T08:57:43.8930383Z * [new branch] gh/naveenthangudu/9/orig -> origin/gh/naveenthangudu/9/orig 2025-12-04T08:57:43.8932020Z * [new branch] gh/nikitaved/1/base -> origin/gh/nikitaved/1/base 2025-12-04T08:57:43.8933205Z * [new branch] gh/nikitaved/1/head -> origin/gh/nikitaved/1/head 2025-12-04T08:57:43.8934390Z * [new branch] gh/nikitaved/1/orig -> origin/gh/nikitaved/1/orig 2025-12-04T08:57:43.8935894Z * [new branch] gh/nikitaved/10/base -> origin/gh/nikitaved/10/base 2025-12-04T08:57:43.8937307Z * [new branch] gh/nikitaved/10/head -> origin/gh/nikitaved/10/head 2025-12-04T08:57:43.8938448Z * [new branch] gh/nikitaved/10/orig -> origin/gh/nikitaved/10/orig 2025-12-04T08:57:43.8939958Z * [new branch] gh/nikitaved/11/base -> origin/gh/nikitaved/11/base 2025-12-04T08:57:43.8941158Z * [new branch] gh/nikitaved/11/head -> origin/gh/nikitaved/11/head 2025-12-04T08:57:43.8942334Z * [new branch] gh/nikitaved/11/orig -> origin/gh/nikitaved/11/orig 2025-12-04T08:57:43.8943750Z * [new branch] gh/nikitaved/12/base -> origin/gh/nikitaved/12/base 2025-12-04T08:57:43.8944873Z * [new branch] gh/nikitaved/12/head -> origin/gh/nikitaved/12/head 2025-12-04T08:57:43.8945986Z * [new branch] gh/nikitaved/12/orig -> origin/gh/nikitaved/12/orig 2025-12-04T08:57:43.8947458Z * [new branch] gh/nikitaved/13/base -> origin/gh/nikitaved/13/base 2025-12-04T08:57:43.8948693Z * [new branch] gh/nikitaved/13/head -> origin/gh/nikitaved/13/head 2025-12-04T08:57:43.8949789Z * [new branch] gh/nikitaved/13/orig -> origin/gh/nikitaved/13/orig 2025-12-04T08:57:43.8951253Z * [new branch] gh/nikitaved/14/base -> origin/gh/nikitaved/14/base 2025-12-04T08:57:43.8952447Z * [new branch] gh/nikitaved/14/head -> origin/gh/nikitaved/14/head 2025-12-04T08:57:43.8953560Z * [new branch] gh/nikitaved/14/orig -> origin/gh/nikitaved/14/orig 2025-12-04T08:57:43.8954993Z * [new branch] gh/nikitaved/15/base -> origin/gh/nikitaved/15/base 2025-12-04T08:57:43.8956107Z * [new branch] gh/nikitaved/15/head -> origin/gh/nikitaved/15/head 2025-12-04T08:57:43.8957187Z * [new branch] gh/nikitaved/15/orig -> origin/gh/nikitaved/15/orig 2025-12-04T08:57:43.8958632Z * [new branch] gh/nikitaved/16/base -> origin/gh/nikitaved/16/base 2025-12-04T08:57:43.8959696Z * [new branch] gh/nikitaved/16/head -> origin/gh/nikitaved/16/head 2025-12-04T08:57:43.8960792Z * [new branch] gh/nikitaved/16/orig -> origin/gh/nikitaved/16/orig 2025-12-04T08:57:43.8962689Z * [new branch] gh/nikitaved/2/base -> origin/gh/nikitaved/2/base 2025-12-04T08:57:43.8963785Z * [new branch] gh/nikitaved/2/head -> origin/gh/nikitaved/2/head 2025-12-04T08:57:43.8964893Z * [new branch] gh/nikitaved/2/orig -> origin/gh/nikitaved/2/orig 2025-12-04T08:57:43.8966896Z * [new branch] gh/nikitaved/4/base -> origin/gh/nikitaved/4/base 2025-12-04T08:57:43.8968001Z * [new branch] gh/nikitaved/4/head -> origin/gh/nikitaved/4/head 2025-12-04T08:57:43.8969074Z * [new branch] gh/nikitaved/4/orig -> origin/gh/nikitaved/4/orig 
2025-12-04T08:57:43.8970699Z * [new branch] gh/nikitaved/5/base -> origin/gh/nikitaved/5/base 2025-12-04T08:57:43.8971780Z * [new branch] gh/nikitaved/5/head -> origin/gh/nikitaved/5/head 2025-12-04T08:57:43.8972852Z * [new branch] gh/nikitaved/5/orig -> origin/gh/nikitaved/5/orig 2025-12-04T08:57:43.8974258Z * [new branch] gh/nikitaved/6/base -> origin/gh/nikitaved/6/base 2025-12-04T08:57:43.8975408Z * [new branch] gh/nikitaved/6/head -> origin/gh/nikitaved/6/head 2025-12-04T08:57:43.8976480Z * [new branch] gh/nikitaved/6/orig -> origin/gh/nikitaved/6/orig 2025-12-04T08:57:43.8978273Z * [new branch] gh/nikitaved/8/base -> origin/gh/nikitaved/8/base 2025-12-04T08:57:43.8979446Z * [new branch] gh/nikitaved/8/head -> origin/gh/nikitaved/8/head 2025-12-04T08:57:43.8980567Z * [new branch] gh/nikitaved/8/orig -> origin/gh/nikitaved/8/orig 2025-12-04T08:57:43.8982032Z * [new branch] gh/nikitaved/9/base -> origin/gh/nikitaved/9/base 2025-12-04T08:57:43.8983150Z * [new branch] gh/nikitaved/9/head -> origin/gh/nikitaved/9/head 2025-12-04T08:57:43.8984236Z * [new branch] gh/nikitaved/9/orig -> origin/gh/nikitaved/9/orig 2025-12-04T08:57:43.8986059Z * [new branch] gh/oulgen/10/base -> origin/gh/oulgen/10/base 2025-12-04T08:57:43.8987146Z * [new branch] gh/oulgen/10/head -> origin/gh/oulgen/10/head 2025-12-04T08:57:43.8988303Z * [new branch] gh/oulgen/10/orig -> origin/gh/oulgen/10/orig 2025-12-04T08:57:43.8989894Z * [new branch] gh/oulgen/11/base -> origin/gh/oulgen/11/base 2025-12-04T08:57:43.8990977Z * [new branch] gh/oulgen/11/head -> origin/gh/oulgen/11/head 2025-12-04T08:57:43.8992134Z * [new branch] gh/oulgen/11/orig -> origin/gh/oulgen/11/orig 2025-12-04T08:57:43.8993589Z * [new branch] gh/oulgen/12/base -> origin/gh/oulgen/12/base 2025-12-04T08:57:43.8994603Z * [new branch] gh/oulgen/12/head -> origin/gh/oulgen/12/head 2025-12-04T08:57:43.8995649Z * [new branch] gh/oulgen/12/orig -> origin/gh/oulgen/12/orig 2025-12-04T08:57:43.8997103Z * [new branch] gh/oulgen/13/base -> origin/gh/oulgen/13/base 2025-12-04T08:57:43.8998193Z * [new branch] gh/oulgen/13/head -> origin/gh/oulgen/13/head 2025-12-04T08:57:43.8999333Z * [new branch] gh/oulgen/13/orig -> origin/gh/oulgen/13/orig 2025-12-04T08:57:43.9000761Z * [new branch] gh/oulgen/14/base -> origin/gh/oulgen/14/base 2025-12-04T08:57:43.9001820Z * [new branch] gh/oulgen/14/head -> origin/gh/oulgen/14/head 2025-12-04T08:57:43.9002984Z * [new branch] gh/oulgen/14/orig -> origin/gh/oulgen/14/orig 2025-12-04T08:57:43.9004377Z * [new branch] gh/oulgen/15/base -> origin/gh/oulgen/15/base 2025-12-04T08:57:43.9005459Z * [new branch] gh/oulgen/15/head -> origin/gh/oulgen/15/head 2025-12-04T08:57:43.9006601Z * [new branch] gh/oulgen/15/orig -> origin/gh/oulgen/15/orig 2025-12-04T08:57:43.9008024Z * [new branch] gh/oulgen/16/base -> origin/gh/oulgen/16/base 2025-12-04T08:57:43.9009078Z * [new branch] gh/oulgen/16/head -> origin/gh/oulgen/16/head 2025-12-04T08:57:43.9010307Z * [new branch] gh/oulgen/16/orig -> origin/gh/oulgen/16/orig 2025-12-04T08:57:43.9011562Z * [new branch] gh/oulgen/17/base -> origin/gh/oulgen/17/base 2025-12-04T08:57:43.9012653Z * [new branch] gh/oulgen/17/head -> origin/gh/oulgen/17/head 2025-12-04T08:57:43.9013782Z * [new branch] gh/oulgen/17/orig -> origin/gh/oulgen/17/orig 2025-12-04T08:57:43.9015220Z * [new branch] gh/oulgen/18/base -> origin/gh/oulgen/18/base 2025-12-04T08:57:43.9016406Z * [new branch] gh/oulgen/18/head -> origin/gh/oulgen/18/head 2025-12-04T08:57:43.9017951Z * [new branch] gh/oulgen/18/orig -> 
origin/gh/oulgen/18/orig 2025-12-04T08:57:43.9019276Z * [new branch] gh/oulgen/19/base -> origin/gh/oulgen/19/base 2025-12-04T08:57:43.9020320Z * [new branch] gh/oulgen/19/head -> origin/gh/oulgen/19/head 2025-12-04T08:57:43.9023912Z * [new branch] gh/oulgen/19/orig -> origin/gh/oulgen/19/orig 2025-12-04T08:57:43.9025534Z * [new branch] gh/oulgen/20/base -> origin/gh/oulgen/20/base 2025-12-04T08:57:43.9026702Z * [new branch] gh/oulgen/20/head -> origin/gh/oulgen/20/head 2025-12-04T08:57:43.9027852Z * [new branch] gh/oulgen/20/orig -> origin/gh/oulgen/20/orig 2025-12-04T08:57:43.9029265Z * [new branch] gh/oulgen/21/base -> origin/gh/oulgen/21/base 2025-12-04T08:57:43.9031129Z * [new branch] gh/oulgen/21/head -> origin/gh/oulgen/21/head 2025-12-04T08:57:43.9032185Z * [new branch] gh/oulgen/21/orig -> origin/gh/oulgen/21/orig 2025-12-04T08:57:43.9033726Z * [new branch] gh/oulgen/22/base -> origin/gh/oulgen/22/base 2025-12-04T08:57:43.9034829Z * [new branch] gh/oulgen/22/head -> origin/gh/oulgen/22/head 2025-12-04T08:57:43.9035902Z * [new branch] gh/oulgen/22/orig -> origin/gh/oulgen/22/orig 2025-12-04T08:57:43.9037313Z * [new branch] gh/oulgen/23/base -> origin/gh/oulgen/23/base 2025-12-04T08:57:43.9038446Z * [new branch] gh/oulgen/23/head -> origin/gh/oulgen/23/head 2025-12-04T08:57:43.9039507Z * [new branch] gh/oulgen/23/orig -> origin/gh/oulgen/23/orig 2025-12-04T08:57:43.9040843Z * [new branch] gh/oulgen/24/base -> origin/gh/oulgen/24/base 2025-12-04T08:57:43.9041884Z * [new branch] gh/oulgen/24/head -> origin/gh/oulgen/24/head 2025-12-04T08:57:43.9042998Z * [new branch] gh/oulgen/24/orig -> origin/gh/oulgen/24/orig 2025-12-04T08:57:43.9044420Z * [new branch] gh/oulgen/25/base -> origin/gh/oulgen/25/base 2025-12-04T08:57:43.9045456Z * [new branch] gh/oulgen/25/head -> origin/gh/oulgen/25/head 2025-12-04T08:57:43.9046691Z * [new branch] gh/oulgen/25/orig -> origin/gh/oulgen/25/orig 2025-12-04T08:57:43.9048085Z * [new branch] gh/oulgen/26/base -> origin/gh/oulgen/26/base 2025-12-04T08:57:43.9049157Z * [new branch] gh/oulgen/26/head -> origin/gh/oulgen/26/head 2025-12-04T08:57:43.9050222Z * [new branch] gh/oulgen/26/orig -> origin/gh/oulgen/26/orig 2025-12-04T08:57:43.9051622Z * [new branch] gh/oulgen/4/base -> origin/gh/oulgen/4/base 2025-12-04T08:57:43.9052716Z * [new branch] gh/oulgen/4/head -> origin/gh/oulgen/4/head 2025-12-04T08:57:43.9053822Z * [new branch] gh/oulgen/4/orig -> origin/gh/oulgen/4/orig 2025-12-04T08:57:43.9055706Z * [new branch] gh/oulgen/7/base -> origin/gh/oulgen/7/base 2025-12-04T08:57:43.9057084Z * [new branch] gh/oulgen/7/head -> origin/gh/oulgen/7/head 2025-12-04T08:57:43.9058250Z * [new branch] gh/oulgen/7/orig -> origin/gh/oulgen/7/orig 2025-12-04T08:57:43.9059799Z * [new branch] gh/oulgen/8/base -> origin/gh/oulgen/8/base 2025-12-04T08:57:43.9060923Z * [new branch] gh/oulgen/8/head -> origin/gh/oulgen/8/head 2025-12-04T08:57:43.9062174Z * [new branch] gh/oulgen/8/orig -> origin/gh/oulgen/8/orig 2025-12-04T08:57:43.9063590Z * [new branch] gh/oulgen/9/base -> origin/gh/oulgen/9/base 2025-12-04T08:57:43.9064689Z * [new branch] gh/oulgen/9/head -> origin/gh/oulgen/9/head 2025-12-04T08:57:43.9065863Z * [new branch] gh/oulgen/9/orig -> origin/gh/oulgen/9/orig 2025-12-04T08:57:43.9067563Z * [new branch] gh/patvig/mtia-serialization -> origin/gh/patvig/mtia-serialization 2025-12-04T08:57:43.9069464Z * [new branch] gh/pearu/108/base -> origin/gh/pearu/108/base 2025-12-04T08:57:43.9070555Z * [new branch] gh/pearu/108/head -> origin/gh/pearu/108/head 
2025-12-04T08:57:43.9071769Z * [new branch] gh/pearu/108/orig -> origin/gh/pearu/108/orig 2025-12-04T08:57:43.9073237Z * [new branch] gh/pearu/109/base -> origin/gh/pearu/109/base 2025-12-04T08:57:43.9074373Z * [new branch] gh/pearu/109/head -> origin/gh/pearu/109/head 2025-12-04T08:57:43.9075471Z * [new branch] gh/pearu/109/orig -> origin/gh/pearu/109/orig 2025-12-04T08:57:43.9077060Z * [new branch] gh/pearu/110/base -> origin/gh/pearu/110/base 2025-12-04T08:57:43.9078166Z * [new branch] gh/pearu/110/head -> origin/gh/pearu/110/head 2025-12-04T08:57:43.9079293Z * [new branch] gh/pearu/110/orig -> origin/gh/pearu/110/orig 2025-12-04T08:57:43.9080700Z * [new branch] gh/pearu/111/base -> origin/gh/pearu/111/base 2025-12-04T08:57:43.9081763Z * [new branch] gh/pearu/111/head -> origin/gh/pearu/111/head 2025-12-04T08:57:43.9082911Z * [new branch] gh/pearu/111/orig -> origin/gh/pearu/111/orig 2025-12-04T08:57:43.9084387Z * [new branch] gh/pearu/112/base -> origin/gh/pearu/112/base 2025-12-04T08:57:43.9085432Z * [new branch] gh/pearu/112/head -> origin/gh/pearu/112/head 2025-12-04T08:57:43.9086506Z * [new branch] gh/pearu/112/orig -> origin/gh/pearu/112/orig 2025-12-04T08:57:43.9087857Z * [new branch] gh/pearu/115/base -> origin/gh/pearu/115/base 2025-12-04T08:57:43.9089012Z * [new branch] gh/pearu/115/head -> origin/gh/pearu/115/head 2025-12-04T08:57:43.9090063Z * [new branch] gh/pearu/115/orig -> origin/gh/pearu/115/orig 2025-12-04T08:57:43.9091486Z * [new branch] gh/pearu/116/base -> origin/gh/pearu/116/base 2025-12-04T08:57:43.9092584Z * [new branch] gh/pearu/116/head -> origin/gh/pearu/116/head 2025-12-04T08:57:43.9093677Z * [new branch] gh/pearu/116/orig -> origin/gh/pearu/116/orig 2025-12-04T08:57:43.9095062Z * [new branch] gh/pearu/117/base -> origin/gh/pearu/117/base 2025-12-04T08:57:43.9096322Z * [new branch] gh/pearu/117/head -> origin/gh/pearu/117/head 2025-12-04T08:57:43.9097783Z * [new branch] gh/pearu/117/orig -> origin/gh/pearu/117/orig 2025-12-04T08:57:43.9099300Z * [new branch] gh/pearu/118/base -> origin/gh/pearu/118/base 2025-12-04T08:57:43.9100404Z * [new branch] gh/pearu/118/head -> origin/gh/pearu/118/head 2025-12-04T08:57:43.9101596Z * [new branch] gh/pearu/118/orig -> origin/gh/pearu/118/orig 2025-12-04T08:57:43.9103036Z * [new branch] gh/pearu/119/base -> origin/gh/pearu/119/base 2025-12-04T08:57:43.9104146Z * [new branch] gh/pearu/119/head -> origin/gh/pearu/119/head 2025-12-04T08:57:43.9105252Z * [new branch] gh/pearu/119/orig -> origin/gh/pearu/119/orig 2025-12-04T08:57:43.9107302Z * [new branch] gh/pearu/139/base -> origin/gh/pearu/139/base 2025-12-04T08:57:43.9108437Z * [new branch] gh/pearu/139/head -> origin/gh/pearu/139/head 2025-12-04T08:57:43.9109648Z * [new branch] gh/pearu/139/orig -> origin/gh/pearu/139/orig 2025-12-04T08:57:43.9111119Z * [new branch] gh/pearu/140/base -> origin/gh/pearu/140/base 2025-12-04T08:57:43.9112217Z * [new branch] gh/pearu/140/head -> origin/gh/pearu/140/head 2025-12-04T08:57:43.9113399Z * [new branch] gh/pearu/140/orig -> origin/gh/pearu/140/orig 2025-12-04T08:57:43.9114763Z * [new branch] gh/pearu/142/base -> origin/gh/pearu/142/base 2025-12-04T08:57:43.9115837Z * [new branch] gh/pearu/142/head -> origin/gh/pearu/142/head 2025-12-04T08:57:43.9116904Z * [new branch] gh/pearu/142/orig -> origin/gh/pearu/142/orig 2025-12-04T08:57:43.9118312Z * [new branch] gh/pearu/143/base -> origin/gh/pearu/143/base 2025-12-04T08:57:43.9119387Z * [new branch] gh/pearu/143/head -> origin/gh/pearu/143/head 2025-12-04T08:57:43.9120470Z * [new branch] 
gh/pearu/143/orig -> origin/gh/pearu/143/orig 2025-12-04T08:57:43.9122554Z * [new branch] gh/pearu/147/base -> origin/gh/pearu/147/base 2025-12-04T08:57:43.9123650Z * [new branch] gh/pearu/147/head -> origin/gh/pearu/147/head 2025-12-04T08:57:43.9124773Z * [new branch] gh/pearu/147/orig -> origin/gh/pearu/147/orig 2025-12-04T08:57:43.9126258Z * [new branch] gh/pearu/149/base -> origin/gh/pearu/149/base 2025-12-04T08:57:43.9127361Z * [new branch] gh/pearu/149/head -> origin/gh/pearu/149/head 2025-12-04T08:57:43.9128468Z * [new branch] gh/pearu/149/orig -> origin/gh/pearu/149/orig 2025-12-04T08:57:43.9130378Z * [new branch] gh/pearu/150/base -> origin/gh/pearu/150/base 2025-12-04T08:57:43.9131542Z * [new branch] gh/pearu/150/head -> origin/gh/pearu/150/head 2025-12-04T08:57:43.9132646Z * [new branch] gh/pearu/150/orig -> origin/gh/pearu/150/orig 2025-12-04T08:57:43.9134249Z * [new branch] gh/pearu/151/base -> origin/gh/pearu/151/base 2025-12-04T08:57:43.9135342Z * [new branch] gh/pearu/151/head -> origin/gh/pearu/151/head 2025-12-04T08:57:43.9136506Z * [new branch] gh/pearu/151/orig -> origin/gh/pearu/151/orig 2025-12-04T08:57:43.9138442Z * [new branch] gh/pearu/152/base -> origin/gh/pearu/152/base 2025-12-04T08:57:43.9139616Z * [new branch] gh/pearu/152/head -> origin/gh/pearu/152/head 2025-12-04T08:57:43.9140738Z * [new branch] gh/pearu/152/orig -> origin/gh/pearu/152/orig 2025-12-04T08:57:43.9142682Z * [new branch] gh/pearu/153/base -> origin/gh/pearu/153/base 2025-12-04T08:57:43.9143798Z * [new branch] gh/pearu/153/head -> origin/gh/pearu/153/head 2025-12-04T08:57:43.9144997Z * [new branch] gh/pearu/153/orig -> origin/gh/pearu/153/orig 2025-12-04T08:57:43.9146417Z * [new branch] gh/pearu/154/base -> origin/gh/pearu/154/base 2025-12-04T08:57:43.9148019Z * [new branch] gh/pearu/154/head -> origin/gh/pearu/154/head 2025-12-04T08:57:43.9149273Z * [new branch] gh/pearu/154/orig -> origin/gh/pearu/154/orig 2025-12-04T08:57:43.9150816Z * [new branch] gh/pearu/155/base -> origin/gh/pearu/155/base 2025-12-04T08:57:43.9151900Z * [new branch] gh/pearu/155/head -> origin/gh/pearu/155/head 2025-12-04T08:57:43.9153027Z * [new branch] gh/pearu/155/orig -> origin/gh/pearu/155/orig 2025-12-04T08:57:43.9154589Z * [new branch] gh/pearu/156/base -> origin/gh/pearu/156/base 2025-12-04T08:57:43.9155687Z * [new branch] gh/pearu/156/head -> origin/gh/pearu/156/head 2025-12-04T08:57:43.9156752Z * [new branch] gh/pearu/156/orig -> origin/gh/pearu/156/orig 2025-12-04T08:57:43.9158662Z * [new branch] gh/pearu/56/base -> origin/gh/pearu/56/base 2025-12-04T08:57:43.9160453Z * [new branch] gh/pearu/56/head -> origin/gh/pearu/56/head 2025-12-04T08:57:43.9161743Z * [new branch] gh/pearu/56/orig -> origin/gh/pearu/56/orig 2025-12-04T08:57:43.9163297Z * [new branch] gh/pearu/97/base -> origin/gh/pearu/97/base 2025-12-04T08:57:43.9164558Z * [new branch] gh/pearu/97/head -> origin/gh/pearu/97/head 2025-12-04T08:57:43.9165637Z * [new branch] gh/pearu/97/orig -> origin/gh/pearu/97/orig 2025-12-04T08:57:43.9167400Z * [new branch] gh/pianpwk/21/base -> origin/gh/pianpwk/21/base 2025-12-04T08:57:43.9168448Z * [new branch] gh/pianpwk/21/head -> origin/gh/pianpwk/21/head 2025-12-04T08:57:43.9170007Z * [new branch] gh/pianpwk/28/base -> origin/gh/pianpwk/28/base 2025-12-04T08:57:43.9171075Z * [new branch] gh/pianpwk/28/head -> origin/gh/pianpwk/28/head 2025-12-04T08:57:43.9172180Z * [new branch] gh/pianpwk/28/orig -> origin/gh/pianpwk/28/orig 2025-12-04T08:57:43.9173643Z * [new branch] gh/pianpwk/29/base -> 
origin/gh/pianpwk/29/base 2025-12-04T08:57:43.9174886Z * [new branch] gh/pianpwk/29/head -> origin/gh/pianpwk/29/head 2025-12-04T08:57:43.9176010Z * [new branch] gh/pianpwk/29/orig -> origin/gh/pianpwk/29/orig 2025-12-04T08:57:43.9178030Z * [new branch] gh/pianpwk/30/base -> origin/gh/pianpwk/30/base 2025-12-04T08:57:43.9179063Z * [new branch] gh/pianpwk/30/head -> origin/gh/pianpwk/30/head 2025-12-04T08:57:43.9180242Z * [new branch] gh/pianpwk/30/orig -> origin/gh/pianpwk/30/orig 2025-12-04T08:57:43.9181772Z * [new branch] gh/pianpwk/31/base -> origin/gh/pianpwk/31/base 2025-12-04T08:57:43.9182937Z * [new branch] gh/pianpwk/31/head -> origin/gh/pianpwk/31/head 2025-12-04T08:57:43.9184049Z * [new branch] gh/pianpwk/31/orig -> origin/gh/pianpwk/31/orig 2025-12-04T08:57:43.9185482Z * [new branch] gh/pianpwk/32/base -> origin/gh/pianpwk/32/base 2025-12-04T08:57:43.9186607Z * [new branch] gh/pianpwk/32/head -> origin/gh/pianpwk/32/head 2025-12-04T08:57:43.9187716Z * [new branch] gh/pianpwk/32/orig -> origin/gh/pianpwk/32/orig 2025-12-04T08:57:43.9189158Z * [new branch] gh/pianpwk/33/base -> origin/gh/pianpwk/33/base 2025-12-04T08:57:43.9190235Z * [new branch] gh/pianpwk/33/head -> origin/gh/pianpwk/33/head 2025-12-04T08:57:43.9191334Z * [new branch] gh/pianpwk/33/orig -> origin/gh/pianpwk/33/orig 2025-12-04T08:57:43.9193123Z * [new branch] gh/pianpwk/34/base -> origin/gh/pianpwk/34/base 2025-12-04T08:57:43.9194549Z * [new branch] gh/pianpwk/34/head -> origin/gh/pianpwk/34/head 2025-12-04T08:57:43.9195791Z * [new branch] gh/pianpwk/34/orig -> origin/gh/pianpwk/34/orig 2025-12-04T08:57:43.9197252Z * [new branch] gh/pianpwk/35/base -> origin/gh/pianpwk/35/base 2025-12-04T08:57:43.9198399Z * [new branch] gh/pianpwk/35/head -> origin/gh/pianpwk/35/head 2025-12-04T08:57:43.9199634Z * [new branch] gh/pianpwk/35/orig -> origin/gh/pianpwk/35/orig 2025-12-04T08:57:43.9201449Z * [new branch] gh/rec/141/base -> origin/gh/rec/141/base 2025-12-04T08:57:43.9202550Z * [new branch] gh/rec/141/head -> origin/gh/rec/141/head 2025-12-04T08:57:43.9203944Z * [new branch] gh/rec/153/base -> origin/gh/rec/153/base 2025-12-04T08:57:43.9205026Z * [new branch] gh/rec/153/head -> origin/gh/rec/153/head 2025-12-04T08:57:43.9206095Z * [new branch] gh/rec/153/orig -> origin/gh/rec/153/orig 2025-12-04T08:57:43.9207538Z * [new branch] gh/rec/154/base -> origin/gh/rec/154/base 2025-12-04T08:57:43.9208808Z * [new branch] gh/rec/154/head -> origin/gh/rec/154/head 2025-12-04T08:57:43.9209700Z * [new branch] gh/rec/154/orig -> origin/gh/rec/154/orig 2025-12-04T08:57:43.9211325Z * [new branch] gh/rec/164/base -> origin/gh/rec/164/base 2025-12-04T08:57:43.9212408Z * [new branch] gh/rec/164/head -> origin/gh/rec/164/head 2025-12-04T08:57:43.9213480Z * [new branch] gh/rec/164/orig -> origin/gh/rec/164/orig 2025-12-04T08:57:43.9215050Z * [new branch] gh/rec/166/base -> origin/gh/rec/166/base 2025-12-04T08:57:43.9216114Z * [new branch] gh/rec/166/head -> origin/gh/rec/166/head 2025-12-04T08:57:43.9217643Z * [new branch] gh/rec/166/orig -> origin/gh/rec/166/orig 2025-12-04T08:57:43.9219033Z * [new branch] gh/rec/167/base -> origin/gh/rec/167/base 2025-12-04T08:57:43.9220139Z * [new branch] gh/rec/167/head -> origin/gh/rec/167/head 2025-12-04T08:57:43.9221441Z * [new branch] gh/rec/167/orig -> origin/gh/rec/167/orig 2025-12-04T08:57:43.9222978Z * [new branch] gh/rec/168/base -> origin/gh/rec/168/base 2025-12-04T08:57:43.9224082Z * [new branch] gh/rec/168/head -> origin/gh/rec/168/head 2025-12-04T08:57:43.9225210Z * [new branch] 
gh/rec/168/orig -> origin/gh/rec/168/orig 2025-12-04T08:57:43.9226692Z * [new branch] gh/rec/169/base -> origin/gh/rec/169/base 2025-12-04T08:57:43.9227789Z * [new branch] gh/rec/169/head -> origin/gh/rec/169/head 2025-12-04T08:57:43.9228915Z * [new branch] gh/rec/169/orig -> origin/gh/rec/169/orig 2025-12-04T08:57:43.9230611Z * [new branch] gh/rec/170/base -> origin/gh/rec/170/base 2025-12-04T08:57:43.9231762Z * [new branch] gh/rec/170/head -> origin/gh/rec/170/head 2025-12-04T08:57:43.9232990Z * [new branch] gh/rec/170/orig -> origin/gh/rec/170/orig 2025-12-04T08:57:43.9234437Z * [new branch] gh/rec/171/base -> origin/gh/rec/171/base 2025-12-04T08:57:43.9235502Z * [new branch] gh/rec/171/head -> origin/gh/rec/171/head 2025-12-04T08:57:43.9236558Z * [new branch] gh/rec/171/orig -> origin/gh/rec/171/orig 2025-12-04T08:57:43.9237982Z * [new branch] gh/rec/172/base -> origin/gh/rec/172/base 2025-12-04T08:57:43.9239027Z * [new branch] gh/rec/172/head -> origin/gh/rec/172/head 2025-12-04T08:57:43.9240167Z * [new branch] gh/rec/172/orig -> origin/gh/rec/172/orig 2025-12-04T08:57:43.9241684Z * [new branch] gh/rec/173/base -> origin/gh/rec/173/base 2025-12-04T08:57:43.9242764Z * [new branch] gh/rec/173/head -> origin/gh/rec/173/head 2025-12-04T08:57:43.9243893Z * [new branch] gh/rec/173/orig -> origin/gh/rec/173/orig 2025-12-04T08:57:43.9245392Z * [new branch] gh/rec/174/base -> origin/gh/rec/174/base 2025-12-04T08:57:43.9246460Z * [new branch] gh/rec/174/head -> origin/gh/rec/174/head 2025-12-04T08:57:43.9247601Z * [new branch] gh/rec/174/orig -> origin/gh/rec/174/orig 2025-12-04T08:57:43.9249001Z * [new branch] gh/rec/175/base -> origin/gh/rec/175/base 2025-12-04T08:57:43.9250048Z * [new branch] gh/rec/175/head -> origin/gh/rec/175/head 2025-12-04T08:57:43.9251173Z * [new branch] gh/rec/175/orig -> origin/gh/rec/175/orig 2025-12-04T08:57:43.9252574Z * [new branch] gh/rec/176/base -> origin/gh/rec/176/base 2025-12-04T08:57:43.9253639Z * [new branch] gh/rec/176/head -> origin/gh/rec/176/head 2025-12-04T08:57:43.9254841Z * [new branch] gh/rec/176/orig -> origin/gh/rec/176/orig 2025-12-04T08:57:43.9256149Z * [new branch] gh/rec/177/base -> origin/gh/rec/177/base 2025-12-04T08:57:43.9257591Z * [new branch] gh/rec/177/head -> origin/gh/rec/177/head 2025-12-04T08:57:43.9258658Z * [new branch] gh/rec/177/orig -> origin/gh/rec/177/orig 2025-12-04T08:57:43.9260636Z * [new branch] gh/robert-hardwick/3/base -> origin/gh/robert-hardwick/3/base 2025-12-04T08:57:43.9262266Z * [new branch] gh/robert-hardwick/3/head -> origin/gh/robert-hardwick/3/head 2025-12-04T08:57:43.9263427Z * [new branch] gh/robert-hardwick/3/orig -> origin/gh/robert-hardwick/3/orig 2025-12-04T08:57:43.9264925Z * [new branch] gh/robert-hardwick/4/base -> origin/gh/robert-hardwick/4/base 2025-12-04T08:57:43.9266071Z * [new branch] gh/robert-hardwick/4/head -> origin/gh/robert-hardwick/4/head 2025-12-04T08:57:43.9267193Z * [new branch] gh/robert-hardwick/4/orig -> origin/gh/robert-hardwick/4/orig 2025-12-04T08:57:43.9268831Z * [new branch] gh/robert-hardwick/5/base -> origin/gh/robert-hardwick/5/base 2025-12-04T08:57:43.9269920Z * [new branch] gh/robert-hardwick/5/head -> origin/gh/robert-hardwick/5/head 2025-12-04T08:57:43.9271115Z * [new branch] gh/robert-hardwick/5/orig -> origin/gh/robert-hardwick/5/orig 2025-12-04T08:57:43.9272564Z * [new branch] gh/robert-hardwick/6/base -> origin/gh/robert-hardwick/6/base 2025-12-04T08:57:43.9273682Z * [new branch] gh/robert-hardwick/6/head -> origin/gh/robert-hardwick/6/head 
2025-12-04T08:57:43.9274849Z * [new branch] gh/robert-hardwick/6/orig -> origin/gh/robert-hardwick/6/orig 2025-12-04T08:57:43.9276272Z * [new branch] gh/robert-hardwick/7/base -> origin/gh/robert-hardwick/7/base 2025-12-04T08:57:43.9277361Z * [new branch] gh/robert-hardwick/7/head -> origin/gh/robert-hardwick/7/head 2025-12-04T08:57:43.9278489Z * [new branch] gh/robert-hardwick/7/orig -> origin/gh/robert-hardwick/7/orig 2025-12-04T08:57:43.9279952Z * [new branch] gh/robert-hardwick/8/base -> origin/gh/robert-hardwick/8/base 2025-12-04T08:57:43.9281022Z * [new branch] gh/robert-hardwick/8/head -> origin/gh/robert-hardwick/8/head 2025-12-04T08:57:43.9282142Z * [new branch] gh/robert-hardwick/8/orig -> origin/gh/robert-hardwick/8/orig 2025-12-04T08:57:43.9283575Z * [new branch] gh/robert-hardwick/9/base -> origin/gh/robert-hardwick/9/base 2025-12-04T08:57:43.9284784Z * [new branch] gh/robert-hardwick/9/head -> origin/gh/robert-hardwick/9/head 2025-12-04T08:57:43.9285924Z * [new branch] gh/robert-hardwick/9/orig -> origin/gh/robert-hardwick/9/orig 2025-12-04T08:57:43.9287626Z * [new branch] gh/rtimpe/1/base -> origin/gh/rtimpe/1/base 2025-12-04T08:57:43.9288809Z * [new branch] gh/rtimpe/1/head -> origin/gh/rtimpe/1/head 2025-12-04T08:57:43.9290174Z * [new branch] gh/rtimpe/2/base -> origin/gh/rtimpe/2/base 2025-12-04T08:57:43.9291206Z * [new branch] gh/rtimpe/2/head -> origin/gh/rtimpe/2/head 2025-12-04T08:57:43.9292695Z * [new branch] gh/rtimpe/22/base -> origin/gh/rtimpe/22/base 2025-12-04T08:57:43.9293780Z * [new branch] gh/rtimpe/22/head -> origin/gh/rtimpe/22/head 2025-12-04T08:57:43.9295001Z * [new branch] gh/rtimpe/22/orig -> origin/gh/rtimpe/22/orig 2025-12-04T08:57:43.9296424Z * [new branch] gh/rtimpe/23/base -> origin/gh/rtimpe/23/base 2025-12-04T08:57:43.9297852Z * [new branch] gh/rtimpe/23/head -> origin/gh/rtimpe/23/head 2025-12-04T08:57:43.9299008Z * [new branch] gh/rtimpe/23/orig -> origin/gh/rtimpe/23/orig 2025-12-04T08:57:43.9300367Z * [new branch] gh/rtimpe/24/base -> origin/gh/rtimpe/24/base 2025-12-04T08:57:43.9301475Z * [new branch] gh/rtimpe/24/head -> origin/gh/rtimpe/24/head 2025-12-04T08:57:43.9302624Z * [new branch] gh/rtimpe/24/orig -> origin/gh/rtimpe/24/orig 2025-12-04T08:57:43.9304144Z * [new branch] gh/rtimpe/25/base -> origin/gh/rtimpe/25/base 2025-12-04T08:57:43.9305238Z * [new branch] gh/rtimpe/25/head -> origin/gh/rtimpe/25/head 2025-12-04T08:57:43.9306387Z * [new branch] gh/rtimpe/25/orig -> origin/gh/rtimpe/25/orig 2025-12-04T08:57:43.9307842Z * [new branch] gh/rtimpe/26/base -> origin/gh/rtimpe/26/base 2025-12-04T08:57:43.9309042Z * [new branch] gh/rtimpe/26/head -> origin/gh/rtimpe/26/head 2025-12-04T08:57:43.9310181Z * [new branch] gh/rtimpe/26/orig -> origin/gh/rtimpe/26/orig 2025-12-04T08:57:43.9311644Z * [new branch] gh/rtimpe/27/base -> origin/gh/rtimpe/27/base 2025-12-04T08:57:43.9312717Z * [new branch] gh/rtimpe/27/head -> origin/gh/rtimpe/27/head 2025-12-04T08:57:43.9313802Z * [new branch] gh/rtimpe/27/orig -> origin/gh/rtimpe/27/orig 2025-12-04T08:57:43.9315638Z * [new branch] gh/rtimpe/28/base -> origin/gh/rtimpe/28/base 2025-12-04T08:57:43.9316701Z * [new branch] gh/rtimpe/28/head -> origin/gh/rtimpe/28/head 2025-12-04T08:57:43.9317766Z * [new branch] gh/rtimpe/28/orig -> origin/gh/rtimpe/28/orig 2025-12-04T08:57:43.9319343Z * [new branch] gh/rtimpe/29/base -> origin/gh/rtimpe/29/base 2025-12-04T08:57:43.9320397Z * [new branch] gh/rtimpe/29/head -> origin/gh/rtimpe/29/head 2025-12-04T08:57:43.9321936Z * [new branch] gh/rtimpe/29/orig -> 
origin/gh/rtimpe/29/orig 2025-12-04T08:57:43.9323408Z * [new branch] gh/rtimpe/3/base -> origin/gh/rtimpe/3/base 2025-12-04T08:57:43.9324470Z * [new branch] gh/rtimpe/3/head -> origin/gh/rtimpe/3/head 2025-12-04T08:57:43.9325924Z * [new branch] gh/rtimpe/30/base -> origin/gh/rtimpe/30/base 2025-12-04T08:57:43.9327016Z * [new branch] gh/rtimpe/30/head -> origin/gh/rtimpe/30/head 2025-12-04T08:57:43.9328149Z * [new branch] gh/rtimpe/30/orig -> origin/gh/rtimpe/30/orig 2025-12-04T08:57:43.9329596Z * [new branch] gh/rtimpe/31/base -> origin/gh/rtimpe/31/base 2025-12-04T08:57:43.9330690Z * [new branch] gh/rtimpe/31/head -> origin/gh/rtimpe/31/head 2025-12-04T08:57:43.9331897Z * [new branch] gh/rtimpe/31/orig -> origin/gh/rtimpe/31/orig 2025-12-04T08:57:43.9333585Z * [new branch] gh/rtimpe/32/base -> origin/gh/rtimpe/32/base 2025-12-04T08:57:43.9334656Z * [new branch] gh/rtimpe/32/head -> origin/gh/rtimpe/32/head 2025-12-04T08:57:43.9335726Z * [new branch] gh/rtimpe/32/orig -> origin/gh/rtimpe/32/orig 2025-12-04T08:57:43.9337563Z * [new branch] gh/rtimpe/33/base -> origin/gh/rtimpe/33/base 2025-12-04T08:57:43.9338653Z * [new branch] gh/rtimpe/33/head -> origin/gh/rtimpe/33/head 2025-12-04T08:57:43.9339774Z * [new branch] gh/rtimpe/33/orig -> origin/gh/rtimpe/33/orig 2025-12-04T08:57:43.9341200Z * [new branch] gh/rtimpe/34/base -> origin/gh/rtimpe/34/base 2025-12-04T08:57:43.9342310Z * [new branch] gh/rtimpe/34/head -> origin/gh/rtimpe/34/head 2025-12-04T08:57:43.9343423Z * [new branch] gh/rtimpe/34/orig -> origin/gh/rtimpe/34/orig 2025-12-04T08:57:43.9345071Z * [new branch] gh/rtimpe/35/base -> origin/gh/rtimpe/35/base 2025-12-04T08:57:43.9346109Z * [new branch] gh/rtimpe/35/head -> origin/gh/rtimpe/35/head 2025-12-04T08:57:43.9347293Z * [new branch] gh/rtimpe/35/orig -> origin/gh/rtimpe/35/orig 2025-12-04T08:57:43.9348968Z * [new branch] gh/rtimpe/4/base -> origin/gh/rtimpe/4/base 2025-12-04T08:57:43.9350046Z * [new branch] gh/rtimpe/4/head -> origin/gh/rtimpe/4/head 2025-12-04T08:57:43.9351850Z * [new branch] gh/ruisizhang123/1/base -> origin/gh/ruisizhang123/1/base 2025-12-04T08:57:43.9352925Z * [new branch] gh/ruisizhang123/1/head -> origin/gh/ruisizhang123/1/head 2025-12-04T08:57:43.9354009Z * [new branch] gh/ruisizhang123/1/orig -> origin/gh/ruisizhang123/1/orig 2025-12-04T08:57:43.9355471Z * [new branch] gh/ruisizhang123/4/base -> origin/gh/ruisizhang123/4/base 2025-12-04T08:57:43.9356546Z * [new branch] gh/ruisizhang123/4/head -> origin/gh/ruisizhang123/4/head 2025-12-04T08:57:43.9357657Z * [new branch] gh/ruisizhang123/4/orig -> origin/gh/ruisizhang123/4/orig 2025-12-04T08:57:43.9359127Z * [new branch] gh/ruisizhang123/5/base -> origin/gh/ruisizhang123/5/base 2025-12-04T08:57:43.9360738Z * [new branch] gh/ruisizhang123/5/head -> origin/gh/ruisizhang123/5/head 2025-12-04T08:57:43.9361831Z * [new branch] gh/ruisizhang123/5/orig -> origin/gh/ruisizhang123/5/orig 2025-12-04T08:57:43.9363480Z * [new branch] gh/ruisizhang123/6/base -> origin/gh/ruisizhang123/6/base 2025-12-04T08:57:43.9364544Z * [new branch] gh/ruisizhang123/6/head -> origin/gh/ruisizhang123/6/head 2025-12-04T08:57:43.9365609Z * [new branch] gh/ruisizhang123/6/orig -> origin/gh/ruisizhang123/6/orig 2025-12-04T08:57:43.9367043Z * [new branch] gh/ruisizhang123/7/base -> origin/gh/ruisizhang123/7/base 2025-12-04T08:57:43.9368205Z * [new branch] gh/ruisizhang123/7/head -> origin/gh/ruisizhang123/7/head 2025-12-04T08:57:43.9369264Z * [new branch] gh/ruisizhang123/7/orig -> origin/gh/ruisizhang123/7/orig 
2025-12-04T08:57:43.9370649Z * [new branch] gh/ruisizhang123/8/base -> origin/gh/ruisizhang123/8/base 2025-12-04T08:57:43.9371725Z * [new branch] gh/ruisizhang123/8/head -> origin/gh/ruisizhang123/8/head 2025-12-04T08:57:43.9372823Z * [new branch] gh/ruisizhang123/8/orig -> origin/gh/ruisizhang123/8/orig 2025-12-04T08:57:43.9374228Z * [new branch] gh/ruisizhang123/9/base -> origin/gh/ruisizhang123/9/base 2025-12-04T08:57:43.9375316Z * [new branch] gh/ruisizhang123/9/head -> origin/gh/ruisizhang123/9/head 2025-12-04T08:57:43.9376448Z * [new branch] gh/ruisizhang123/9/orig -> origin/gh/ruisizhang123/9/orig 2025-12-04T08:57:43.9378672Z * [new branch] gh/seemethere/52/base -> origin/gh/seemethere/52/base 2025-12-04T08:57:43.9379800Z * [new branch] gh/seemethere/52/head -> origin/gh/seemethere/52/head 2025-12-04T08:57:43.9380969Z * [new branch] gh/seemethere/52/orig -> origin/gh/seemethere/52/orig 2025-12-04T08:57:43.9382455Z * [new branch] gh/seemethere/53/base -> origin/gh/seemethere/53/base 2025-12-04T08:57:43.9383578Z * [new branch] gh/seemethere/53/head -> origin/gh/seemethere/53/head 2025-12-04T08:57:43.9384694Z * [new branch] gh/seemethere/53/orig -> origin/gh/seemethere/53/orig 2025-12-04T08:57:43.9386175Z * [new branch] gh/seemethere/54/base -> origin/gh/seemethere/54/base 2025-12-04T08:57:43.9387304Z * [new branch] gh/seemethere/54/head -> origin/gh/seemethere/54/head 2025-12-04T08:57:43.9388480Z * [new branch] gh/seemethere/54/orig -> origin/gh/seemethere/54/orig 2025-12-04T08:57:43.9390030Z * [new branch] gh/seemethere/55/base -> origin/gh/seemethere/55/base 2025-12-04T08:57:43.9390927Z * [new branch] gh/seemethere/55/head -> origin/gh/seemethere/55/head 2025-12-04T08:57:43.9392081Z * [new branch] gh/seemethere/55/orig -> origin/gh/seemethere/55/orig 2025-12-04T08:57:43.9393515Z * [new branch] gh/seemethere/59/base -> origin/gh/seemethere/59/base 2025-12-04T08:57:43.9394667Z * [new branch] gh/seemethere/59/head -> origin/gh/seemethere/59/head 2025-12-04T08:57:43.9395840Z * [new branch] gh/seemethere/59/orig -> origin/gh/seemethere/59/orig 2025-12-04T08:57:43.9397288Z * [new branch] gh/seemethere/62/base -> origin/gh/seemethere/62/base 2025-12-04T08:57:43.9398380Z * [new branch] gh/seemethere/62/head -> origin/gh/seemethere/62/head 2025-12-04T08:57:43.9399469Z * [new branch] gh/seemethere/62/orig -> origin/gh/seemethere/62/orig 2025-12-04T08:57:43.9400862Z * [new branch] gh/seemethere/63/base -> origin/gh/seemethere/63/base 2025-12-04T08:57:43.9401943Z * [new branch] gh/seemethere/63/head -> origin/gh/seemethere/63/head 2025-12-04T08:57:43.9403072Z * [new branch] gh/seemethere/63/orig -> origin/gh/seemethere/63/orig 2025-12-04T08:57:43.9404512Z * [new branch] gh/seemethere/71/base -> origin/gh/seemethere/71/base 2025-12-04T08:57:43.9405635Z * [new branch] gh/seemethere/71/head -> origin/gh/seemethere/71/head 2025-12-04T08:57:43.9406747Z * [new branch] gh/seemethere/71/orig -> origin/gh/seemethere/71/orig 2025-12-04T08:57:43.9408361Z * [new branch] gh/seemethere/72/base -> origin/gh/seemethere/72/base 2025-12-04T08:57:43.9409438Z * [new branch] gh/seemethere/72/head -> origin/gh/seemethere/72/head 2025-12-04T08:57:43.9411034Z * [new branch] gh/seemethere/72/orig -> origin/gh/seemethere/72/orig 2025-12-04T08:57:43.9412496Z * [new branch] gh/seemethere/73/base -> origin/gh/seemethere/73/base 2025-12-04T08:57:43.9413612Z * [new branch] gh/seemethere/73/head -> origin/gh/seemethere/73/head 2025-12-04T08:57:43.9414763Z * [new branch] gh/seemethere/73/orig -> origin/gh/seemethere/73/orig 
2025-12-04T08:57:43.9416207Z * [new branch] gh/seemethere/74/base -> origin/gh/seemethere/74/base 2025-12-04T08:57:43.9417677Z * [new branch] gh/seemethere/74/head -> origin/gh/seemethere/74/head 2025-12-04T08:57:43.9418785Z * [new branch] gh/seemethere/74/orig -> origin/gh/seemethere/74/orig 2025-12-04T08:57:43.9420293Z * [new branch] gh/seemethere/75/base -> origin/gh/seemethere/75/base 2025-12-04T08:57:43.9423325Z * [new branch] gh/seemethere/75/head -> origin/gh/seemethere/75/head 2025-12-04T08:57:43.9424599Z * [new branch] gh/seemethere/75/orig -> origin/gh/seemethere/75/orig 2025-12-04T08:57:43.9426287Z * [new branch] gh/seemethere/76/base -> origin/gh/seemethere/76/base 2025-12-04T08:57:43.9427398Z * [new branch] gh/seemethere/76/head -> origin/gh/seemethere/76/head 2025-12-04T08:57:43.9428578Z * [new branch] gh/seemethere/76/orig -> origin/gh/seemethere/76/orig 2025-12-04T08:57:43.9430586Z * [new branch] gh/shunting314/145/base -> origin/gh/shunting314/145/base 2025-12-04T08:57:43.9431821Z * [new branch] gh/shunting314/145/head -> origin/gh/shunting314/145/head 2025-12-04T08:57:43.9433126Z * [new branch] gh/shunting314/145/orig -> origin/gh/shunting314/145/orig 2025-12-04T08:57:43.9434818Z * [new branch] gh/shunting314/176/base -> origin/gh/shunting314/176/base 2025-12-04T08:57:43.9436831Z * [new branch] gh/shunting314/176/head -> origin/gh/shunting314/176/head 2025-12-04T08:57:43.9437875Z * [new branch] gh/shunting314/176/orig -> origin/gh/shunting314/176/orig 2025-12-04T08:57:43.9439426Z * [new branch] gh/shunting314/249/base -> origin/gh/shunting314/249/base 2025-12-04T08:57:43.9440635Z * [new branch] gh/shunting314/249/head -> origin/gh/shunting314/249/head 2025-12-04T08:57:43.9441911Z * [new branch] gh/shunting314/249/orig -> origin/gh/shunting314/249/orig 2025-12-04T08:57:43.9443448Z * [new branch] gh/shunting314/253/base -> origin/gh/shunting314/253/base 2025-12-04T08:57:43.9444469Z * [new branch] gh/shunting314/253/head -> origin/gh/shunting314/253/head 2025-12-04T08:57:43.9445552Z * [new branch] gh/shunting314/253/orig -> origin/gh/shunting314/253/orig 2025-12-04T08:57:43.9447073Z * [new branch] gh/shunting314/256/base -> origin/gh/shunting314/256/base 2025-12-04T08:57:43.9448184Z * [new branch] gh/shunting314/256/head -> origin/gh/shunting314/256/head 2025-12-04T08:57:43.9449253Z * [new branch] gh/shunting314/256/orig -> origin/gh/shunting314/256/orig 2025-12-04T08:57:43.9451074Z * [new branch] gh/shunting314/257/base -> origin/gh/shunting314/257/base 2025-12-04T08:57:43.9452210Z * [new branch] gh/shunting314/257/head -> origin/gh/shunting314/257/head 2025-12-04T08:57:43.9453267Z * [new branch] gh/shunting314/257/orig -> origin/gh/shunting314/257/orig 2025-12-04T08:57:43.9454917Z * [new branch] gh/shunting314/258/base -> origin/gh/shunting314/258/base 2025-12-04T08:57:43.9456000Z * [new branch] gh/shunting314/258/head -> origin/gh/shunting314/258/head 2025-12-04T08:57:43.9457666Z * [new branch] gh/shunting314/258/orig -> origin/gh/shunting314/258/orig 2025-12-04T08:57:43.9458931Z * [new branch] gh/shunting314/259/base -> origin/gh/shunting314/259/base 2025-12-04T08:57:43.9460043Z * [new branch] gh/shunting314/259/head -> origin/gh/shunting314/259/head 2025-12-04T08:57:43.9461166Z * [new branch] gh/shunting314/259/orig -> origin/gh/shunting314/259/orig 2025-12-04T08:57:43.9462747Z * [new branch] gh/shunting314/260/base -> origin/gh/shunting314/260/base 2025-12-04T08:57:43.9463916Z * [new branch] gh/shunting314/260/head -> origin/gh/shunting314/260/head 
2025-12-04T08:57:43.9465089Z * [new branch] gh/shunting314/260/orig -> origin/gh/shunting314/260/orig 2025-12-04T08:57:43.9466675Z * [new branch] gh/shunting314/261/base -> origin/gh/shunting314/261/base 2025-12-04T08:57:43.9467906Z * [new branch] gh/shunting314/261/head -> origin/gh/shunting314/261/head 2025-12-04T08:57:43.9469124Z * [new branch] gh/shunting314/261/orig -> origin/gh/shunting314/261/orig 2025-12-04T08:57:43.9470781Z * [new branch] gh/shunting314/262/base -> origin/gh/shunting314/262/base 2025-12-04T08:57:43.9471929Z * [new branch] gh/shunting314/262/head -> origin/gh/shunting314/262/head 2025-12-04T08:57:43.9473136Z * [new branch] gh/shunting314/262/orig -> origin/gh/shunting314/262/orig 2025-12-04T08:57:43.9474695Z * [new branch] gh/shunting314/263/base -> origin/gh/shunting314/263/base 2025-12-04T08:57:43.9475911Z * [new branch] gh/shunting314/263/head -> origin/gh/shunting314/263/head 2025-12-04T08:57:43.9477058Z * [new branch] gh/shunting314/263/orig -> origin/gh/shunting314/263/orig 2025-12-04T08:57:43.9478495Z * [new branch] gh/shunting314/264/base -> origin/gh/shunting314/264/base 2025-12-04T08:57:43.9479585Z * [new branch] gh/shunting314/264/head -> origin/gh/shunting314/264/head 2025-12-04T08:57:43.9480770Z * [new branch] gh/shunting314/264/orig -> origin/gh/shunting314/264/orig 2025-12-04T08:57:43.9482266Z * [new branch] gh/shunting314/265/base -> origin/gh/shunting314/265/base 2025-12-04T08:57:43.9483207Z * [new branch] gh/shunting314/265/head -> origin/gh/shunting314/265/head 2025-12-04T08:57:43.9484377Z * [new branch] gh/shunting314/265/orig -> origin/gh/shunting314/265/orig 2025-12-04T08:57:43.9485817Z * [new branch] gh/shunting314/266/base -> origin/gh/shunting314/266/base 2025-12-04T08:57:43.9487075Z * [new branch] gh/shunting314/266/head -> origin/gh/shunting314/266/head 2025-12-04T08:57:43.9488300Z * [new branch] gh/shunting314/266/orig -> origin/gh/shunting314/266/orig 2025-12-04T08:57:43.9489924Z * [new branch] gh/shunting314/267/base -> origin/gh/shunting314/267/base 2025-12-04T08:57:43.9491259Z * [new branch] gh/shunting314/267/head -> origin/gh/shunting314/267/head 2025-12-04T08:57:43.9492346Z * [new branch] gh/shunting314/267/orig -> origin/gh/shunting314/267/orig 2025-12-04T08:57:43.9494353Z * [new branch] gh/shunting314/268/base -> origin/gh/shunting314/268/base 2025-12-04T08:57:43.9495596Z * [new branch] gh/shunting314/268/head -> origin/gh/shunting314/268/head 2025-12-04T08:57:43.9496963Z * [new branch] gh/shunting314/268/orig -> origin/gh/shunting314/268/orig 2025-12-04T08:57:43.9498677Z * [new branch] gh/shunting314/269/base -> origin/gh/shunting314/269/base 2025-12-04T08:57:43.9500201Z * [new branch] gh/shunting314/269/head -> origin/gh/shunting314/269/head 2025-12-04T08:57:43.9501363Z * [new branch] gh/shunting314/269/orig -> origin/gh/shunting314/269/orig 2025-12-04T08:57:43.9503139Z * [new branch] gh/silverguo/1/base -> origin/gh/silverguo/1/base 2025-12-04T08:57:43.9504357Z * [new branch] gh/silverguo/1/head -> origin/gh/silverguo/1/head 2025-12-04T08:57:43.9505688Z * [new branch] gh/silverguo/2/base -> origin/gh/silverguo/2/base 2025-12-04T08:57:43.9506794Z * [new branch] gh/silverguo/2/head -> origin/gh/silverguo/2/head 2025-12-04T08:57:43.9508126Z * [new branch] gh/silverguo/3/base -> origin/gh/silverguo/3/base 2025-12-04T08:57:43.9509386Z * [new branch] gh/silverguo/3/head -> origin/gh/silverguo/3/head 2025-12-04T08:57:43.9510684Z * [new branch] gh/silverguo/4/base -> origin/gh/silverguo/4/base 2025-12-04T08:57:43.9511743Z * [new 
branch] gh/silverguo/4/head -> origin/gh/silverguo/4/head 2025-12-04T08:57:43.9513467Z * [new branch] gh/slayton58/39/base -> origin/gh/slayton58/39/base 2025-12-04T08:57:43.9514572Z * [new branch] gh/slayton58/39/head -> origin/gh/slayton58/39/head 2025-12-04T08:57:43.9515680Z * [new branch] gh/slayton58/39/orig -> origin/gh/slayton58/39/orig 2025-12-04T08:57:43.9517231Z * [new branch] gh/slayton58/42/base -> origin/gh/slayton58/42/base 2025-12-04T08:57:43.9518346Z * [new branch] gh/slayton58/42/head -> origin/gh/slayton58/42/head 2025-12-04T08:57:43.9519570Z * [new branch] gh/slayton58/42/orig -> origin/gh/slayton58/42/orig 2025-12-04T08:57:43.9521104Z * [new branch] gh/slayton58/43/base -> origin/gh/slayton58/43/base 2025-12-04T08:57:43.9522569Z * [new branch] gh/slayton58/43/head -> origin/gh/slayton58/43/head 2025-12-04T08:57:43.9523741Z * [new branch] gh/slayton58/43/orig -> origin/gh/slayton58/43/orig 2025-12-04T08:57:43.9525772Z * [new branch] gh/slayton58/44/base -> origin/gh/slayton58/44/base 2025-12-04T08:57:43.9526959Z * [new branch] gh/slayton58/44/head -> origin/gh/slayton58/44/head 2025-12-04T08:57:43.9528253Z * [new branch] gh/slayton58/44/orig -> origin/gh/slayton58/44/orig 2025-12-04T08:57:43.9529638Z * [new branch] gh/slayton58/45/base -> origin/gh/slayton58/45/base 2025-12-04T08:57:43.9530685Z * [new branch] gh/slayton58/45/head -> origin/gh/slayton58/45/head 2025-12-04T08:57:43.9531877Z * [new branch] gh/slayton58/45/orig -> origin/gh/slayton58/45/orig 2025-12-04T08:57:43.9533598Z * [new branch] gh/slayton58/46/base -> origin/gh/slayton58/46/base 2025-12-04T08:57:43.9534818Z * [new branch] gh/slayton58/46/head -> origin/gh/slayton58/46/head 2025-12-04T08:57:43.9535964Z * [new branch] gh/slayton58/46/orig -> origin/gh/slayton58/46/orig 2025-12-04T08:57:43.9537802Z * [new branch] gh/slayton58/6/base -> origin/gh/slayton58/6/base 2025-12-04T08:57:43.9538971Z * [new branch] gh/slayton58/6/head -> origin/gh/slayton58/6/head 2025-12-04T08:57:43.9540353Z * [new branch] gh/slayton58/7/base -> origin/gh/slayton58/7/base 2025-12-04T08:57:43.9541370Z * [new branch] gh/slayton58/7/head -> origin/gh/slayton58/7/head 2025-12-04T08:57:43.9543383Z * [new branch] gh/soulitzer/269/base -> origin/gh/soulitzer/269/base 2025-12-04T08:57:43.9544547Z * [new branch] gh/soulitzer/269/head -> origin/gh/soulitzer/269/head 2025-12-04T08:57:43.9545683Z * [new branch] gh/soulitzer/269/orig -> origin/gh/soulitzer/269/orig 2025-12-04T08:57:43.9547288Z * [new branch] gh/soulitzer/276/base -> origin/gh/soulitzer/276/base 2025-12-04T08:57:43.9548473Z * [new branch] gh/soulitzer/276/head -> origin/gh/soulitzer/276/head 2025-12-04T08:57:43.9549672Z * [new branch] gh/soulitzer/276/orig -> origin/gh/soulitzer/276/orig 2025-12-04T08:57:43.9551462Z * [new branch] gh/soulitzer/287/base -> origin/gh/soulitzer/287/base 2025-12-04T08:57:43.9552604Z * [new branch] gh/soulitzer/287/head -> origin/gh/soulitzer/287/head 2025-12-04T08:57:43.9553688Z * [new branch] gh/soulitzer/287/orig -> origin/gh/soulitzer/287/orig 2025-12-04T08:57:43.9555299Z * [new branch] gh/soulitzer/296/base -> origin/gh/soulitzer/296/base 2025-12-04T08:57:43.9556388Z * [new branch] gh/soulitzer/296/head -> origin/gh/soulitzer/296/head 2025-12-04T08:57:43.9557475Z * [new branch] gh/soulitzer/296/orig -> origin/gh/soulitzer/296/orig 2025-12-04T08:57:43.9559041Z * [new branch] gh/soulitzer/299/base -> origin/gh/soulitzer/299/base 2025-12-04T08:57:43.9560224Z * [new branch] gh/soulitzer/299/head -> origin/gh/soulitzer/299/head 
2025-12-04T08:57:43.9561333Z * [new branch] gh/soulitzer/299/orig -> origin/gh/soulitzer/299/orig 2025-12-04T08:57:43.9562949Z * [new branch] gh/soulitzer/300/base -> origin/gh/soulitzer/300/base 2025-12-04T08:57:43.9564206Z * [new branch] gh/soulitzer/300/head -> origin/gh/soulitzer/300/head 2025-12-04T08:57:43.9565285Z * [new branch] gh/soulitzer/300/orig -> origin/gh/soulitzer/300/orig 2025-12-04T08:57:43.9566904Z * [new branch] gh/soulitzer/301/base -> origin/gh/soulitzer/301/base 2025-12-04T08:57:43.9568020Z * [new branch] gh/soulitzer/301/head -> origin/gh/soulitzer/301/head 2025-12-04T08:57:43.9569142Z * [new branch] gh/soulitzer/301/orig -> origin/gh/soulitzer/301/orig 2025-12-04T08:57:43.9570621Z * [new branch] gh/soulitzer/313/base -> origin/gh/soulitzer/313/base 2025-12-04T08:57:43.9571689Z * [new branch] gh/soulitzer/313/head -> origin/gh/soulitzer/313/head 2025-12-04T08:57:43.9572802Z * [new branch] gh/soulitzer/313/orig -> origin/gh/soulitzer/313/orig 2025-12-04T08:57:43.9574286Z * [new branch] gh/soulitzer/319/base -> origin/gh/soulitzer/319/base 2025-12-04T08:57:43.9575310Z * [new branch] gh/soulitzer/319/head -> origin/gh/soulitzer/319/head 2025-12-04T08:57:43.9576456Z * [new branch] gh/soulitzer/319/orig -> origin/gh/soulitzer/319/orig 2025-12-04T08:57:43.9578346Z * [new branch] gh/soulitzer/320/base -> origin/gh/soulitzer/320/base 2025-12-04T08:57:43.9580010Z * [new branch] gh/soulitzer/320/head -> origin/gh/soulitzer/320/head 2025-12-04T08:57:43.9581635Z * [new branch] gh/soulitzer/320/orig -> origin/gh/soulitzer/320/orig 2025-12-04T08:57:43.9583720Z * [new branch] gh/soulitzer/336/base -> origin/gh/soulitzer/336/base 2025-12-04T08:57:43.9584786Z * [new branch] gh/soulitzer/336/head -> origin/gh/soulitzer/336/head 2025-12-04T08:57:43.9585902Z * [new branch] gh/soulitzer/336/orig -> origin/gh/soulitzer/336/orig 2025-12-04T08:57:43.9587526Z * [new branch] gh/soulitzer/347/base -> origin/gh/soulitzer/347/base 2025-12-04T08:57:43.9588531Z * [new branch] gh/soulitzer/347/head -> origin/gh/soulitzer/347/head 2025-12-04T08:57:43.9589689Z * [new branch] gh/soulitzer/347/orig -> origin/gh/soulitzer/347/orig 2025-12-04T08:57:43.9591422Z * [new branch] gh/soulitzer/349/base -> origin/gh/soulitzer/349/base 2025-12-04T08:57:43.9592552Z * [new branch] gh/soulitzer/349/head -> origin/gh/soulitzer/349/head 2025-12-04T08:57:43.9593653Z * [new branch] gh/soulitzer/349/orig -> origin/gh/soulitzer/349/orig 2025-12-04T08:57:43.9595066Z * [new branch] gh/soulitzer/350/base -> origin/gh/soulitzer/350/base 2025-12-04T08:57:43.9596229Z * [new branch] gh/soulitzer/350/head -> origin/gh/soulitzer/350/head 2025-12-04T08:57:43.9597310Z * [new branch] gh/soulitzer/350/orig -> origin/gh/soulitzer/350/orig 2025-12-04T08:57:43.9598836Z * [new branch] gh/soulitzer/351/base -> origin/gh/soulitzer/351/base 2025-12-04T08:57:43.9599905Z * [new branch] gh/soulitzer/351/head -> origin/gh/soulitzer/351/head 2025-12-04T08:57:43.9600983Z * [new branch] gh/soulitzer/351/orig -> origin/gh/soulitzer/351/orig 2025-12-04T08:57:43.9602401Z * [new branch] gh/soulitzer/353/base -> origin/gh/soulitzer/353/base 2025-12-04T08:57:43.9603656Z * [new branch] gh/soulitzer/353/head -> origin/gh/soulitzer/353/head 2025-12-04T08:57:43.9604733Z * [new branch] gh/soulitzer/353/orig -> origin/gh/soulitzer/353/orig 2025-12-04T08:57:43.9607280Z * [new branch] gh/soulitzer/358/base -> origin/gh/soulitzer/358/base 2025-12-04T08:57:43.9608568Z * [new branch] gh/soulitzer/358/head -> origin/gh/soulitzer/358/head 
2025-12-04T08:57:43.9609670Z * [new branch] gh/soulitzer/358/orig -> origin/gh/soulitzer/358/orig 2025-12-04T08:57:43.9611707Z * [new branch] gh/soulitzer/359/base -> origin/gh/soulitzer/359/base 2025-12-04T08:57:43.9612916Z * [new branch] gh/soulitzer/359/head -> origin/gh/soulitzer/359/head 2025-12-04T08:57:43.9614095Z * [new branch] gh/soulitzer/359/orig -> origin/gh/soulitzer/359/orig 2025-12-04T08:57:43.9615676Z * [new branch] gh/soulitzer/374/base -> origin/gh/soulitzer/374/base 2025-12-04T08:57:43.9617132Z * [new branch] gh/soulitzer/374/head -> origin/gh/soulitzer/374/head 2025-12-04T08:57:43.9618293Z * [new branch] gh/soulitzer/374/orig -> origin/gh/soulitzer/374/orig 2025-12-04T08:57:43.9619816Z * [new branch] gh/soulitzer/375/base -> origin/gh/soulitzer/375/base 2025-12-04T08:57:43.9621038Z * [new branch] gh/soulitzer/375/head -> origin/gh/soulitzer/375/head 2025-12-04T08:57:43.9622901Z * [new branch] gh/soulitzer/375/orig -> origin/gh/soulitzer/375/orig 2025-12-04T08:57:43.9624228Z * [new branch] gh/soulitzer/380/base -> origin/gh/soulitzer/380/base 2025-12-04T08:57:43.9625340Z * [new branch] gh/soulitzer/380/head -> origin/gh/soulitzer/380/head 2025-12-04T08:57:43.9626435Z * [new branch] gh/soulitzer/380/orig -> origin/gh/soulitzer/380/orig 2025-12-04T08:57:43.9627959Z * [new branch] gh/soulitzer/385/base -> origin/gh/soulitzer/385/base 2025-12-04T08:57:43.9629140Z * [new branch] gh/soulitzer/385/head -> origin/gh/soulitzer/385/head 2025-12-04T08:57:43.9630263Z * [new branch] gh/soulitzer/385/orig -> origin/gh/soulitzer/385/orig 2025-12-04T08:57:43.9631945Z * [new branch] gh/soulitzer/386/base -> origin/gh/soulitzer/386/base 2025-12-04T08:57:43.9633188Z * [new branch] gh/soulitzer/386/head -> origin/gh/soulitzer/386/head 2025-12-04T08:57:43.9634726Z * [new branch] gh/soulitzer/386/orig -> origin/gh/soulitzer/386/orig 2025-12-04T08:57:43.9635782Z * [new branch] gh/soulitzer/387/base -> origin/gh/soulitzer/387/base 2025-12-04T08:57:43.9636858Z * [new branch] gh/soulitzer/387/head -> origin/gh/soulitzer/387/head 2025-12-04T08:57:43.9637931Z * [new branch] gh/soulitzer/387/orig -> origin/gh/soulitzer/387/orig 2025-12-04T08:57:43.9639499Z * [new branch] gh/soulitzer/388/base -> origin/gh/soulitzer/388/base 2025-12-04T08:57:43.9640505Z * [new branch] gh/soulitzer/388/head -> origin/gh/soulitzer/388/head 2025-12-04T08:57:43.9641564Z * [new branch] gh/soulitzer/388/orig -> origin/gh/soulitzer/388/orig 2025-12-04T08:57:43.9643074Z * [new branch] gh/soulitzer/389/base -> origin/gh/soulitzer/389/base 2025-12-04T08:57:43.9644227Z * [new branch] gh/soulitzer/389/head -> origin/gh/soulitzer/389/head 2025-12-04T08:57:43.9645293Z * [new branch] gh/soulitzer/389/orig -> origin/gh/soulitzer/389/orig 2025-12-04T08:57:43.9647675Z * [new branch] gh/soulitzer/390/base -> origin/gh/soulitzer/390/base 2025-12-04T08:57:43.9649220Z * [new branch] gh/soulitzer/390/head -> origin/gh/soulitzer/390/head 2025-12-04T08:57:43.9650218Z * [new branch] gh/soulitzer/390/orig -> origin/gh/soulitzer/390/orig 2025-12-04T08:57:43.9651779Z * [new branch] gh/soulitzer/391/base -> origin/gh/soulitzer/391/base 2025-12-04T08:57:43.9652785Z * [new branch] gh/soulitzer/391/head -> origin/gh/soulitzer/391/head 2025-12-04T08:57:43.9653868Z * [new branch] gh/soulitzer/391/orig -> origin/gh/soulitzer/391/orig 2025-12-04T08:57:43.9655899Z * [new branch] gh/soulitzer/392/base -> origin/gh/soulitzer/392/base 2025-12-04T08:57:43.9657829Z * [new branch] gh/soulitzer/392/head -> origin/gh/soulitzer/392/head 
2025-12-04T08:57:43.9658927Z * [new branch] gh/soulitzer/392/orig -> origin/gh/soulitzer/392/orig 2025-12-04T08:57:43.9660926Z * [new branch] gh/swolchok/728/next -> origin/gh/swolchok/728/next 2025-12-04T08:57:43.9662679Z * [new branch] gh/swolchok/819/base -> origin/gh/swolchok/819/base 2025-12-04T08:57:43.9663907Z * [new branch] gh/swolchok/819/head -> origin/gh/swolchok/819/head 2025-12-04T08:57:43.9665289Z * [new branch] gh/swolchok/819/orig -> origin/gh/swolchok/819/orig 2025-12-04T08:57:43.9666793Z * [new branch] gh/swolchok/824/base -> origin/gh/swolchok/824/base 2025-12-04T08:57:43.9667843Z * [new branch] gh/swolchok/824/head -> origin/gh/swolchok/824/head 2025-12-04T08:57:43.9669043Z * [new branch] gh/swolchok/824/orig -> origin/gh/swolchok/824/orig 2025-12-04T08:57:43.9670707Z * [new branch] gh/swolchok/829/base -> origin/gh/swolchok/829/base 2025-12-04T08:57:43.9671564Z * [new branch] gh/swolchok/829/head -> origin/gh/swolchok/829/head 2025-12-04T08:57:43.9672651Z * [new branch] gh/swolchok/829/orig -> origin/gh/swolchok/829/orig 2025-12-04T08:57:43.9674268Z * [new branch] gh/swolchok/839/base -> origin/gh/swolchok/839/base 2025-12-04T08:57:43.9675706Z * [new branch] gh/swolchok/839/head -> origin/gh/swolchok/839/head 2025-12-04T08:57:43.9676824Z * [new branch] gh/swolchok/839/orig -> origin/gh/swolchok/839/orig 2025-12-04T08:57:43.9678344Z * [new branch] gh/swolchok/841/base -> origin/gh/swolchok/841/base 2025-12-04T08:57:43.9679392Z * [new branch] gh/swolchok/841/head -> origin/gh/swolchok/841/head 2025-12-04T08:57:43.9681089Z * [new branch] gh/swolchok/841/orig -> origin/gh/swolchok/841/orig 2025-12-04T08:57:43.9682590Z * [new branch] gh/swolchok/842/base -> origin/gh/swolchok/842/base 2025-12-04T08:57:43.9683627Z * [new branch] gh/swolchok/842/head -> origin/gh/swolchok/842/head 2025-12-04T08:57:43.9684714Z * [new branch] gh/swolchok/842/orig -> origin/gh/swolchok/842/orig 2025-12-04T08:57:43.9686250Z * [new branch] gh/swolchok/845/base -> origin/gh/swolchok/845/base 2025-12-04T08:57:43.9687268Z * [new branch] gh/swolchok/845/head -> origin/gh/swolchok/845/head 2025-12-04T08:57:43.9688451Z * [new branch] gh/swolchok/845/orig -> origin/gh/swolchok/845/orig 2025-12-04T08:57:43.9690022Z * [new branch] gh/swolchok/848/base -> origin/gh/swolchok/848/base 2025-12-04T08:57:43.9691219Z * [new branch] gh/swolchok/848/head -> origin/gh/swolchok/848/head 2025-12-04T08:57:43.9692534Z * [new branch] gh/swolchok/848/orig -> origin/gh/swolchok/848/orig 2025-12-04T08:57:43.9694409Z * [new branch] gh/swolchok/856/base -> origin/gh/swolchok/856/base 2025-12-04T08:57:43.9695420Z * [new branch] gh/swolchok/856/head -> origin/gh/swolchok/856/head 2025-12-04T08:57:43.9696577Z * [new branch] gh/swolchok/856/orig -> origin/gh/swolchok/856/orig 2025-12-04T08:57:43.9698478Z * [new branch] gh/swolchok/860/base -> origin/gh/swolchok/860/base 2025-12-04T08:57:43.9699581Z * [new branch] gh/swolchok/860/head -> origin/gh/swolchok/860/head 2025-12-04T08:57:43.9700852Z * [new branch] gh/swolchok/860/orig -> origin/gh/swolchok/860/orig 2025-12-04T08:57:43.9702608Z * [new branch] gh/swolchok/861/base -> origin/gh/swolchok/861/base 2025-12-04T08:57:43.9703757Z * [new branch] gh/swolchok/861/head -> origin/gh/swolchok/861/head 2025-12-04T08:57:43.9704907Z * [new branch] gh/swolchok/861/orig -> origin/gh/swolchok/861/orig 2025-12-04T08:57:43.9706526Z * [new branch] gh/swolchok/862/base -> origin/gh/swolchok/862/base 2025-12-04T08:57:43.9707559Z * [new branch] gh/swolchok/862/head -> origin/gh/swolchok/862/head 
2025-12-04T08:57:43.9708902Z * [new branch] gh/swolchok/862/orig -> origin/gh/swolchok/862/orig 2025-12-04T08:57:43.9710567Z * [new branch] gh/swolchok/863/base -> origin/gh/swolchok/863/base 2025-12-04T08:57:43.9711597Z * [new branch] gh/swolchok/863/head -> origin/gh/swolchok/863/head 2025-12-04T08:57:43.9712775Z * [new branch] gh/swolchok/863/orig -> origin/gh/swolchok/863/orig 2025-12-04T08:57:43.9714375Z * [new branch] gh/swolchok/864/base -> origin/gh/swolchok/864/base 2025-12-04T08:57:43.9715408Z * [new branch] gh/swolchok/864/head -> origin/gh/swolchok/864/head 2025-12-04T08:57:43.9716634Z * [new branch] gh/swolchok/864/orig -> origin/gh/swolchok/864/orig 2025-12-04T08:57:43.9718044Z * [new branch] gh/swolchok/865/base -> origin/gh/swolchok/865/base 2025-12-04T08:57:43.9719385Z * [new branch] gh/swolchok/865/head -> origin/gh/swolchok/865/head 2025-12-04T08:57:43.9720415Z * [new branch] gh/swolchok/865/orig -> origin/gh/swolchok/865/orig 2025-12-04T08:57:43.9722893Z * [new branch] gh/swolchok/866/base -> origin/gh/swolchok/866/base 2025-12-04T08:57:43.9724020Z * [new branch] gh/swolchok/866/head -> origin/gh/swolchok/866/head 2025-12-04T08:57:43.9725388Z * [new branch] gh/swolchok/866/orig -> origin/gh/swolchok/866/orig 2025-12-04T08:57:43.9726857Z * [new branch] gh/swolchok/867/base -> origin/gh/swolchok/867/base 2025-12-04T08:57:43.9727983Z * [new branch] gh/swolchok/867/head -> origin/gh/swolchok/867/head 2025-12-04T08:57:43.9729171Z * [new branch] gh/swolchok/867/orig -> origin/gh/swolchok/867/orig 2025-12-04T08:57:43.9730758Z * [new branch] gh/swolchok/868/base -> origin/gh/swolchok/868/base 2025-12-04T08:57:43.9731824Z * [new branch] gh/swolchok/868/head -> origin/gh/swolchok/868/head 2025-12-04T08:57:43.9732953Z * [new branch] gh/swolchok/868/orig -> origin/gh/swolchok/868/orig 2025-12-04T08:57:43.9734593Z * [new branch] gh/swolchok/869/base -> origin/gh/swolchok/869/base 2025-12-04T08:57:43.9735738Z * [new branch] gh/swolchok/869/head -> origin/gh/swolchok/869/head 2025-12-04T08:57:43.9737198Z * [new branch] gh/swolchok/869/orig -> origin/gh/swolchok/869/orig 2025-12-04T08:57:43.9738900Z * [new branch] gh/swolchok/870/base -> origin/gh/swolchok/870/base 2025-12-04T08:57:43.9739920Z * [new branch] gh/swolchok/870/head -> origin/gh/swolchok/870/head 2025-12-04T08:57:43.9741125Z * [new branch] gh/swolchok/870/orig -> origin/gh/swolchok/870/orig 2025-12-04T08:57:43.9742709Z * [new branch] gh/swolchok/871/base -> origin/gh/swolchok/871/base 2025-12-04T08:57:43.9743904Z * [new branch] gh/swolchok/871/head -> origin/gh/swolchok/871/head 2025-12-04T08:57:43.9745619Z * [new branch] gh/swolchok/871/orig -> origin/gh/swolchok/871/orig 2025-12-04T08:57:43.9747559Z * [new branch] gh/teja-rao/4/base -> origin/gh/teja-rao/4/base 2025-12-04T08:57:43.9748769Z * [new branch] gh/teja-rao/4/head -> origin/gh/teja-rao/4/head 2025-12-04T08:57:43.9749893Z * [new branch] gh/teja-rao/4/orig -> origin/gh/teja-rao/4/orig 2025-12-04T08:57:43.9751695Z * [new branch] gh/tianyu-l/2/base -> origin/gh/tianyu-l/2/base 2025-12-04T08:57:43.9752740Z * [new branch] gh/tianyu-l/2/head -> origin/gh/tianyu-l/2/head 2025-12-04T08:57:43.9753815Z * [new branch] gh/tianyu-l/2/orig -> origin/gh/tianyu-l/2/orig 2025-12-04T08:57:43.9755404Z * [new branch] gh/tianyu-l/3/base -> origin/gh/tianyu-l/3/base 2025-12-04T08:57:43.9756466Z * [new branch] gh/tianyu-l/3/orig -> origin/gh/tianyu-l/3/orig 2025-12-04T08:57:43.9757923Z * [new branch] gh/tianyu-l/4/base -> origin/gh/tianyu-l/4/base 2025-12-04T08:57:43.9758931Z * [new 
branch] gh/tianyu-l/4/head -> origin/gh/tianyu-l/4/head 2025-12-04T08:57:43.9760027Z * [new branch] gh/tianyu-l/4/orig -> origin/gh/tianyu-l/4/orig 2025-12-04T08:57:43.9762278Z * [new branch] gh/tugsbayasgalan/10/base -> origin/gh/tugsbayasgalan/10/base 2025-12-04T08:57:43.9763348Z * [new branch] gh/tugsbayasgalan/10/head -> origin/gh/tugsbayasgalan/10/head 2025-12-04T08:57:43.9764664Z * [new branch] gh/tugsbayasgalan/10/orig -> origin/gh/tugsbayasgalan/10/orig 2025-12-04T08:57:43.9765980Z * [new branch] gh/tugsbayasgalan/13/base -> origin/gh/tugsbayasgalan/13/base 2025-12-04T08:57:43.9767189Z * [new branch] gh/tugsbayasgalan/13/head -> origin/gh/tugsbayasgalan/13/head 2025-12-04T08:57:43.9768378Z * [new branch] gh/tugsbayasgalan/13/orig -> origin/gh/tugsbayasgalan/13/orig 2025-12-04T08:57:43.9770125Z * [new branch] gh/tugsbayasgalan/17/base -> origin/gh/tugsbayasgalan/17/base 2025-12-04T08:57:43.9771092Z * [new branch] gh/tugsbayasgalan/17/head -> origin/gh/tugsbayasgalan/17/head 2025-12-04T08:57:43.9772223Z * [new branch] gh/tugsbayasgalan/17/orig -> origin/gh/tugsbayasgalan/17/orig 2025-12-04T08:57:43.9773844Z * [new branch] gh/tugsbayasgalan/2/base -> origin/gh/tugsbayasgalan/2/base 2025-12-04T08:57:43.9774845Z * [new branch] gh/tugsbayasgalan/2/head -> origin/gh/tugsbayasgalan/2/head 2025-12-04T08:57:43.9775933Z * [new branch] gh/tugsbayasgalan/2/orig -> origin/gh/tugsbayasgalan/2/orig 2025-12-04T08:57:43.9778211Z * [new branch] gh/tugsbayasgalan/28/base -> origin/gh/tugsbayasgalan/28/base 2025-12-04T08:57:43.9779309Z * [new branch] gh/tugsbayasgalan/28/head -> origin/gh/tugsbayasgalan/28/head 2025-12-04T08:57:43.9780409Z * [new branch] gh/tugsbayasgalan/28/orig -> origin/gh/tugsbayasgalan/28/orig 2025-12-04T08:57:43.9782000Z * [new branch] gh/tugsbayasgalan/32/base -> origin/gh/tugsbayasgalan/32/base 2025-12-04T08:57:43.9783070Z * [new branch] gh/tugsbayasgalan/32/head -> origin/gh/tugsbayasgalan/32/head 2025-12-04T08:57:43.9784180Z * [new branch] gh/tugsbayasgalan/32/orig -> origin/gh/tugsbayasgalan/32/orig 2025-12-04T08:57:43.9785935Z * [new branch] gh/tugsbayasgalan/35/base -> origin/gh/tugsbayasgalan/35/base 2025-12-04T08:57:43.9787100Z * [new branch] gh/tugsbayasgalan/35/head -> origin/gh/tugsbayasgalan/35/head 2025-12-04T08:57:43.9788208Z * [new branch] gh/tugsbayasgalan/35/orig -> origin/gh/tugsbayasgalan/35/orig 2025-12-04T08:57:43.9789836Z * [new branch] gh/tugsbayasgalan/36/base -> origin/gh/tugsbayasgalan/36/base 2025-12-04T08:57:43.9790861Z * [new branch] gh/tugsbayasgalan/36/head -> origin/gh/tugsbayasgalan/36/head 2025-12-04T08:57:43.9791992Z * [new branch] gh/tugsbayasgalan/36/orig -> origin/gh/tugsbayasgalan/36/orig 2025-12-04T08:57:43.9793490Z * [new branch] gh/tugsbayasgalan/37/base -> origin/gh/tugsbayasgalan/37/base 2025-12-04T08:57:43.9794498Z * [new branch] gh/tugsbayasgalan/37/head -> origin/gh/tugsbayasgalan/37/head 2025-12-04T08:57:43.9795588Z * [new branch] gh/tugsbayasgalan/37/orig -> origin/gh/tugsbayasgalan/37/orig 2025-12-04T08:57:43.9797077Z * [new branch] gh/tugsbayasgalan/43/base -> origin/gh/tugsbayasgalan/43/base 2025-12-04T08:57:43.9798729Z * [new branch] gh/tugsbayasgalan/43/head -> origin/gh/tugsbayasgalan/43/head 2025-12-04T08:57:43.9799757Z * [new branch] gh/tugsbayasgalan/43/orig -> origin/gh/tugsbayasgalan/43/orig 2025-12-04T08:57:43.9801250Z * [new branch] gh/tugsbayasgalan/48/base -> origin/gh/tugsbayasgalan/48/base 2025-12-04T08:57:43.9802287Z * [new branch] gh/tugsbayasgalan/48/head -> origin/gh/tugsbayasgalan/48/head 
2025-12-04T08:57:43.9803367Z * [new branch] gh/tugsbayasgalan/48/orig -> origin/gh/tugsbayasgalan/48/orig 2025-12-04T08:57:43.9804937Z * [new branch] gh/tugsbayasgalan/51/base -> origin/gh/tugsbayasgalan/51/base 2025-12-04T08:57:43.9805974Z * [new branch] gh/tugsbayasgalan/51/head -> origin/gh/tugsbayasgalan/51/head 2025-12-04T08:57:43.9807123Z * [new branch] gh/tugsbayasgalan/51/orig -> origin/gh/tugsbayasgalan/51/orig 2025-12-04T08:57:43.9808838Z * [new branch] gh/tugsbayasgalan/52/base -> origin/gh/tugsbayasgalan/52/base 2025-12-04T08:57:43.9809881Z * [new branch] gh/tugsbayasgalan/52/head -> origin/gh/tugsbayasgalan/52/head 2025-12-04T08:57:43.9811487Z * [new branch] gh/tugsbayasgalan/52/orig -> origin/gh/tugsbayasgalan/52/orig 2025-12-04T08:57:43.9812973Z * [new branch] gh/tugsbayasgalan/53/base -> origin/gh/tugsbayasgalan/53/base 2025-12-04T08:57:43.9814012Z * [new branch] gh/tugsbayasgalan/53/head -> origin/gh/tugsbayasgalan/53/head 2025-12-04T08:57:43.9815074Z * [new branch] gh/tugsbayasgalan/53/orig -> origin/gh/tugsbayasgalan/53/orig 2025-12-04T08:57:43.9817125Z * [new branch] gh/tugsbayasgalan/55/base -> origin/gh/tugsbayasgalan/55/base 2025-12-04T08:57:43.9818421Z * [new branch] gh/tugsbayasgalan/55/head -> origin/gh/tugsbayasgalan/55/head 2025-12-04T08:57:43.9819631Z * [new branch] gh/tugsbayasgalan/55/orig -> origin/gh/tugsbayasgalan/55/orig 2025-12-04T08:57:43.9823835Z * [new branch] gh/tugsbayasgalan/59/base -> origin/gh/tugsbayasgalan/59/base 2025-12-04T08:57:43.9825185Z * [new branch] gh/tugsbayasgalan/59/head -> origin/gh/tugsbayasgalan/59/head 2025-12-04T08:57:43.9826340Z * [new branch] gh/tugsbayasgalan/59/orig -> origin/gh/tugsbayasgalan/59/orig 2025-12-04T08:57:43.9827931Z * [new branch] gh/tugsbayasgalan/6/base -> origin/gh/tugsbayasgalan/6/base 2025-12-04T08:57:43.9828968Z * [new branch] gh/tugsbayasgalan/6/head -> origin/gh/tugsbayasgalan/6/head 2025-12-04T08:57:43.9830117Z * [new branch] gh/tugsbayasgalan/6/orig -> origin/gh/tugsbayasgalan/6/orig 2025-12-04T08:57:43.9832044Z * [new branch] gh/tugsbayasgalan/60/base -> origin/gh/tugsbayasgalan/60/base 2025-12-04T08:57:43.9833257Z * [new branch] gh/tugsbayasgalan/60/head -> origin/gh/tugsbayasgalan/60/head 2025-12-04T08:57:43.9834360Z * [new branch] gh/tugsbayasgalan/60/orig -> origin/gh/tugsbayasgalan/60/orig 2025-12-04T08:57:43.9836395Z * [new branch] gh/tugsbayasgalan/61/base -> origin/gh/tugsbayasgalan/61/base 2025-12-04T08:57:43.9839043Z * [new branch] gh/tugsbayasgalan/61/head -> origin/gh/tugsbayasgalan/61/head 2025-12-04T08:57:43.9839748Z * [new branch] gh/tugsbayasgalan/61/orig -> origin/gh/tugsbayasgalan/61/orig 2025-12-04T08:57:43.9840706Z * [new branch] gh/tugsbayasgalan/63/base -> origin/gh/tugsbayasgalan/63/base 2025-12-04T08:57:43.9841772Z * [new branch] gh/tugsbayasgalan/63/head -> origin/gh/tugsbayasgalan/63/head 2025-12-04T08:57:43.9842880Z * [new branch] gh/tugsbayasgalan/63/orig -> origin/gh/tugsbayasgalan/63/orig 2025-12-04T08:57:43.9844379Z * [new branch] gh/tugsbayasgalan/67/base -> origin/gh/tugsbayasgalan/67/base 2025-12-04T08:57:43.9845469Z * [new branch] gh/tugsbayasgalan/67/head -> origin/gh/tugsbayasgalan/67/head 2025-12-04T08:57:43.9846535Z * [new branch] gh/tugsbayasgalan/67/orig -> origin/gh/tugsbayasgalan/67/orig 2025-12-04T08:57:43.9848297Z * [new branch] gh/tugsbayasgalan/68/base -> origin/gh/tugsbayasgalan/68/base 2025-12-04T08:57:43.9849348Z * [new branch] gh/tugsbayasgalan/68/head -> origin/gh/tugsbayasgalan/68/head 2025-12-04T08:57:43.9850470Z * [new branch] 
gh/tugsbayasgalan/68/orig -> origin/gh/tugsbayasgalan/68/orig 2025-12-04T08:57:43.9852158Z * [new branch] gh/tugsbayasgalan/7/base -> origin/gh/tugsbayasgalan/7/base 2025-12-04T08:57:43.9853173Z * [new branch] gh/tugsbayasgalan/7/head -> origin/gh/tugsbayasgalan/7/head 2025-12-04T08:57:43.9854758Z * [new branch] gh/tugsbayasgalan/7/orig -> origin/gh/tugsbayasgalan/7/orig 2025-12-04T08:57:43.9857001Z * [new branch] gh/tugsbayasgalan/70/base -> origin/gh/tugsbayasgalan/70/base 2025-12-04T08:57:43.9858253Z * [new branch] gh/tugsbayasgalan/70/head -> origin/gh/tugsbayasgalan/70/head 2025-12-04T08:57:43.9859403Z * [new branch] gh/tugsbayasgalan/70/orig -> origin/gh/tugsbayasgalan/70/orig 2025-12-04T08:57:43.9861142Z * [new branch] gh/tugsbayasgalan/71/base -> origin/gh/tugsbayasgalan/71/base 2025-12-04T08:57:43.9862416Z * [new branch] gh/tugsbayasgalan/71/head -> origin/gh/tugsbayasgalan/71/head 2025-12-04T08:57:43.9863610Z * [new branch] gh/tugsbayasgalan/71/orig -> origin/gh/tugsbayasgalan/71/orig 2025-12-04T08:57:43.9865420Z * [new branch] gh/tugsbayasgalan/72/base -> origin/gh/tugsbayasgalan/72/base 2025-12-04T08:57:43.9866534Z * [new branch] gh/tugsbayasgalan/72/head -> origin/gh/tugsbayasgalan/72/head 2025-12-04T08:57:43.9867669Z * [new branch] gh/tugsbayasgalan/72/orig -> origin/gh/tugsbayasgalan/72/orig 2025-12-04T08:57:43.9869498Z * [new branch] gh/tugsbayasgalan/73/base -> origin/gh/tugsbayasgalan/73/base 2025-12-04T08:57:43.9870620Z * [new branch] gh/tugsbayasgalan/73/head -> origin/gh/tugsbayasgalan/73/head 2025-12-04T08:57:43.9871724Z * [new branch] gh/tugsbayasgalan/73/orig -> origin/gh/tugsbayasgalan/73/orig 2025-12-04T08:57:43.9873467Z * [new branch] gh/tugsbayasgalan/74/base -> origin/gh/tugsbayasgalan/74/base 2025-12-04T08:57:43.9874615Z * [new branch] gh/tugsbayasgalan/74/head -> origin/gh/tugsbayasgalan/74/head 2025-12-04T08:57:43.9875719Z * [new branch] gh/tugsbayasgalan/74/orig -> origin/gh/tugsbayasgalan/74/orig 2025-12-04T08:57:43.9877701Z * [new branch] gh/tugsbayasgalan/75/base -> origin/gh/tugsbayasgalan/75/base 2025-12-04T08:57:43.9878750Z * [new branch] gh/tugsbayasgalan/75/head -> origin/gh/tugsbayasgalan/75/head 2025-12-04T08:57:43.9879920Z * [new branch] gh/tugsbayasgalan/75/orig -> origin/gh/tugsbayasgalan/75/orig 2025-12-04T08:57:43.9881314Z * [new branch] gh/tugsbayasgalan/76/base -> origin/gh/tugsbayasgalan/76/base 2025-12-04T08:57:43.9882377Z * [new branch] gh/tugsbayasgalan/76/head -> origin/gh/tugsbayasgalan/76/head 2025-12-04T08:57:43.9883453Z * [new branch] gh/tugsbayasgalan/76/orig -> origin/gh/tugsbayasgalan/76/orig 2025-12-04T08:57:43.9885278Z * [new branch] gh/tugsbayasgalan/77/base -> origin/gh/tugsbayasgalan/77/base 2025-12-04T08:57:43.9886259Z * [new branch] gh/tugsbayasgalan/77/head -> origin/gh/tugsbayasgalan/77/head 2025-12-04T08:57:43.9887293Z * [new branch] gh/tugsbayasgalan/77/orig -> origin/gh/tugsbayasgalan/77/orig 2025-12-04T08:57:43.9888911Z * [new branch] gh/tugsbayasgalan/78/base -> origin/gh/tugsbayasgalan/78/base 2025-12-04T08:57:43.9890134Z * [new branch] gh/tugsbayasgalan/78/head -> origin/gh/tugsbayasgalan/78/head 2025-12-04T08:57:43.9891200Z * [new branch] gh/tugsbayasgalan/78/orig -> origin/gh/tugsbayasgalan/78/orig 2025-12-04T08:57:43.9892770Z * [new branch] gh/tugsbayasgalan/79/base -> origin/gh/tugsbayasgalan/79/base 2025-12-04T08:57:43.9893804Z * [new branch] gh/tugsbayasgalan/79/head -> origin/gh/tugsbayasgalan/79/head 2025-12-04T08:57:43.9895472Z * [new branch] gh/tugsbayasgalan/79/orig -> origin/gh/tugsbayasgalan/79/orig 
2025-12-04T08:57:43.9897296Z * [new branch] gh/tugsbayasgalan/8/base -> origin/gh/tugsbayasgalan/8/base 2025-12-04T08:57:43.9898323Z * [new branch] gh/tugsbayasgalan/8/head -> origin/gh/tugsbayasgalan/8/head 2025-12-04T08:57:43.9899452Z * [new branch] gh/tugsbayasgalan/8/orig -> origin/gh/tugsbayasgalan/8/orig 2025-12-04T08:57:43.9901129Z * [new branch] gh/tugsbayasgalan/80/base -> origin/gh/tugsbayasgalan/80/base 2025-12-04T08:57:43.9902024Z * [new branch] gh/tugsbayasgalan/80/head -> origin/gh/tugsbayasgalan/80/head 2025-12-04T08:57:43.9903156Z * [new branch] gh/tugsbayasgalan/80/orig -> origin/gh/tugsbayasgalan/80/orig 2025-12-04T08:57:43.9904803Z * [new branch] gh/tugsbayasgalan/81/base -> origin/gh/tugsbayasgalan/81/base 2025-12-04T08:57:43.9905784Z * [new branch] gh/tugsbayasgalan/81/head -> origin/gh/tugsbayasgalan/81/head 2025-12-04T08:57:43.9907112Z * [new branch] gh/tugsbayasgalan/81/orig -> origin/gh/tugsbayasgalan/81/orig 2025-12-04T08:57:43.9909706Z * [new branch] gh/tugsbayasgalan/82/base -> origin/gh/tugsbayasgalan/82/base 2025-12-04T08:57:43.9910947Z * [new branch] gh/tugsbayasgalan/82/head -> origin/gh/tugsbayasgalan/82/head 2025-12-04T08:57:43.9912059Z * [new branch] gh/tugsbayasgalan/82/orig -> origin/gh/tugsbayasgalan/82/orig 2025-12-04T08:57:43.9913539Z * [new branch] gh/tugsbayasgalan/83/base -> origin/gh/tugsbayasgalan/83/base 2025-12-04T08:57:43.9914628Z * [new branch] gh/tugsbayasgalan/83/head -> origin/gh/tugsbayasgalan/83/head 2025-12-04T08:57:43.9915722Z * [new branch] gh/tugsbayasgalan/83/orig -> origin/gh/tugsbayasgalan/83/orig 2025-12-04T08:57:43.9917233Z * [new branch] gh/tugsbayasgalan/84/base -> origin/gh/tugsbayasgalan/84/base 2025-12-04T08:57:43.9918241Z * [new branch] gh/tugsbayasgalan/84/head -> origin/gh/tugsbayasgalan/84/head 2025-12-04T08:57:43.9919301Z * [new branch] gh/tugsbayasgalan/84/orig -> origin/gh/tugsbayasgalan/84/orig 2025-12-04T08:57:43.9920677Z * [new branch] gh/tugsbayasgalan/85/base -> origin/gh/tugsbayasgalan/85/base 2025-12-04T08:57:43.9922359Z * [new branch] gh/tugsbayasgalan/85/head -> origin/gh/tugsbayasgalan/85/head 2025-12-04T08:57:43.9923415Z * [new branch] gh/tugsbayasgalan/85/orig -> origin/gh/tugsbayasgalan/85/orig 2025-12-04T08:57:43.9925043Z * [new branch] gh/tugsbayasgalan/86/base -> origin/gh/tugsbayasgalan/86/base 2025-12-04T08:57:43.9926182Z * [new branch] gh/tugsbayasgalan/86/head -> origin/gh/tugsbayasgalan/86/head 2025-12-04T08:57:43.9927321Z * [new branch] gh/tugsbayasgalan/86/orig -> origin/gh/tugsbayasgalan/86/orig 2025-12-04T08:57:43.9929216Z * [new branch] gh/tugsbayasgalan/87/base -> origin/gh/tugsbayasgalan/87/base 2025-12-04T08:57:43.9930349Z * [new branch] gh/tugsbayasgalan/87/head -> origin/gh/tugsbayasgalan/87/head 2025-12-04T08:57:43.9931467Z * [new branch] gh/tugsbayasgalan/87/orig -> origin/gh/tugsbayasgalan/87/orig 2025-12-04T08:57:43.9933396Z * [new branch] gh/tugsbayasgalan/88/base -> origin/gh/tugsbayasgalan/88/base 2025-12-04T08:57:43.9934413Z * [new branch] gh/tugsbayasgalan/88/head -> origin/gh/tugsbayasgalan/88/head 2025-12-04T08:57:43.9935516Z * [new branch] gh/tugsbayasgalan/88/orig -> origin/gh/tugsbayasgalan/88/orig 2025-12-04T08:57:43.9937425Z * [new branch] gh/tugsbayasgalan/89/base -> origin/gh/tugsbayasgalan/89/base 2025-12-04T08:57:43.9938505Z * [new branch] gh/tugsbayasgalan/89/head -> origin/gh/tugsbayasgalan/89/head 2025-12-04T08:57:43.9940107Z * [new branch] gh/tugsbayasgalan/89/orig -> origin/gh/tugsbayasgalan/89/orig 2025-12-04T08:57:43.9942107Z * [new branch] 
gh/tugsbayasgalan/9/base -> origin/gh/tugsbayasgalan/9/base 2025-12-04T08:57:43.9943077Z * [new branch] gh/tugsbayasgalan/9/head -> origin/gh/tugsbayasgalan/9/head 2025-12-04T08:57:43.9944213Z * [new branch] gh/tugsbayasgalan/9/orig -> origin/gh/tugsbayasgalan/9/orig 2025-12-04T08:57:43.9945990Z * [new branch] gh/tugsbayasgalan/90/base -> origin/gh/tugsbayasgalan/90/base 2025-12-04T08:57:43.9947147Z * [new branch] gh/tugsbayasgalan/90/head -> origin/gh/tugsbayasgalan/90/head 2025-12-04T08:57:43.9948135Z * [new branch] gh/tugsbayasgalan/90/orig -> origin/gh/tugsbayasgalan/90/orig 2025-12-04T08:57:43.9949995Z * [new branch] gh/tugsbayasgalan/91/base -> origin/gh/tugsbayasgalan/91/base 2025-12-04T08:57:43.9951055Z * [new branch] gh/tugsbayasgalan/91/head -> origin/gh/tugsbayasgalan/91/head 2025-12-04T08:57:43.9952092Z * [new branch] gh/tugsbayasgalan/91/orig -> origin/gh/tugsbayasgalan/91/orig 2025-12-04T08:57:43.9953748Z * [new branch] gh/tugsbayasgalan/92/base -> origin/gh/tugsbayasgalan/92/base 2025-12-04T08:57:43.9954765Z * [new branch] gh/tugsbayasgalan/92/head -> origin/gh/tugsbayasgalan/92/head 2025-12-04T08:57:43.9955874Z * [new branch] gh/tugsbayasgalan/92/orig -> origin/gh/tugsbayasgalan/92/orig 2025-12-04T08:57:43.9958005Z * [new branch] gh/tugsbayasgalan/93/base -> origin/gh/tugsbayasgalan/93/base 2025-12-04T08:57:43.9959141Z * [new branch] gh/tugsbayasgalan/93/head -> origin/gh/tugsbayasgalan/93/head 2025-12-04T08:57:43.9960234Z * [new branch] gh/tugsbayasgalan/93/orig -> origin/gh/tugsbayasgalan/93/orig 2025-12-04T08:57:43.9962081Z * [new branch] gh/v0i0/14/base -> origin/gh/v0i0/14/base 2025-12-04T08:57:43.9963083Z * [new branch] gh/v0i0/14/head -> origin/gh/v0i0/14/head 2025-12-04T08:57:43.9964166Z * [new branch] gh/v0i0/14/orig -> origin/gh/v0i0/14/orig 2025-12-04T08:57:43.9965646Z * [new branch] gh/v0i0/15/base -> origin/gh/v0i0/15/base 2025-12-04T08:57:43.9966742Z * [new branch] gh/v0i0/15/head -> origin/gh/v0i0/15/head 2025-12-04T08:57:43.9967823Z * [new branch] gh/v0i0/15/orig -> origin/gh/v0i0/15/orig 2025-12-04T08:57:43.9969378Z * [new branch] gh/v0i0/16/base -> origin/gh/v0i0/16/base 2025-12-04T08:57:43.9970411Z * [new branch] gh/v0i0/16/head -> origin/gh/v0i0/16/head 2025-12-04T08:57:43.9971491Z * [new branch] gh/v0i0/16/orig -> origin/gh/v0i0/16/orig 2025-12-04T08:57:43.9972994Z * [new branch] gh/v0i0/17/base -> origin/gh/v0i0/17/base 2025-12-04T08:57:43.9974061Z * [new branch] gh/v0i0/17/head -> origin/gh/v0i0/17/head 2025-12-04T08:57:43.9975310Z * [new branch] gh/v0i0/17/orig -> origin/gh/v0i0/17/orig 2025-12-04T08:57:43.9977164Z * [new branch] gh/v0i0/18/base -> origin/gh/v0i0/18/base 2025-12-04T08:57:43.9978481Z * [new branch] gh/v0i0/18/head -> origin/gh/v0i0/18/head 2025-12-04T08:57:43.9979525Z * [new branch] gh/v0i0/18/orig -> origin/gh/v0i0/18/orig 2025-12-04T08:57:43.9981172Z * [new branch] gh/v0i0/19/base -> origin/gh/v0i0/19/base 2025-12-04T08:57:43.9982277Z * [new branch] gh/v0i0/19/head -> origin/gh/v0i0/19/head 2025-12-04T08:57:43.9983410Z * [new branch] gh/v0i0/19/orig -> origin/gh/v0i0/19/orig 2025-12-04T08:57:43.9985343Z * [new branch] gh/vishal9-team/1/base -> origin/gh/vishal9-team/1/base 2025-12-04T08:57:43.9986442Z * [new branch] gh/vishal9-team/1/head -> origin/gh/vishal9-team/1/head 2025-12-04T08:57:43.9988349Z * [new branch] gh/vishal9-team/2/base -> origin/gh/vishal9-team/2/base 2025-12-04T08:57:43.9989504Z * [new branch] gh/vishal9-team/2/head -> origin/gh/vishal9-team/2/head 2025-12-04T08:57:43.9990643Z * [new branch] gh/vishal9-team/2/orig 
-> origin/gh/vishal9-team/2/orig 2025-12-04T08:57:43.9992136Z * [new branch] gh/vishal9-team/3/base -> origin/gh/vishal9-team/3/base 2025-12-04T08:57:43.9993371Z * [new branch] gh/vishal9-team/3/head -> origin/gh/vishal9-team/3/head 2025-12-04T08:57:43.9994398Z * [new branch] gh/vishal9-team/3/orig -> origin/gh/vishal9-team/3/orig 2025-12-04T08:57:43.9995884Z * [new branch] gh/vishal9-team/4/base -> origin/gh/vishal9-team/4/base 2025-12-04T08:57:43.9996888Z * [new branch] gh/vishal9-team/4/head -> origin/gh/vishal9-team/4/head 2025-12-04T08:57:43.9997948Z * [new branch] gh/vishal9-team/4/orig -> origin/gh/vishal9-team/4/orig 2025-12-04T08:57:43.9999722Z * [new branch] gh/vkuzo/1/next -> origin/gh/vkuzo/1/next 2025-12-04T08:57:44.0001140Z * [new branch] gh/vkuzo/2/next -> origin/gh/vkuzo/2/next 2025-12-04T08:57:44.0002576Z * [new branch] gh/vkuzo/3/next -> origin/gh/vkuzo/3/next 2025-12-04T08:57:44.0004291Z * [new branch] gh/wconstab/424/base -> origin/gh/wconstab/424/base 2025-12-04T08:57:44.0005416Z * [new branch] gh/wconstab/424/head -> origin/gh/wconstab/424/head 2025-12-04T08:57:44.0006557Z * [new branch] gh/wconstab/424/orig -> origin/gh/wconstab/424/orig 2025-12-04T08:57:44.0008170Z * [new branch] gh/wconstab/435/base -> origin/gh/wconstab/435/base 2025-12-04T08:57:44.0009289Z * [new branch] gh/wconstab/435/head -> origin/gh/wconstab/435/head 2025-12-04T08:57:44.0011029Z * [new branch] gh/wconstab/435/orig -> origin/gh/wconstab/435/orig 2025-12-04T08:57:44.0012433Z * [new branch] gh/wconstab/444/base -> origin/gh/wconstab/444/base 2025-12-04T08:57:44.0013511Z * [new branch] gh/wconstab/444/head -> origin/gh/wconstab/444/head 2025-12-04T08:57:44.0014602Z * [new branch] gh/wconstab/444/orig -> origin/gh/wconstab/444/orig 2025-12-04T08:57:44.0016135Z * [new branch] gh/wconstab/447/base -> origin/gh/wconstab/447/base 2025-12-04T08:57:44.0017504Z * [new branch] gh/wconstab/447/head -> origin/gh/wconstab/447/head 2025-12-04T08:57:44.0018625Z * [new branch] gh/wconstab/447/orig -> origin/gh/wconstab/447/orig 2025-12-04T08:57:44.0020182Z * [new branch] gh/wconstab/448/base -> origin/gh/wconstab/448/base 2025-12-04T08:57:44.0021524Z * [new branch] gh/wconstab/448/head -> origin/gh/wconstab/448/head 2025-12-04T08:57:44.0022703Z * [new branch] gh/wconstab/448/orig -> origin/gh/wconstab/448/orig 2025-12-04T08:57:44.0024230Z * [new branch] gh/wconstab/449/base -> origin/gh/wconstab/449/base 2025-12-04T08:57:44.0025339Z * [new branch] gh/wconstab/449/head -> origin/gh/wconstab/449/head 2025-12-04T08:57:44.0026546Z * [new branch] gh/wconstab/449/orig -> origin/gh/wconstab/449/orig 2025-12-04T08:57:44.0027899Z * [new branch] gh/wconstab/450/base -> origin/gh/wconstab/450/base 2025-12-04T08:57:44.0029038Z * [new branch] gh/wconstab/450/head -> origin/gh/wconstab/450/head 2025-12-04T08:57:44.0030188Z * [new branch] gh/wconstab/450/orig -> origin/gh/wconstab/450/orig 2025-12-04T08:57:44.0031599Z * [new branch] gh/wconstab/451/base -> origin/gh/wconstab/451/base 2025-12-04T08:57:44.0032882Z * [new branch] gh/wconstab/451/head -> origin/gh/wconstab/451/head 2025-12-04T08:57:44.0033996Z * [new branch] gh/wconstab/451/orig -> origin/gh/wconstab/451/orig 2025-12-04T08:57:44.0035592Z * [new branch] gh/wconstab/452/base -> origin/gh/wconstab/452/base 2025-12-04T08:57:44.0036582Z * [new branch] gh/wconstab/452/head -> origin/gh/wconstab/452/head 2025-12-04T08:57:44.0037615Z * [new branch] gh/wconstab/452/orig -> origin/gh/wconstab/452/orig 2025-12-04T08:57:44.0039205Z * [new branch] gh/wconstab/453/base -> 
origin/gh/wconstab/453/base 2025-12-04T08:57:44.0040222Z * [new branch] gh/wconstab/453/head -> origin/gh/wconstab/453/head 2025-12-04T08:57:44.0041405Z * [new branch] gh/wconstab/453/orig -> origin/gh/wconstab/453/orig 2025-12-04T08:57:44.0042904Z * [new branch] gh/wconstab/454/base -> origin/gh/wconstab/454/base 2025-12-04T08:57:44.0043916Z * [new branch] gh/wconstab/454/head -> origin/gh/wconstab/454/head 2025-12-04T08:57:44.0045017Z * [new branch] gh/wconstab/454/orig -> origin/gh/wconstab/454/orig 2025-12-04T08:57:44.0046545Z * [new branch] gh/wconstab/455/base -> origin/gh/wconstab/455/base 2025-12-04T08:57:44.0047606Z * [new branch] gh/wconstab/455/head -> origin/gh/wconstab/455/head 2025-12-04T08:57:44.0048721Z * [new branch] gh/wconstab/455/orig -> origin/gh/wconstab/455/orig 2025-12-04T08:57:44.0050949Z * [new branch] gh/wconstab/456/base -> origin/gh/wconstab/456/base 2025-12-04T08:57:44.0052337Z * [new branch] gh/wconstab/456/head -> origin/gh/wconstab/456/head 2025-12-04T08:57:44.0053567Z * [new branch] gh/wconstab/456/orig -> origin/gh/wconstab/456/orig 2025-12-04T08:57:44.0055172Z * [new branch] gh/wconstab/457/base -> origin/gh/wconstab/457/base 2025-12-04T08:57:44.0056200Z * [new branch] gh/wconstab/457/head -> origin/gh/wconstab/457/head 2025-12-04T08:57:44.0057826Z * [new branch] gh/wconstab/457/orig -> origin/gh/wconstab/457/orig 2025-12-04T08:57:44.0059386Z * [new branch] gh/wconstab/458/base -> origin/gh/wconstab/458/base 2025-12-04T08:57:44.0060521Z * [new branch] gh/wconstab/458/head -> origin/gh/wconstab/458/head 2025-12-04T08:57:44.0061621Z * [new branch] gh/wconstab/458/orig -> origin/gh/wconstab/458/orig 2025-12-04T08:57:44.0063080Z * [new branch] gh/wconstab/459/base -> origin/gh/wconstab/459/base 2025-12-04T08:57:44.0064197Z * [new branch] gh/wconstab/459/head -> origin/gh/wconstab/459/head 2025-12-04T08:57:44.0065746Z * [new branch] gh/wconstab/459/orig -> origin/gh/wconstab/459/orig 2025-12-04T08:57:44.0067854Z * [new branch] gh/wconstab/460/base -> origin/gh/wconstab/460/base 2025-12-04T08:57:44.0069387Z * [new branch] gh/wconstab/460/head -> origin/gh/wconstab/460/head 2025-12-04T08:57:44.0070534Z * [new branch] gh/wconstab/460/orig -> origin/gh/wconstab/460/orig 2025-12-04T08:57:44.0072365Z * [new branch] gh/wconstab/461/base -> origin/gh/wconstab/461/base 2025-12-04T08:57:44.0073387Z * [new branch] gh/wconstab/461/head -> origin/gh/wconstab/461/head 2025-12-04T08:57:44.0074471Z * [new branch] gh/wconstab/461/orig -> origin/gh/wconstab/461/orig 2025-12-04T08:57:44.0075863Z * [new branch] gh/wconstab/462/base -> origin/gh/wconstab/462/base 2025-12-04T08:57:44.0077025Z * [new branch] gh/wconstab/462/head -> origin/gh/wconstab/462/head 2025-12-04T08:57:44.0078204Z * [new branch] gh/wconstab/462/orig -> origin/gh/wconstab/462/orig 2025-12-04T08:57:44.0079800Z * [new branch] gh/wconstab/463/base -> origin/gh/wconstab/463/base 2025-12-04T08:57:44.0080892Z * [new branch] gh/wconstab/463/head -> origin/gh/wconstab/463/head 2025-12-04T08:57:44.0082036Z * [new branch] gh/wconstab/463/orig -> origin/gh/wconstab/463/orig 2025-12-04T08:57:44.0083601Z * [new branch] gh/wconstab/464/base -> origin/gh/wconstab/464/base 2025-12-04T08:57:44.0084647Z * [new branch] gh/wconstab/464/head -> origin/gh/wconstab/464/head 2025-12-04T08:57:44.0085873Z * [new branch] gh/wconstab/464/orig -> origin/gh/wconstab/464/orig 2025-12-04T08:57:44.0087345Z * [new branch] gh/wconstab/465/base -> origin/gh/wconstab/465/base 2025-12-04T08:57:44.0088515Z * [new branch] gh/wconstab/465/head -> 
origin/gh/wconstab/465/head 2025-12-04T08:57:44.0108375Z * [new branch] gh/wconstab/465/orig -> origin/gh/wconstab/465/orig 2025-12-04T08:57:44.0109278Z * [new branch] gh/wconstab/466/base -> origin/gh/wconstab/466/base 2025-12-04T08:57:44.0109921Z * [new branch] gh/wconstab/466/head -> origin/gh/wconstab/466/head 2025-12-04T08:57:44.0110552Z * [new branch] gh/wconstab/466/orig -> origin/gh/wconstab/466/orig 2025-12-04T08:57:44.0111183Z * [new branch] gh/wconstab/467/base -> origin/gh/wconstab/467/base 2025-12-04T08:57:44.0111794Z * [new branch] gh/wconstab/467/head -> origin/gh/wconstab/467/head 2025-12-04T08:57:44.0112427Z * [new branch] gh/wconstab/467/orig -> origin/gh/wconstab/467/orig 2025-12-04T08:57:44.0113057Z * [new branch] gh/wconstab/468/base -> origin/gh/wconstab/468/base 2025-12-04T08:57:44.0113667Z * [new branch] gh/wconstab/468/head -> origin/gh/wconstab/468/head 2025-12-04T08:57:44.0114285Z * [new branch] gh/wconstab/468/orig -> origin/gh/wconstab/468/orig 2025-12-04T08:57:44.0114917Z * [new branch] gh/weifengpy/39/base -> origin/gh/weifengpy/39/base 2025-12-04T08:57:44.0115549Z * [new branch] gh/weifengpy/39/head -> origin/gh/weifengpy/39/head 2025-12-04T08:57:44.0116168Z * [new branch] gh/weifengpy/39/orig -> origin/gh/weifengpy/39/orig 2025-12-04T08:57:44.0116798Z * [new branch] gh/weifengpy/40/base -> origin/gh/weifengpy/40/base 2025-12-04T08:57:44.0117425Z * [new branch] gh/weifengpy/40/head -> origin/gh/weifengpy/40/head 2025-12-04T08:57:44.0118066Z * [new branch] gh/weifengpy/40/orig -> origin/gh/weifengpy/40/orig 2025-12-04T08:57:44.0118688Z * [new branch] gh/weifengpy/41/base -> origin/gh/weifengpy/41/base 2025-12-04T08:57:44.0119320Z * [new branch] gh/weifengpy/41/head -> origin/gh/weifengpy/41/head 2025-12-04T08:57:44.0119947Z * [new branch] gh/weifengpy/41/orig -> origin/gh/weifengpy/41/orig 2025-12-04T08:57:44.0120606Z * [new branch] gh/williamwen42/250/base -> origin/gh/williamwen42/250/base 2025-12-04T08:57:44.0121641Z * [new branch] gh/williamwen42/250/head -> origin/gh/williamwen42/250/head 2025-12-04T08:57:44.0122334Z * [new branch] gh/williamwen42/250/orig -> origin/gh/williamwen42/250/orig 2025-12-04T08:57:44.0123027Z * [new branch] gh/williamwen42/279/base -> origin/gh/williamwen42/279/base 2025-12-04T08:57:44.0123718Z * [new branch] gh/williamwen42/279/head -> origin/gh/williamwen42/279/head 2025-12-04T08:57:44.0124397Z * [new branch] gh/williamwen42/279/orig -> origin/gh/williamwen42/279/orig 2025-12-04T08:57:44.0125095Z * [new branch] gh/williamwen42/282/base -> origin/gh/williamwen42/282/base 2025-12-04T08:57:44.0125782Z * [new branch] gh/williamwen42/282/head -> origin/gh/williamwen42/282/head 2025-12-04T08:57:44.0126471Z * [new branch] gh/williamwen42/282/orig -> origin/gh/williamwen42/282/orig 2025-12-04T08:57:44.0128080Z * [new branch] gh/williamwen42/287/base -> origin/gh/williamwen42/287/base 2025-12-04T08:57:44.0129106Z * [new branch] gh/williamwen42/287/head -> origin/gh/williamwen42/287/head 2025-12-04T08:57:44.0130265Z * [new branch] gh/williamwen42/287/orig -> origin/gh/williamwen42/287/orig 2025-12-04T08:57:44.0131908Z * [new branch] gh/williamwen42/288/base -> origin/gh/williamwen42/288/base 2025-12-04T08:57:44.0133255Z * [new branch] gh/williamwen42/288/head -> origin/gh/williamwen42/288/head 2025-12-04T08:57:44.0134289Z * [new branch] gh/williamwen42/288/orig -> origin/gh/williamwen42/288/orig 2025-12-04T08:57:44.0136015Z * [new branch] gh/williamwen42/296/base -> origin/gh/williamwen42/296/base 2025-12-04T08:57:44.0137716Z * [new 
branch] gh/williamwen42/296/head -> origin/gh/williamwen42/296/head 2025-12-04T08:57:44.0138748Z * [new branch] gh/williamwen42/296/orig -> origin/gh/williamwen42/296/orig 2025-12-04T08:57:44.0140235Z * [new branch] gh/williamwen42/297/base -> origin/gh/williamwen42/297/base 2025-12-04T08:57:44.0141297Z * [new branch] gh/williamwen42/297/head -> origin/gh/williamwen42/297/head 2025-12-04T08:57:44.0142440Z * [new branch] gh/williamwen42/297/orig -> origin/gh/williamwen42/297/orig 2025-12-04T08:57:44.0144017Z * [new branch] gh/williamwen42/306/base -> origin/gh/williamwen42/306/base 2025-12-04T08:57:44.0145154Z * [new branch] gh/williamwen42/306/head -> origin/gh/williamwen42/306/head 2025-12-04T08:57:44.0146332Z * [new branch] gh/williamwen42/306/orig -> origin/gh/williamwen42/306/orig 2025-12-04T08:57:44.0147964Z * [new branch] gh/williamwen42/309/base -> origin/gh/williamwen42/309/base 2025-12-04T08:57:44.0149312Z * [new branch] gh/williamwen42/309/head -> origin/gh/williamwen42/309/head 2025-12-04T08:57:44.0150396Z * [new branch] gh/williamwen42/309/orig -> origin/gh/williamwen42/309/orig 2025-12-04T08:57:44.0151940Z * [new branch] gh/williamwen42/310/base -> origin/gh/williamwen42/310/base 2025-12-04T08:57:44.0152939Z * [new branch] gh/williamwen42/310/head -> origin/gh/williamwen42/310/head 2025-12-04T08:57:44.0154042Z * [new branch] gh/williamwen42/310/orig -> origin/gh/williamwen42/310/orig 2025-12-04T08:57:44.0156983Z * [new branch] gh/williamwen42/311/base -> origin/gh/williamwen42/311/base 2025-12-04T08:57:44.0158024Z * [new branch] gh/williamwen42/311/head -> origin/gh/williamwen42/311/head 2025-12-04T08:57:44.0159085Z * [new branch] gh/williamwen42/311/orig -> origin/gh/williamwen42/311/orig 2025-12-04T08:57:44.0160477Z * [new branch] gh/williamwen42/319/base -> origin/gh/williamwen42/319/base 2025-12-04T08:57:44.0161535Z * [new branch] gh/williamwen42/319/head -> origin/gh/williamwen42/319/head 2025-12-04T08:57:44.0162616Z * [new branch] gh/williamwen42/319/orig -> origin/gh/williamwen42/319/orig 2025-12-04T08:57:44.0164198Z * [new branch] gh/williamwen42/325/base -> origin/gh/williamwen42/325/base 2025-12-04T08:57:44.0165823Z * [new branch] gh/williamwen42/325/head -> origin/gh/williamwen42/325/head 2025-12-04T08:57:44.0166638Z * [new branch] gh/williamwen42/325/orig -> origin/gh/williamwen42/325/orig 2025-12-04T08:57:44.0168113Z * [new branch] gh/williamwen42/326/base -> origin/gh/williamwen42/326/base 2025-12-04T08:57:44.0169244Z * [new branch] gh/williamwen42/326/head -> origin/gh/williamwen42/326/head 2025-12-04T08:57:44.0170391Z * [new branch] gh/williamwen42/326/orig -> origin/gh/williamwen42/326/orig 2025-12-04T08:57:44.0172405Z * [new branch] gh/williamwen42/327/base -> origin/gh/williamwen42/327/base 2025-12-04T08:57:44.0175505Z * [new branch] gh/williamwen42/327/head -> origin/gh/williamwen42/327/head 2025-12-04T08:57:44.0176441Z * [new branch] gh/williamwen42/327/orig -> origin/gh/williamwen42/327/orig 2025-12-04T08:57:44.0177300Z * [new branch] gh/williamwen42/328/base -> origin/gh/williamwen42/328/base 2025-12-04T08:57:44.0178222Z * [new branch] gh/williamwen42/328/head -> origin/gh/williamwen42/328/head 2025-12-04T08:57:44.0179237Z * [new branch] gh/williamwen42/328/orig -> origin/gh/williamwen42/328/orig 2025-12-04T08:57:44.0181118Z * [new branch] gh/williamwen42/329/base -> origin/gh/williamwen42/329/base 2025-12-04T08:57:44.0182418Z * [new branch] gh/williamwen42/329/head -> origin/gh/williamwen42/329/head 2025-12-04T08:57:44.0183624Z * [new branch] 
gh/williamwen42/329/orig -> origin/gh/williamwen42/329/orig 2025-12-04T08:57:44.0185219Z * [new branch] gh/williamwen42/330/base -> origin/gh/williamwen42/330/base 2025-12-04T08:57:44.0186368Z * [new branch] gh/williamwen42/330/head -> origin/gh/williamwen42/330/head 2025-12-04T08:57:44.0187533Z * [new branch] gh/williamwen42/330/orig -> origin/gh/williamwen42/330/orig 2025-12-04T08:57:44.0189169Z * [new branch] gh/williamwen42/331/base -> origin/gh/williamwen42/331/base 2025-12-04T08:57:44.0190208Z * [new branch] gh/williamwen42/331/head -> origin/gh/williamwen42/331/head 2025-12-04T08:57:44.0191268Z * [new branch] gh/williamwen42/331/orig -> origin/gh/williamwen42/331/orig 2025-12-04T08:57:44.0192676Z * [new branch] gh/williamwen42/332/base -> origin/gh/williamwen42/332/base 2025-12-04T08:57:44.0193711Z * [new branch] gh/williamwen42/332/head -> origin/gh/williamwen42/332/head 2025-12-04T08:57:44.0194816Z * [new branch] gh/williamwen42/332/orig -> origin/gh/williamwen42/332/orig 2025-12-04T08:57:44.0196572Z * [new branch] gh/williamwen42/333/base -> origin/gh/williamwen42/333/base 2025-12-04T08:57:44.0197672Z * [new branch] gh/williamwen42/333/head -> origin/gh/williamwen42/333/head 2025-12-04T08:57:44.0198745Z * [new branch] gh/williamwen42/333/orig -> origin/gh/williamwen42/333/orig 2025-12-04T08:57:44.0200754Z * [new branch] gh/williamwen42/334/base -> origin/gh/williamwen42/334/base 2025-12-04T08:57:44.0201846Z * [new branch] gh/williamwen42/334/head -> origin/gh/williamwen42/334/head 2025-12-04T08:57:44.0202983Z * [new branch] gh/williamwen42/334/orig -> origin/gh/williamwen42/334/orig 2025-12-04T08:57:44.0204713Z * [new branch] gh/williamwen42/335/base -> origin/gh/williamwen42/335/base 2025-12-04T08:57:44.0209520Z * [new branch] gh/williamwen42/335/head -> origin/gh/williamwen42/335/head 2025-12-04T08:57:44.0210628Z * [new branch] gh/williamwen42/335/orig -> origin/gh/williamwen42/335/orig 2025-12-04T08:57:44.0212287Z * [new branch] gh/williamwen42/336/base -> origin/gh/williamwen42/336/base 2025-12-04T08:57:44.0213298Z * [new branch] gh/williamwen42/336/head -> origin/gh/williamwen42/336/head 2025-12-04T08:57:44.0214324Z * [new branch] gh/williamwen42/336/orig -> origin/gh/williamwen42/336/orig 2025-12-04T08:57:44.0215904Z * [new branch] gh/williamwen42/337/base -> origin/gh/williamwen42/337/base 2025-12-04T08:57:44.0217422Z * [new branch] gh/williamwen42/337/head -> origin/gh/williamwen42/337/head 2025-12-04T08:57:44.0219054Z * [new branch] gh/williamwen42/337/orig -> origin/gh/williamwen42/337/orig 2025-12-04T08:57:44.0220694Z * [new branch] gh/williamwen42/338/base -> origin/gh/williamwen42/338/base 2025-12-04T08:57:44.0222062Z * [new branch] gh/williamwen42/338/head -> origin/gh/williamwen42/338/head 2025-12-04T08:57:44.0223174Z * [new branch] gh/williamwen42/338/orig -> origin/gh/williamwen42/338/orig 2025-12-04T08:57:44.0224744Z * [new branch] gh/williamwen42/339/base -> origin/gh/williamwen42/339/base 2025-12-04T08:57:44.0225827Z * [new branch] gh/williamwen42/339/head -> origin/gh/williamwen42/339/head 2025-12-04T08:57:44.0226954Z * [new branch] gh/williamwen42/339/orig -> origin/gh/williamwen42/339/orig 2025-12-04T08:57:44.0228850Z * [new branch] gh/williamwen42/340/base -> origin/gh/williamwen42/340/base 2025-12-04T08:57:44.0229653Z * [new branch] gh/williamwen42/340/head -> origin/gh/williamwen42/340/head 2025-12-04T08:57:44.0230733Z * [new branch] gh/williamwen42/340/orig -> origin/gh/williamwen42/340/orig 2025-12-04T08:57:44.0232545Z * [new branch] 
gh/williamwen42/341/base -> origin/gh/williamwen42/341/base 2025-12-04T08:57:44.0233789Z * [new branch] gh/williamwen42/341/head -> origin/gh/williamwen42/341/head 2025-12-04T08:57:44.0234849Z * [new branch] gh/williamwen42/341/orig -> origin/gh/williamwen42/341/orig 2025-12-04T08:57:44.0236361Z * [new branch] gh/williamwen42/342/base -> origin/gh/williamwen42/342/base 2025-12-04T08:57:44.0237410Z * [new branch] gh/williamwen42/342/head -> origin/gh/williamwen42/342/head 2025-12-04T08:57:44.0238535Z * [new branch] gh/williamwen42/342/orig -> origin/gh/williamwen42/342/orig 2025-12-04T08:57:44.0240117Z * [new branch] gh/williamwen42/343/base -> origin/gh/williamwen42/343/base 2025-12-04T08:57:44.0241204Z * [new branch] gh/williamwen42/343/head -> origin/gh/williamwen42/343/head 2025-12-04T08:57:44.0242289Z * [new branch] gh/williamwen42/343/orig -> origin/gh/williamwen42/343/orig 2025-12-04T08:57:44.0243888Z * [new branch] gh/williamwen42/344/base -> origin/gh/williamwen42/344/base 2025-12-04T08:57:44.0244933Z * [new branch] gh/williamwen42/344/head -> origin/gh/williamwen42/344/head 2025-12-04T08:57:44.0246011Z * [new branch] gh/williamwen42/344/orig -> origin/gh/williamwen42/344/orig 2025-12-04T08:57:44.0247699Z * [new branch] gh/williamwen42/345/base -> origin/gh/williamwen42/345/base 2025-12-04T08:57:44.0248900Z * [new branch] gh/williamwen42/345/head -> origin/gh/williamwen42/345/head 2025-12-04T08:57:44.0250113Z * [new branch] gh/williamwen42/345/orig -> origin/gh/williamwen42/345/orig 2025-12-04T08:57:44.0251688Z * [new branch] gh/williamwen42/346/base -> origin/gh/williamwen42/346/base 2025-12-04T08:57:44.0253270Z * [new branch] gh/williamwen42/346/head -> origin/gh/williamwen42/346/head 2025-12-04T08:57:44.0254347Z * [new branch] gh/williamwen42/346/orig -> origin/gh/williamwen42/346/orig 2025-12-04T08:57:44.0256068Z * [new branch] gh/williamwen42/347/base -> origin/gh/williamwen42/347/base 2025-12-04T08:57:44.0257398Z * [new branch] gh/williamwen42/347/head -> origin/gh/williamwen42/347/head 2025-12-04T08:57:44.0258548Z * [new branch] gh/williamwen42/347/orig -> origin/gh/williamwen42/347/orig 2025-12-04T08:57:44.0260102Z * [new branch] gh/williamwen42/348/base -> origin/gh/williamwen42/348/base 2025-12-04T08:57:44.0261103Z * [new branch] gh/williamwen42/348/head -> origin/gh/williamwen42/348/head 2025-12-04T08:57:44.0262193Z * [new branch] gh/williamwen42/348/orig -> origin/gh/williamwen42/348/orig 2025-12-04T08:57:44.0263627Z * [new branch] gh/williamwen42/349/base -> origin/gh/williamwen42/349/base 2025-12-04T08:57:44.0264771Z * [new branch] gh/williamwen42/349/head -> origin/gh/williamwen42/349/head 2025-12-04T08:57:44.0265897Z * [new branch] gh/williamwen42/349/orig -> origin/gh/williamwen42/349/orig 2025-12-04T08:57:44.0267958Z * [new branch] gh/williamwen42/350/base -> origin/gh/williamwen42/350/base 2025-12-04T08:57:44.0269134Z * [new branch] gh/williamwen42/350/head -> origin/gh/williamwen42/350/head 2025-12-04T08:57:44.0270252Z * [new branch] gh/williamwen42/350/orig -> origin/gh/williamwen42/350/orig 2025-12-04T08:57:44.0271970Z * [new branch] gh/williamwen42/351/base -> origin/gh/williamwen42/351/base 2025-12-04T08:57:44.0273042Z * [new branch] gh/williamwen42/351/head -> origin/gh/williamwen42/351/head 2025-12-04T08:57:44.0274195Z * [new branch] gh/williamwen42/351/orig -> origin/gh/williamwen42/351/orig 2025-12-04T08:57:44.0275757Z * [new branch] gh/williamwen42/352/base -> origin/gh/williamwen42/352/base 2025-12-04T08:57:44.0276805Z * [new branch] 
gh/williamwen42/352/head -> origin/gh/williamwen42/352/head 2025-12-04T08:57:44.0277886Z * [new branch] gh/williamwen42/352/orig -> origin/gh/williamwen42/352/orig 2025-12-04T08:57:44.0279559Z * [new branch] gh/williamwen42/353/base -> origin/gh/williamwen42/353/base 2025-12-04T08:57:44.0280765Z * [new branch] gh/williamwen42/353/head -> origin/gh/williamwen42/353/head 2025-12-04T08:57:44.0281874Z * [new branch] gh/williamwen42/353/orig -> origin/gh/williamwen42/353/orig 2025-12-04T08:57:44.0283355Z * [new branch] gh/williamwen42/354/base -> origin/gh/williamwen42/354/base 2025-12-04T08:57:44.0284501Z * [new branch] gh/williamwen42/354/head -> origin/gh/williamwen42/354/head 2025-12-04T08:57:44.0285608Z * [new branch] gh/williamwen42/354/orig -> origin/gh/williamwen42/354/orig 2025-12-04T08:57:44.0287211Z * [new branch] gh/williamwen42/355/base -> origin/gh/williamwen42/355/base 2025-12-04T08:57:44.0288242Z * [new branch] gh/williamwen42/355/head -> origin/gh/williamwen42/355/head 2025-12-04T08:57:44.0289376Z * [new branch] gh/williamwen42/355/orig -> origin/gh/williamwen42/355/orig 2025-12-04T08:57:44.0290949Z * [new branch] gh/williamwen42/356/base -> origin/gh/williamwen42/356/base 2025-12-04T08:57:44.0291962Z * [new branch] gh/williamwen42/356/head -> origin/gh/williamwen42/356/head 2025-12-04T08:57:44.0293047Z * [new branch] gh/williamwen42/356/orig -> origin/gh/williamwen42/356/orig 2025-12-04T08:57:44.0294602Z * [new branch] gh/williamwen42/357/base -> origin/gh/williamwen42/357/base 2025-12-04T08:57:44.0295744Z * [new branch] gh/williamwen42/357/head -> origin/gh/williamwen42/357/head 2025-12-04T08:57:44.0297168Z * [new branch] gh/williamwen42/357/orig -> origin/gh/williamwen42/357/orig 2025-12-04T08:57:44.0298730Z * [new branch] gh/williamwen42/358/base -> origin/gh/williamwen42/358/base 2025-12-04T08:57:44.0299812Z * [new branch] gh/williamwen42/358/head -> origin/gh/williamwen42/358/head 2025-12-04T08:57:44.0300996Z * [new branch] gh/williamwen42/358/orig -> origin/gh/williamwen42/358/orig 2025-12-04T08:57:44.0302881Z * [new branch] gh/xmfan/169/base -> origin/gh/xmfan/169/base 2025-12-04T08:57:44.0303979Z * [new branch] gh/xmfan/169/head -> origin/gh/xmfan/169/head 2025-12-04T08:57:44.0305522Z * [new branch] gh/xmfan/170/base -> origin/gh/xmfan/170/base 2025-12-04T08:57:44.0306424Z * [new branch] gh/xmfan/170/head -> origin/gh/xmfan/170/head 2025-12-04T08:57:44.0307947Z * [new branch] gh/xmfan/274/base -> origin/gh/xmfan/274/base 2025-12-04T08:57:44.0309130Z * [new branch] gh/xmfan/274/head -> origin/gh/xmfan/274/head 2025-12-04T08:57:44.0310314Z * [new branch] gh/xmfan/274/orig -> origin/gh/xmfan/274/orig 2025-12-04T08:57:44.0311794Z * [new branch] gh/xmfan/277/base -> origin/gh/xmfan/277/base 2025-12-04T08:57:44.0312826Z * [new branch] gh/xmfan/277/head -> origin/gh/xmfan/277/head 2025-12-04T08:57:44.0313906Z * [new branch] gh/xmfan/277/orig -> origin/gh/xmfan/277/orig 2025-12-04T08:57:44.0315304Z * [new branch] gh/xmfan/301/base -> origin/gh/xmfan/301/base 2025-12-04T08:57:44.0316568Z * [new branch] gh/xmfan/301/head -> origin/gh/xmfan/301/head 2025-12-04T08:57:44.0317573Z * [new branch] gh/xmfan/301/orig -> origin/gh/xmfan/301/orig 2025-12-04T08:57:44.0318994Z * [new branch] gh/xmfan/304/base -> origin/gh/xmfan/304/base 2025-12-04T08:57:44.0320066Z * [new branch] gh/xmfan/304/head -> origin/gh/xmfan/304/head 2025-12-04T08:57:44.0321491Z * [new branch] gh/xmfan/304/orig -> origin/gh/xmfan/304/orig 2025-12-04T08:57:44.0325653Z * [new branch] gh/xmfan/309/base -> 
origin/gh/xmfan/309/base 2025-12-04T08:57:44.0326781Z * [new branch] gh/xmfan/309/head -> origin/gh/xmfan/309/head 2025-12-04T08:57:44.0328078Z * [new branch] gh/xmfan/309/orig -> origin/gh/xmfan/309/orig 2025-12-04T08:57:44.0329574Z * [new branch] gh/xmfan/310/base -> origin/gh/xmfan/310/base 2025-12-04T08:57:44.0330718Z * [new branch] gh/xmfan/310/head -> origin/gh/xmfan/310/head 2025-12-04T08:57:44.0331820Z * [new branch] gh/xmfan/310/orig -> origin/gh/xmfan/310/orig 2025-12-04T08:57:44.0333272Z * [new branch] gh/xmfan/311/base -> origin/gh/xmfan/311/base 2025-12-04T08:57:44.0334467Z * [new branch] gh/xmfan/311/head -> origin/gh/xmfan/311/head 2025-12-04T08:57:44.0335584Z * [new branch] gh/xmfan/311/orig -> origin/gh/xmfan/311/orig 2025-12-04T08:57:44.0337814Z * [new branch] gh/xmfan/312/base -> origin/gh/xmfan/312/base 2025-12-04T08:57:44.0338947Z * [new branch] gh/xmfan/312/head -> origin/gh/xmfan/312/head 2025-12-04T08:57:44.0340046Z * [new branch] gh/xmfan/312/orig -> origin/gh/xmfan/312/orig 2025-12-04T08:57:44.0341504Z * [new branch] gh/xmfan/313/base -> origin/gh/xmfan/313/base 2025-12-04T08:57:44.0342594Z * [new branch] gh/xmfan/313/head -> origin/gh/xmfan/313/head 2025-12-04T08:57:44.0343793Z * [new branch] gh/xmfan/313/orig -> origin/gh/xmfan/313/orig 2025-12-04T08:57:44.0345610Z * [new branch] gh/xuanzhang816/27/base -> origin/gh/xuanzhang816/27/base 2025-12-04T08:57:44.0346712Z * [new branch] gh/xuanzhang816/27/head -> origin/gh/xuanzhang816/27/head 2025-12-04T08:57:44.0347847Z * [new branch] gh/xuanzhang816/27/orig -> origin/gh/xuanzhang816/27/orig 2025-12-04T08:57:44.0349558Z * [new branch] gh/xuanzhang816/32/base -> origin/gh/xuanzhang816/32/base 2025-12-04T08:57:44.0350638Z * [new branch] gh/xuanzhang816/32/head -> origin/gh/xuanzhang816/32/head 2025-12-04T08:57:44.0351715Z * [new branch] gh/xuanzhang816/32/orig -> origin/gh/xuanzhang816/32/orig 2025-12-04T08:57:44.0353184Z * [new branch] gh/xuanzhang816/33/base -> origin/gh/xuanzhang816/33/base 2025-12-04T08:57:44.0354283Z * [new branch] gh/xuanzhang816/33/head -> origin/gh/xuanzhang816/33/head 2025-12-04T08:57:44.0355361Z * [new branch] gh/xuanzhang816/33/orig -> origin/gh/xuanzhang816/33/orig 2025-12-04T08:57:44.0357186Z * [new branch] gh/xuanzhang816/34/base -> origin/gh/xuanzhang816/34/base 2025-12-04T08:57:44.0358439Z * [new branch] gh/xuanzhang816/34/head -> origin/gh/xuanzhang816/34/head 2025-12-04T08:57:44.0359531Z * [new branch] gh/xuanzhang816/34/orig -> origin/gh/xuanzhang816/34/orig 2025-12-04T08:57:44.0361229Z * [new branch] gh/xuanzhang816/35/base -> origin/gh/xuanzhang816/35/base 2025-12-04T08:57:44.0362291Z * [new branch] gh/xuanzhang816/35/head -> origin/gh/xuanzhang816/35/head 2025-12-04T08:57:44.0363383Z * [new branch] gh/xuanzhang816/35/orig -> origin/gh/xuanzhang816/35/orig 2025-12-04T08:57:44.0365294Z * [new branch] gh/yanbing-j/11/base -> origin/gh/yanbing-j/11/base 2025-12-04T08:57:44.0366339Z * [new branch] gh/yanbing-j/11/head -> origin/gh/yanbing-j/11/head 2025-12-04T08:57:44.0367403Z * [new branch] gh/yanbing-j/11/orig -> origin/gh/yanbing-j/11/orig 2025-12-04T08:57:44.0368870Z * [new branch] gh/yanbing-j/12/base -> origin/gh/yanbing-j/12/base 2025-12-04T08:57:44.0369972Z * [new branch] gh/yanbing-j/12/head -> origin/gh/yanbing-j/12/head 2025-12-04T08:57:44.0371023Z * [new branch] gh/yanbing-j/12/orig -> origin/gh/yanbing-j/12/orig 2025-12-04T08:57:44.0372618Z * [new branch] gh/yanbing-j/13/base -> origin/gh/yanbing-j/13/base 2025-12-04T08:57:44.0373718Z * [new branch] gh/yanbing-j/13/head -> 
origin/gh/yanbing-j/13/head 2025-12-04T08:57:44.0374790Z * [new branch] gh/yanbing-j/13/orig -> origin/gh/yanbing-j/13/orig 2025-12-04T08:57:44.0376205Z * [new branch] gh/yanbing-j/14/base -> origin/gh/yanbing-j/14/base 2025-12-04T08:57:44.0377707Z * [new branch] gh/yanbing-j/14/head -> origin/gh/yanbing-j/14/head 2025-12-04T08:57:44.0378892Z * [new branch] gh/yanbing-j/14/orig -> origin/gh/yanbing-j/14/orig 2025-12-04T08:57:44.0380240Z * [new branch] gh/yanbing-j/15/base -> origin/gh/yanbing-j/15/base 2025-12-04T08:57:44.0381373Z * [new branch] gh/yanbing-j/15/head -> origin/gh/yanbing-j/15/head 2025-12-04T08:57:44.0382499Z * [new branch] gh/yanbing-j/15/orig -> origin/gh/yanbing-j/15/orig 2025-12-04T08:57:44.0383948Z * [new branch] gh/yanbing-j/18/base -> origin/gh/yanbing-j/18/base 2025-12-04T08:57:44.0385176Z * [new branch] gh/yanbing-j/18/head -> origin/gh/yanbing-j/18/head 2025-12-04T08:57:44.0386304Z * [new branch] gh/yanbing-j/18/orig -> origin/gh/yanbing-j/18/orig 2025-12-04T08:57:44.0387863Z * [new branch] gh/yanbing-j/19/base -> origin/gh/yanbing-j/19/base 2025-12-04T08:57:44.0388959Z * [new branch] gh/yanbing-j/19/head -> origin/gh/yanbing-j/19/head 2025-12-04T08:57:44.0390190Z * [new branch] gh/yanbing-j/19/orig -> origin/gh/yanbing-j/19/orig 2025-12-04T08:57:44.0392081Z * [new branch] gh/yanbing-j/20/base -> origin/gh/yanbing-j/20/base 2025-12-04T08:57:44.0393188Z * [new branch] gh/yanbing-j/20/head -> origin/gh/yanbing-j/20/head 2025-12-04T08:57:44.0394313Z * [new branch] gh/yanbing-j/20/orig -> origin/gh/yanbing-j/20/orig 2025-12-04T08:57:44.0395788Z * [new branch] gh/yanbing-j/21/base -> origin/gh/yanbing-j/21/base 2025-12-04T08:57:44.0396903Z * [new branch] gh/yanbing-j/21/head -> origin/gh/yanbing-j/21/head 2025-12-04T08:57:44.0398378Z * [new branch] gh/yanbing-j/22/base -> origin/gh/yanbing-j/22/base 2025-12-04T08:57:44.0399450Z * [new branch] gh/yanbing-j/22/head -> origin/gh/yanbing-j/22/head 2025-12-04T08:57:44.0400490Z * [new branch] gh/yanbing-j/22/orig -> origin/gh/yanbing-j/22/orig 2025-12-04T08:57:44.0402054Z * [new branch] gh/yanbing-j/23/base -> origin/gh/yanbing-j/23/base 2025-12-04T08:57:44.0403206Z * [new branch] gh/yanbing-j/23/head -> origin/gh/yanbing-j/23/head 2025-12-04T08:57:44.0404300Z * [new branch] gh/yanbing-j/23/orig -> origin/gh/yanbing-j/23/orig 2025-12-04T08:57:44.0405744Z * [new branch] gh/yanbing-j/24/base -> origin/gh/yanbing-j/24/base 2025-12-04T08:57:44.0406828Z * [new branch] gh/yanbing-j/24/head -> origin/gh/yanbing-j/24/head 2025-12-04T08:57:44.0407930Z * [new branch] gh/yanbing-j/24/orig -> origin/gh/yanbing-j/24/orig 2025-12-04T08:57:44.0409430Z * [new branch] gh/yanbing-j/25/base -> origin/gh/yanbing-j/25/base 2025-12-04T08:57:44.0410476Z * [new branch] gh/yanbing-j/25/head -> origin/gh/yanbing-j/25/head 2025-12-04T08:57:44.0411535Z * [new branch] gh/yanbing-j/25/orig -> origin/gh/yanbing-j/25/orig 2025-12-04T08:57:44.0412976Z * [new branch] gh/yanbing-j/26/base -> origin/gh/yanbing-j/26/base 2025-12-04T08:57:44.0414049Z * [new branch] gh/yanbing-j/26/head -> origin/gh/yanbing-j/26/head 2025-12-04T08:57:44.0415129Z * [new branch] gh/yanbing-j/26/orig -> origin/gh/yanbing-j/26/orig 2025-12-04T08:57:44.0417557Z * [new branch] gh/yang-yu-hang/1/base -> origin/gh/yang-yu-hang/1/base 2025-12-04T08:57:44.0418746Z * [new branch] gh/yang-yu-hang/1/head -> origin/gh/yang-yu-hang/1/head 2025-12-04T08:57:44.0420062Z * [new branch] gh/yang-yu-hang/1/orig -> origin/gh/yang-yu-hang/1/orig 2025-12-04T08:57:44.0421861Z * [new branch] 
gh/yang-yu-hang/2/base -> origin/gh/yang-yu-hang/2/base 2025-12-04T08:57:44.0423266Z * [new branch] gh/yang-yu-hang/2/head -> origin/gh/yang-yu-hang/2/head 2025-12-04T08:57:44.0424669Z * [new branch] gh/yang-yu-hang/2/orig -> origin/gh/yang-yu-hang/2/orig 2025-12-04T08:57:44.0426166Z * [new branch] gh/yang-yu-hang/3/base -> origin/gh/yang-yu-hang/3/base 2025-12-04T08:57:44.0427337Z * [new branch] gh/yang-yu-hang/3/head -> origin/gh/yang-yu-hang/3/head 2025-12-04T08:57:44.0428487Z * [new branch] gh/yang-yu-hang/3/orig -> origin/gh/yang-yu-hang/3/orig 2025-12-04T08:57:44.0430296Z * [new branch] gh/yangw-dev/12/base -> origin/gh/yangw-dev/12/base 2025-12-04T08:57:44.0431799Z * [new branch] gh/yangw-dev/12/head -> origin/gh/yangw-dev/12/head 2025-12-04T08:57:44.0433005Z * [new branch] gh/yangw-dev/12/orig -> origin/gh/yangw-dev/12/orig 2025-12-04T08:57:44.0434973Z * [new branch] gh/yangw-dev/13/base -> origin/gh/yangw-dev/13/base 2025-12-04T08:57:44.0436161Z * [new branch] gh/yangw-dev/13/head -> origin/gh/yangw-dev/13/head 2025-12-04T08:57:44.0437232Z * [new branch] gh/yangw-dev/13/orig -> origin/gh/yangw-dev/13/orig 2025-12-04T08:57:44.0438655Z * [new branch] gh/yangw-dev/14/base -> origin/gh/yangw-dev/14/base 2025-12-04T08:57:44.0439799Z * [new branch] gh/yangw-dev/14/head -> origin/gh/yangw-dev/14/head 2025-12-04T08:57:44.0440899Z * [new branch] gh/yangw-dev/14/orig -> origin/gh/yangw-dev/14/orig 2025-12-04T08:57:44.0442344Z * [new branch] gh/yangw-dev/15/base -> origin/gh/yangw-dev/15/base 2025-12-04T08:57:44.0443458Z * [new branch] gh/yangw-dev/15/head -> origin/gh/yangw-dev/15/head 2025-12-04T08:57:44.0444525Z * [new branch] gh/yangw-dev/15/orig -> origin/gh/yangw-dev/15/orig 2025-12-04T08:57:44.0445966Z * [new branch] gh/yangw-dev/19/base -> origin/gh/yangw-dev/19/base 2025-12-04T08:57:44.0447011Z * [new branch] gh/yangw-dev/19/head -> origin/gh/yangw-dev/19/head 2025-12-04T08:57:44.0448088Z * [new branch] gh/yangw-dev/19/orig -> origin/gh/yangw-dev/19/orig 2025-12-04T08:57:44.0449573Z * [new branch] gh/yangw-dev/26/base -> origin/gh/yangw-dev/26/base 2025-12-04T08:57:44.0450829Z * [new branch] gh/yangw-dev/26/head -> origin/gh/yangw-dev/26/head 2025-12-04T08:57:44.0451877Z * [new branch] gh/yangw-dev/26/orig -> origin/gh/yangw-dev/26/orig 2025-12-04T08:57:44.0453289Z * [new branch] gh/yangw-dev/27/base -> origin/gh/yangw-dev/27/base 2025-12-04T08:57:44.0454388Z * [new branch] gh/yangw-dev/27/head -> origin/gh/yangw-dev/27/head 2025-12-04T08:57:44.0455585Z * [new branch] gh/yangw-dev/27/orig -> origin/gh/yangw-dev/27/orig 2025-12-04T08:57:44.0457680Z * [new branch] gh/ydwu4/292/base -> origin/gh/ydwu4/292/base 2025-12-04T08:57:44.0458731Z * [new branch] gh/ydwu4/292/head -> origin/gh/ydwu4/292/head 2025-12-04T08:57:44.0459803Z * [new branch] gh/ydwu4/292/orig -> origin/gh/ydwu4/292/orig 2025-12-04T08:57:44.0461344Z * [new branch] gh/ydwu4/294/base -> origin/gh/ydwu4/294/base 2025-12-04T08:57:44.0462400Z * [new branch] gh/ydwu4/294/head -> origin/gh/ydwu4/294/head 2025-12-04T08:57:44.0463616Z * [new branch] gh/ydwu4/294/orig -> origin/gh/ydwu4/294/orig 2025-12-04T08:57:44.0465289Z * [new branch] gh/ydwu4/295/base -> origin/gh/ydwu4/295/base 2025-12-04T08:57:44.0466455Z * [new branch] gh/ydwu4/295/head -> origin/gh/ydwu4/295/head 2025-12-04T08:57:44.0467551Z * [new branch] gh/ydwu4/295/orig -> origin/gh/ydwu4/295/orig 2025-12-04T08:57:44.0469032Z * [new branch] gh/ydwu4/296/base -> origin/gh/ydwu4/296/base 2025-12-04T08:57:44.0470107Z * [new branch] gh/ydwu4/296/head -> 
origin/gh/ydwu4/296/head 2025-12-04T08:57:44.0471220Z * [new branch] gh/ydwu4/296/orig -> origin/gh/ydwu4/296/orig 2025-12-04T08:57:44.0472746Z * [new branch] gh/ydwu4/306/base -> origin/gh/ydwu4/306/base 2025-12-04T08:57:44.0473875Z * [new branch] gh/ydwu4/306/head -> origin/gh/ydwu4/306/head 2025-12-04T08:57:44.0475509Z * [new branch] gh/ydwu4/306/orig -> origin/gh/ydwu4/306/orig 2025-12-04T08:57:44.0477026Z * [new branch] gh/ydwu4/312/base -> origin/gh/ydwu4/312/base 2025-12-04T08:57:44.0478100Z * [new branch] gh/ydwu4/312/head -> origin/gh/ydwu4/312/head 2025-12-04T08:57:44.0479299Z * [new branch] gh/ydwu4/312/orig -> origin/gh/ydwu4/312/orig 2025-12-04T08:57:44.0480679Z * [new branch] gh/ydwu4/322/base -> origin/gh/ydwu4/322/base 2025-12-04T08:57:44.0481733Z * [new branch] gh/ydwu4/322/head -> origin/gh/ydwu4/322/head 2025-12-04T08:57:44.0482794Z * [new branch] gh/ydwu4/322/orig -> origin/gh/ydwu4/322/orig 2025-12-04T08:57:44.0484240Z * [new branch] gh/ydwu4/327/base -> origin/gh/ydwu4/327/base 2025-12-04T08:57:44.0485368Z * [new branch] gh/ydwu4/327/head -> origin/gh/ydwu4/327/head 2025-12-04T08:57:44.0486454Z * [new branch] gh/ydwu4/327/orig -> origin/gh/ydwu4/327/orig 2025-12-04T08:57:44.0488031Z * [new branch] gh/ydwu4/328/base -> origin/gh/ydwu4/328/base 2025-12-04T08:57:44.0489044Z * [new branch] gh/ydwu4/328/head -> origin/gh/ydwu4/328/head 2025-12-04T08:57:44.0490104Z * [new branch] gh/ydwu4/328/orig -> origin/gh/ydwu4/328/orig 2025-12-04T08:57:44.0491397Z * [new branch] gh/ydwu4/329/base -> origin/gh/ydwu4/329/base 2025-12-04T08:57:44.0492464Z * [new branch] gh/ydwu4/329/head -> origin/gh/ydwu4/329/head 2025-12-04T08:57:44.0493625Z * [new branch] gh/ydwu4/329/orig -> origin/gh/ydwu4/329/orig 2025-12-04T08:57:44.0495152Z * [new branch] gh/ydwu4/330/base -> origin/gh/ydwu4/330/base 2025-12-04T08:57:44.0496210Z * [new branch] gh/ydwu4/330/head -> origin/gh/ydwu4/330/head 2025-12-04T08:57:44.0497650Z * [new branch] gh/ydwu4/330/orig -> origin/gh/ydwu4/330/orig 2025-12-04T08:57:44.0499033Z * [new branch] gh/ydwu4/331/base -> origin/gh/ydwu4/331/base 2025-12-04T08:57:44.0500158Z * [new branch] gh/ydwu4/331/head -> origin/gh/ydwu4/331/head 2025-12-04T08:57:44.0501346Z * [new branch] gh/ydwu4/331/orig -> origin/gh/ydwu4/331/orig 2025-12-04T08:57:44.0502623Z * [new branch] gh/ydwu4/332/base -> origin/gh/ydwu4/332/base 2025-12-04T08:57:44.0503705Z * [new branch] gh/ydwu4/332/head -> origin/gh/ydwu4/332/head 2025-12-04T08:57:44.0504840Z * [new branch] gh/ydwu4/332/orig -> origin/gh/ydwu4/332/orig 2025-12-04T08:57:44.0506124Z * [new branch] gh/ydwu4/333/base -> origin/gh/ydwu4/333/base 2025-12-04T08:57:44.0507217Z * [new branch] gh/ydwu4/333/head -> origin/gh/ydwu4/333/head 2025-12-04T08:57:44.0508395Z * [new branch] gh/ydwu4/333/orig -> origin/gh/ydwu4/333/orig 2025-12-04T08:57:44.0509825Z * [new branch] gh/ydwu4/334/base -> origin/gh/ydwu4/334/base 2025-12-04T08:57:44.0510932Z * [new branch] gh/ydwu4/334/head -> origin/gh/ydwu4/334/head 2025-12-04T08:57:44.0512059Z * [new branch] gh/ydwu4/334/orig -> origin/gh/ydwu4/334/orig 2025-12-04T08:57:44.0513367Z * [new branch] gh/ydwu4/335/base -> origin/gh/ydwu4/335/base 2025-12-04T08:57:44.0514594Z * [new branch] gh/ydwu4/335/head -> origin/gh/ydwu4/335/head 2025-12-04T08:57:44.0515673Z * [new branch] gh/ydwu4/335/orig -> origin/gh/ydwu4/335/orig 2025-12-04T08:57:44.0517513Z * [new branch] gh/ydwu4/337/base -> origin/gh/ydwu4/337/base 2025-12-04T08:57:44.0518602Z * [new branch] gh/ydwu4/337/head -> origin/gh/ydwu4/337/head 
2025-12-04T08:57:44.0519708Z [... git fetch ref listing elided: several hundred "* [new branch] <name> -> origin/<name>" entries, wrapped mid-entry in the raw capture. They cover stacked-PR branches (gh/<user>/<n>/{base,head,orig} for ydwu4, yf225, yifuwang, yiming0416, yushangdi, zklaus, zou3519, zpcore), contributor feature branches (google-main through zxiiro/*), automation branches (update-audio-commit-hash/*, update-vision-commit-hash/*, update-vllm-*, update-xla-commit-hash/*, update_slow_tests_*, replace-pytorch-labs-*, revert-*), and long-lived refs (main, nightly, viable/strict, lts/release/1.8, release/1.4 through release/2.9, orig/release/1.6 through orig/release/2.9, v0.1.2 through v1.3.1) ...]
2025-12-04T08:57:44.1361641Z [... followed by "* [new tag] <name> -> <name>" entries: bc2caa7fdf006894eff7af936babde69ab5a40f8-huydhn-debug, ci/binaries/77164, and ciflow trigger tags (ciflow/b200/*, ciflow/binaries/*, ciflow/binaries_wheel/*, ciflow/dynamo/*, ciflow/h100-cutlass-backend/*, ciflow/h100-distributed/*, ciflow/h100-symm-mem/*, ciflow/h100/*, ciflow/inductor-*/*, ciflow/inductor/*); the tag listing continues beyond this excerpt ...]
ciflow/inductor/163245 2025-12-04T08:57:44.1450722Z * [new tag] ciflow/inductor/163335 -> ciflow/inductor/163335 2025-12-04T08:57:44.1451474Z * [new tag] ciflow/inductor/163503 -> ciflow/inductor/163503 2025-12-04T08:57:44.1452183Z * [new tag] ciflow/inductor/163942 -> ciflow/inductor/163942 2025-12-04T08:57:44.1453083Z * [new tag] ciflow/inductor/165270 -> ciflow/inductor/165270 2025-12-04T08:57:44.1453770Z * [new tag] ciflow/inductor/165274 -> ciflow/inductor/165274 2025-12-04T08:57:44.1454489Z * [new tag] ciflow/inductor/165322 -> ciflow/inductor/165322 2025-12-04T08:57:44.1455221Z * [new tag] ciflow/inductor/165597 -> ciflow/inductor/165597 2025-12-04T08:57:44.1455956Z * [new tag] ciflow/inductor/166063 -> ciflow/inductor/166063 2025-12-04T08:57:44.1456956Z * [new tag] ciflow/inductor/166075 -> ciflow/inductor/166075 2025-12-04T08:57:44.1457731Z * [new tag] ciflow/inductor/166165 -> ciflow/inductor/166165 2025-12-04T08:57:44.1458847Z * [new tag] ciflow/inductor/166254 -> ciflow/inductor/166254 2025-12-04T08:57:44.1459462Z * [new tag] ciflow/inductor/166483 -> ciflow/inductor/166483 2025-12-04T08:57:44.1460216Z * [new tag] ciflow/inductor/166494 -> ciflow/inductor/166494 2025-12-04T08:57:44.1460946Z * [new tag] ciflow/inductor/166545 -> ciflow/inductor/166545 2025-12-04T08:57:44.1461802Z * [new tag] ciflow/inductor/166788 -> ciflow/inductor/166788 2025-12-04T08:57:44.1463197Z * [new tag] ciflow/inductor/166846 -> ciflow/inductor/166846 2025-12-04T08:57:44.1463894Z * [new tag] ciflow/inductor/167300 -> ciflow/inductor/167300 2025-12-04T08:57:44.1464645Z * [new tag] ciflow/inductor/167407 -> ciflow/inductor/167407 2025-12-04T08:57:44.1465574Z * [new tag] ciflow/inductor/167536 -> ciflow/inductor/167536 2025-12-04T08:57:44.1466275Z * [new tag] ciflow/inductor/167552 -> ciflow/inductor/167552 2025-12-04T08:57:44.1467022Z * [new tag] ciflow/inductor/167555 -> ciflow/inductor/167555 2025-12-04T08:57:44.1467969Z * [new tag] ciflow/inductor/167583 -> ciflow/inductor/167583 2025-12-04T08:57:44.1468649Z * [new tag] ciflow/inductor/167599 -> ciflow/inductor/167599 2025-12-04T08:57:44.1469498Z * [new tag] ciflow/inductor/167647 -> ciflow/inductor/167647 2025-12-04T08:57:44.1470232Z * [new tag] ciflow/inductor/167677 -> ciflow/inductor/167677 2025-12-04T08:57:44.1470956Z * [new tag] ciflow/inductor/167680 -> ciflow/inductor/167680 2025-12-04T08:57:44.1471695Z * [new tag] ciflow/inductor/167695 -> ciflow/inductor/167695 2025-12-04T08:57:44.1472431Z * [new tag] ciflow/inductor/167742 -> ciflow/inductor/167742 2025-12-04T08:57:44.1473173Z * [new tag] ciflow/inductor/167768 -> ciflow/inductor/167768 2025-12-04T08:57:44.1474201Z * [new tag] ciflow/inductor/167773 -> ciflow/inductor/167773 2025-12-04T08:57:44.1474867Z * [new tag] ciflow/inductor/167781 -> ciflow/inductor/167781 2025-12-04T08:57:44.1475638Z * [new tag] ciflow/inductor/167880 -> ciflow/inductor/167880 2025-12-04T08:57:44.1476352Z * [new tag] ciflow/inductor/167887 -> ciflow/inductor/167887 2025-12-04T08:57:44.1477079Z * [new tag] ciflow/inductor/167972 -> ciflow/inductor/167972 2025-12-04T08:57:44.1477794Z * [new tag] ciflow/inductor/167989 -> ciflow/inductor/167989 2025-12-04T08:57:44.1478565Z * [new tag] ciflow/inductor/168002 -> ciflow/inductor/168002 2025-12-04T08:57:44.1479245Z * [new tag] ciflow/inductor/168050 -> ciflow/inductor/168050 2025-12-04T08:57:44.1479976Z * [new tag] ciflow/inductor/168051 -> ciflow/inductor/168051 2025-12-04T08:57:44.1480707Z * [new tag] ciflow/inductor/168052 -> ciflow/inductor/168052 
2025-12-04T08:57:44.1481420Z * [new tag] ciflow/inductor/168073 -> ciflow/inductor/168073 2025-12-04T08:57:44.1482171Z * [new tag] ciflow/inductor/168096 -> ciflow/inductor/168096 2025-12-04T08:57:44.1482867Z * [new tag] ciflow/inductor/168114 -> ciflow/inductor/168114 2025-12-04T08:57:44.1483587Z * [new tag] ciflow/inductor/168115 -> ciflow/inductor/168115 2025-12-04T08:57:44.1484303Z * [new tag] ciflow/inductor/168127 -> ciflow/inductor/168127 2025-12-04T08:57:44.1485111Z * [new tag] ciflow/inductor/168129 -> ciflow/inductor/168129 2025-12-04T08:57:44.1485827Z * [new tag] ciflow/inductor/168157 -> ciflow/inductor/168157 2025-12-04T08:57:44.1486640Z * [new tag] ciflow/inductor/168175 -> ciflow/inductor/168175 2025-12-04T08:57:44.1487449Z * [new tag] ciflow/inductor/168185 -> ciflow/inductor/168185 2025-12-04T08:57:44.1488117Z * [new tag] ciflow/inductor/168195 -> ciflow/inductor/168195 2025-12-04T08:57:44.1488826Z * [new tag] ciflow/inductor/168209 -> ciflow/inductor/168209 2025-12-04T08:57:44.1489553Z * [new tag] ciflow/inductor/168266 -> ciflow/inductor/168266 2025-12-04T08:57:44.1490276Z * [new tag] ciflow/inductor/168316 -> ciflow/inductor/168316 2025-12-04T08:57:44.1491218Z * [new tag] ciflow/inductor/168326 -> ciflow/inductor/168326 2025-12-04T08:57:44.1491853Z * [new tag] ciflow/inductor/168368 -> ciflow/inductor/168368 2025-12-04T08:57:44.1492592Z * [new tag] ciflow/inductor/168894 -> ciflow/inductor/168894 2025-12-04T08:57:44.1493309Z * [new tag] ciflow/inductor/168934 -> ciflow/inductor/168934 2025-12-04T08:57:44.1494065Z * [new tag] ciflow/inductor/168939 -> ciflow/inductor/168939 2025-12-04T08:57:44.1494758Z * [new tag] ciflow/inductor/168946 -> ciflow/inductor/168946 2025-12-04T08:57:44.1495487Z * [new tag] ciflow/inductor/168950 -> ciflow/inductor/168950 2025-12-04T08:57:44.1496218Z * [new tag] ciflow/inductor/168951 -> ciflow/inductor/168951 2025-12-04T08:57:44.1497283Z * [new tag] ciflow/inductor/168952 -> ciflow/inductor/168952 2025-12-04T08:57:44.1498020Z * [new tag] ciflow/inductor/168955 -> ciflow/inductor/168955 2025-12-04T08:57:44.1498785Z * [new tag] ciflow/inductor/168971 -> ciflow/inductor/168971 2025-12-04T08:57:44.1499536Z * [new tag] ciflow/inductor/168979 -> ciflow/inductor/168979 2025-12-04T08:57:44.1500333Z * [new tag] ciflow/inductor/168980 -> ciflow/inductor/168980 2025-12-04T08:57:44.1501273Z * [new tag] ciflow/inductor/168983 -> ciflow/inductor/168983 2025-12-04T08:57:44.1501978Z * [new tag] ciflow/inductor/169006 -> ciflow/inductor/169006 2025-12-04T08:57:44.1502714Z * [new tag] ciflow/inductor/169023 -> ciflow/inductor/169023 2025-12-04T08:57:44.1503462Z * [new tag] ciflow/inductor/169024 -> ciflow/inductor/169024 2025-12-04T08:57:44.1504240Z * [new tag] ciflow/inductor/169025 -> ciflow/inductor/169025 2025-12-04T08:57:44.1504988Z * [new tag] ciflow/inductor/169066 -> ciflow/inductor/169066 2025-12-04T08:57:44.1505735Z * [new tag] ciflow/inductor/169091 -> ciflow/inductor/169091 2025-12-04T08:57:44.1506479Z * [new tag] ciflow/inductor/169102 -> ciflow/inductor/169102 2025-12-04T08:57:44.1507239Z * [new tag] ciflow/inductor/169103 -> ciflow/inductor/169103 2025-12-04T08:57:44.1507977Z * [new tag] ciflow/inductor/169121 -> ciflow/inductor/169121 2025-12-04T08:57:44.1508720Z * [new tag] ciflow/inductor/169134 -> ciflow/inductor/169134 2025-12-04T08:57:44.1509560Z * [new tag] ciflow/inductor/169135 -> ciflow/inductor/169135 2025-12-04T08:57:44.1510298Z * [new tag] ciflow/inductor/169141 -> ciflow/inductor/169141 2025-12-04T08:57:44.1511227Z * [new tag] 
ciflow/inductor/169151 -> ciflow/inductor/169151 2025-12-04T08:57:44.1512474Z * [new tag] ciflow/inductor/169161 -> ciflow/inductor/169161 2025-12-04T08:57:44.1513165Z * [new tag] ciflow/inductor/169167 -> ciflow/inductor/169167 2025-12-04T08:57:44.1514109Z * [new tag] ciflow/inductor/169177 -> ciflow/inductor/169177 2025-12-04T08:57:44.1514899Z * [new tag] ciflow/inductor/169185 -> ciflow/inductor/169185 2025-12-04T08:57:44.1515651Z * [new tag] ciflow/inductor/169196 -> ciflow/inductor/169196 2025-12-04T08:57:44.1516447Z * [new tag] ciflow/inductor/169200 -> ciflow/inductor/169200 2025-12-04T08:57:44.1517108Z * [new tag] ciflow/inductor/169204 -> ciflow/inductor/169204 2025-12-04T08:57:44.1517813Z * [new tag] ciflow/inductor/169216 -> ciflow/inductor/169216 2025-12-04T08:57:44.1518743Z * [new tag] ciflow/inductor/169219 -> ciflow/inductor/169219 2025-12-04T08:57:44.1519398Z * [new tag] ciflow/inductor/169220 -> ciflow/inductor/169220 2025-12-04T08:57:44.1520302Z * [new tag] ciflow/inductor/169230 -> ciflow/inductor/169230 2025-12-04T08:57:44.1521135Z * [new tag] ciflow/inductor/169242 -> ciflow/inductor/169242 2025-12-04T08:57:44.1522320Z * [new tag] ciflow/inductor/169245 -> ciflow/inductor/169245 2025-12-04T08:57:44.1523037Z * [new tag] ciflow/inductor/169260 -> ciflow/inductor/169260 2025-12-04T08:57:44.1523801Z * [new tag] ciflow/inductor/169282 -> ciflow/inductor/169282 2025-12-04T08:57:44.1524528Z * [new tag] ciflow/inductor/169286 -> ciflow/inductor/169286 2025-12-04T08:57:44.1525286Z * [new tag] ciflow/inductor/169299 -> ciflow/inductor/169299 2025-12-04T08:57:44.1526232Z * [new tag] ciflow/inductor/169304 -> ciflow/inductor/169304 2025-12-04T08:57:44.1527463Z * [new tag] ciflow/inductor/169305 -> ciflow/inductor/169305 2025-12-04T08:57:44.1528131Z * [new tag] ciflow/inductor/169308 -> ciflow/inductor/169308 2025-12-04T08:57:44.1528904Z * [new tag] ciflow/inductor/169319 -> ciflow/inductor/169319 2025-12-04T08:57:44.1529639Z * [new tag] ciflow/inductor/169326 -> ciflow/inductor/169326 2025-12-04T08:57:44.1530396Z * [new tag] ciflow/inductor/169332 -> ciflow/inductor/169332 2025-12-04T08:57:44.1531147Z * [new tag] ciflow/inductor/169333 -> ciflow/inductor/169333 2025-12-04T08:57:44.1532168Z * [new tag] ciflow/inductor/169336 -> ciflow/inductor/169336 2025-12-04T08:57:44.1532886Z * [new tag] ciflow/inductor/169340 -> ciflow/inductor/169340 2025-12-04T08:57:44.1533790Z * [new tag] ciflow/inductor/169341 -> ciflow/inductor/169341 2025-12-04T08:57:44.1534604Z * [new tag] ciflow/inductor/169343 -> ciflow/inductor/169343 2025-12-04T08:57:44.1535364Z * [new tag] ciflow/inductor/169346 -> ciflow/inductor/169346 2025-12-04T08:57:44.1536265Z * [new tag] ciflow/inductor/169348 -> ciflow/inductor/169348 2025-12-04T08:57:44.1537441Z * [new tag] ciflow/inductor/169350 -> ciflow/inductor/169350 2025-12-04T08:57:44.1538228Z * [new tag] ciflow/inductor/169355 -> ciflow/inductor/169355 2025-12-04T08:57:44.1539034Z * [new tag] ciflow/inductor/169370 -> ciflow/inductor/169370 2025-12-04T08:57:44.1540139Z * [new tag] ciflow/inductor/169375 -> ciflow/inductor/169375 2025-12-04T08:57:44.1540853Z * [new tag] ciflow/inductor/169389 -> ciflow/inductor/169389 2025-12-04T08:57:44.1541598Z * [new tag] ciflow/inductor/169391 -> ciflow/inductor/169391 2025-12-04T08:57:44.1542368Z * [new tag] ciflow/inductor/169393 -> ciflow/inductor/169393 2025-12-04T08:57:44.1543091Z * [new tag] ciflow/inductor/169399 -> ciflow/inductor/169399 2025-12-04T08:57:44.1544032Z * [new tag] ciflow/inductor/169400 -> 
ciflow/inductor/169400 2025-12-04T08:57:44.1544751Z * [new tag] ciflow/inductor/169415 -> ciflow/inductor/169415 2025-12-04T08:57:44.1545537Z * [new tag] ciflow/inductor/169417 -> ciflow/inductor/169417 2025-12-04T08:57:44.1546427Z * [new tag] ciflow/inductor/169418 -> ciflow/inductor/169418 2025-12-04T08:57:44.1547365Z * [new tag] ciflow/inductor/169430 -> ciflow/inductor/169430 2025-12-04T08:57:44.1548039Z * [new tag] ciflow/inductor/169432 -> ciflow/inductor/169432 2025-12-04T08:57:44.1548819Z * [new tag] ciflow/inductor/169436 -> ciflow/inductor/169436 2025-12-04T08:57:44.1549832Z * [new tag] ciflow/inductor/169437 -> ciflow/inductor/169437 2025-12-04T08:57:44.1550505Z * [new tag] ciflow/inductor/169438 -> ciflow/inductor/169438 2025-12-04T08:57:44.1551222Z * [new tag] ciflow/inductor/169441 -> ciflow/inductor/169441 2025-12-04T08:57:44.1551977Z * [new tag] ciflow/inductor/169446 -> ciflow/inductor/169446 2025-12-04T08:57:44.1552900Z * [new tag] ciflow/inductor/169447 -> ciflow/inductor/169447 2025-12-04T08:57:44.1553580Z * [new tag] ciflow/inductor/169452 -> ciflow/inductor/169452 2025-12-04T08:57:44.1554492Z * [new tag] ciflow/inductor/169455 -> ciflow/inductor/169455 2025-12-04T08:57:44.1555167Z * [new tag] ciflow/inductor/169459 -> ciflow/inductor/169459 2025-12-04T08:57:44.1556112Z * [new tag] ciflow/inductor/169463 -> ciflow/inductor/169463 2025-12-04T08:57:44.1557022Z * [new tag] ciflow/inductor/169476 -> ciflow/inductor/169476 2025-12-04T08:57:44.1557685Z * [new tag] ciflow/inductor/169485 -> ciflow/inductor/169485 2025-12-04T08:57:44.1558434Z * [new tag] ciflow/inductor/169493 -> ciflow/inductor/169493 2025-12-04T08:57:44.1559143Z * [new tag] ciflow/inductor/169496 -> ciflow/inductor/169496 2025-12-04T08:57:44.1559878Z * [new tag] ciflow/inductor/169497 -> ciflow/inductor/169497 2025-12-04T08:57:44.1560607Z * [new tag] ciflow/inductor/169503 -> ciflow/inductor/169503 2025-12-04T08:57:44.1561339Z * [new tag] ciflow/inductor/169504 -> ciflow/inductor/169504 2025-12-04T08:57:44.1562368Z * [new tag] ciflow/inductor/169505 -> ciflow/inductor/169505 2025-12-04T08:57:44.1563644Z * [new tag] ciflow/inductor/169508 -> ciflow/inductor/169508 2025-12-04T08:57:44.1564452Z * [new tag] ciflow/inductor/169509 -> ciflow/inductor/169509 2025-12-04T08:57:44.1565643Z * [new tag] ciflow/inductor/169513 -> ciflow/inductor/169513 2025-12-04T08:57:44.1566379Z * [new tag] ciflow/inductor/169514 -> ciflow/inductor/169514 2025-12-04T08:57:44.1567098Z * [new tag] ciflow/inductor/169515 -> ciflow/inductor/169515 2025-12-04T08:57:44.1567856Z * [new tag] ciflow/inductor/169517 -> ciflow/inductor/169517 2025-12-04T08:57:44.1568585Z * [new tag] ciflow/inductor/169519 -> ciflow/inductor/169519 2025-12-04T08:57:44.1569336Z * [new tag] ciflow/inductor/169520 -> ciflow/inductor/169520 2025-12-04T08:57:44.1570070Z * [new tag] ciflow/inductor/169521 -> ciflow/inductor/169521 2025-12-04T08:57:44.1570787Z * [new tag] ciflow/inductor/169524 -> ciflow/inductor/169524 2025-12-04T08:57:44.1571531Z * [new tag] ciflow/inductor/169527 -> ciflow/inductor/169527 2025-12-04T08:57:44.1572273Z * [new tag] ciflow/inductor/169528 -> ciflow/inductor/169528 2025-12-04T08:57:44.1573199Z * [new tag] ciflow/inductor/169532 -> ciflow/inductor/169532 2025-12-04T08:57:44.1573888Z * [new tag] ciflow/inductor/169535 -> ciflow/inductor/169535 2025-12-04T08:57:44.1574614Z * [new tag] ciflow/inductor/169536 -> ciflow/inductor/169536 2025-12-04T08:57:44.1575359Z * [new tag] ciflow/inductor/169547 -> ciflow/inductor/169547 
2025-12-04T08:57:44.1576147Z * [new tag] ciflow/inductor/169548 -> ciflow/inductor/169548 2025-12-04T08:57:44.1577162Z * [new tag] ciflow/inductor/169549 -> ciflow/inductor/169549 2025-12-04T08:57:44.1577937Z * [new tag] ciflow/inductor/169551 -> ciflow/inductor/169551 2025-12-04T08:57:44.1578676Z * [new tag] ciflow/inductor/169552 -> ciflow/inductor/169552 2025-12-04T08:57:44.1579460Z * [new tag] ciflow/inductor/169553 -> ciflow/inductor/169553 2025-12-04T08:57:44.1580573Z * [new tag] ciflow/inductor/3b9a386 -> ciflow/inductor/3b9a386 2025-12-04T08:57:44.1581544Z * [new tag] ciflow/inductor/3d4b92b -> ciflow/inductor/3d4b92b 2025-12-04T08:57:44.1582383Z * [new tag] ciflow/inductor/d224ac7 -> ciflow/inductor/d224ac7 2025-12-04T08:57:44.1583345Z * [new tag] ciflow/linux-aarch64/157994 -> ciflow/linux-aarch64/157994 2025-12-04T08:57:44.1583996Z * [new tag] ciflow/linux-aarch64/166075 -> ciflow/linux-aarch64/166075 2025-12-04T08:57:44.1584692Z * [new tag] ciflow/linux-aarch64/166876 -> ciflow/linux-aarch64/166876 2025-12-04T08:57:44.1585414Z * [new tag] ciflow/linux-aarch64/167981 -> ciflow/linux-aarch64/167981 2025-12-04T08:57:44.1586219Z * [new tag] ciflow/mps/166254 -> ciflow/mps/166254 2025-12-04T08:57:44.1586967Z * [new tag] ciflow/mps/169017 -> ciflow/mps/169017 2025-12-04T08:57:44.1587752Z * [new tag] ciflow/mps/169372 -> ciflow/mps/169372 2025-12-04T08:57:44.1588839Z * [new tag] ciflow/mps/169478 -> ciflow/mps/169478 2025-12-04T08:57:44.1589578Z * [new tag] ciflow/op-benchmark/157994 -> ciflow/op-benchmark/157994 2025-12-04T08:57:44.1590287Z * [new tag] ciflow/op-benchmark/166075 -> ciflow/op-benchmark/166075 2025-12-04T08:57:44.1590950Z * [new tag] ciflow/op-benchmark/169544 -> ciflow/op-benchmark/169544 2025-12-04T08:57:44.1591859Z * [new tag] ciflow/periodic-rocm-mi200/165997 -> ciflow/periodic-rocm-mi200/165997 2025-12-04T08:57:44.1592699Z * [new tag] ciflow/periodic-rocm-mi200/166517 -> ciflow/periodic-rocm-mi200/166517 2025-12-04T08:57:44.1593391Z * [new tag] ciflow/periodic-rocm-mi200/169063 -> ciflow/periodic-rocm-mi200/169063 2025-12-04T08:57:44.1594548Z * [new tag] ciflow/periodic-rocm-mi200/169425 -> ciflow/periodic-rocm-mi200/169425 2025-12-04T08:57:44.1595305Z * [new tag] ciflow/periodic-rocm-mi300/166517 -> ciflow/periodic-rocm-mi300/166517 2025-12-04T08:57:44.1596027Z * [new tag] ciflow/periodic-rocm-mi300/169063 -> ciflow/periodic-rocm-mi300/169063 2025-12-04T08:57:44.1596757Z * [new tag] ciflow/periodic-rocm-mi300/169425 -> ciflow/periodic-rocm-mi300/169425 2025-12-04T08:57:44.1597773Z * [new tag] ciflow/periodic/054a2fd -> ciflow/periodic/054a2fd 2025-12-04T08:57:44.1598869Z * [new tag] ciflow/periodic/167207 -> ciflow/periodic/167207 2025-12-04T08:57:44.1599635Z * [new tag] ciflow/periodic/167978 -> ciflow/periodic/167978 2025-12-04T08:57:44.1600363Z * [new tag] ciflow/periodic/168096 -> ciflow/periodic/168096 2025-12-04T08:57:44.1601279Z * [new tag] ciflow/periodic/169286 -> ciflow/periodic/169286 2025-12-04T08:57:44.1602197Z * [new tag] ciflow/periodic/2a6d37d -> ciflow/periodic/2a6d37d 2025-12-04T08:57:44.1603158Z * [new tag] ciflow/periodic/317eeb8 -> ciflow/periodic/317eeb8 2025-12-04T08:57:44.1603900Z * [new tag] ciflow/periodic/3c32 -> ciflow/periodic/3c32 2025-12-04T08:57:44.1604892Z * [new tag] ciflow/periodic/3e98831 -> ciflow/periodic/3e98831 2025-12-04T08:57:44.1606461Z * [new tag] ciflow/periodic/7c648509a7470ace9fb2bae960dd4790f7e943e9 -> ciflow/periodic/7c648509a7470ace9fb2bae960dd4790f7e943e9 2025-12-04T08:57:44.1607210Z * [new tag] 
ciflow/periodic/94512-point -> ciflow/periodic/94512-point 2025-12-04T08:57:44.1608420Z * [new tag] ciflow/periodic/csl/test87519 -> ciflow/periodic/csl/test87519 2025-12-04T08:57:44.1609283Z * [new tag] ciflow/periodic/csltest88275 -> ciflow/periodic/csltest88275 2025-12-04T08:57:44.1610163Z * [new tag] ciflow/periodic/csltest88761 -> ciflow/periodic/csltest88761 2025-12-04T08:57:44.1611138Z * [new tag] ciflow/periodic/release_1.12 -> ciflow/periodic/release_1.12 2025-12-04T08:57:44.1612222Z * [new tag] ciflow/periodic/release_1.12.0 -> ciflow/periodic/release_1.12.0 2025-12-04T08:57:44.1613325Z * [new tag] ciflow/periodic/sha-ec5b83 -> ciflow/periodic/sha-ec5b83 2025-12-04T08:57:44.1614091Z * [new tag] ciflow/pull/167207 -> ciflow/pull/167207 2025-12-04T08:57:44.1615232Z * [new tag] ciflow/quantization-periodic/169207 -> ciflow/quantization-periodic/169207 2025-12-04T08:57:44.1615929Z * [new tag] ciflow/rocm-mi200/165545 -> ciflow/rocm-mi200/165545 2025-12-04T08:57:44.1616861Z * [new tag] ciflow/rocm-mi200/165997 -> ciflow/rocm-mi200/165997 2025-12-04T08:57:44.1617712Z * [new tag] ciflow/rocm-mi200/168096 -> ciflow/rocm-mi200/168096 2025-12-04T08:57:44.1618642Z * [new tag] ciflow/rocm-mi200/168275 -> ciflow/rocm-mi200/168275 2025-12-04T08:57:44.1619292Z * [new tag] ciflow/rocm-mi200/169063 -> ciflow/rocm-mi200/169063 2025-12-04T08:57:44.1620226Z * [new tag] ciflow/rocm-mi200/169356 -> ciflow/rocm-mi200/169356 2025-12-04T08:57:44.1620991Z * [new tag] ciflow/rocm-mi200/169425 -> ciflow/rocm-mi200/169425 2025-12-04T08:57:44.1622067Z * [new tag] ciflow/rocm-mi300/165545 -> ciflow/rocm-mi300/165545 2025-12-04T08:57:44.1622889Z * [new tag] ciflow/rocm-mi300/167157 -> ciflow/rocm-mi300/167157 2025-12-04T08:57:44.1623586Z * [new tag] ciflow/rocm-mi300/168096 -> ciflow/rocm-mi300/168096 2025-12-04T08:57:44.1624301Z * [new tag] ciflow/rocm-mi300/169063 -> ciflow/rocm-mi300/169063 2025-12-04T08:57:44.1624997Z * [new tag] ciflow/rocm-mi300/169425 -> ciflow/rocm-mi300/169425 2025-12-04T08:57:44.1625927Z * [new tag] ciflow/rocm-mi355/167157 -> ciflow/rocm-mi355/167157 2025-12-04T08:57:44.1626608Z * [new tag] ciflow/rocm-mi355/168275 -> ciflow/rocm-mi355/168275 2025-12-04T08:57:44.1627304Z * [new tag] ciflow/rocm-mi355/169425 -> ciflow/rocm-mi355/169425 2025-12-04T08:57:44.1628327Z * [new tag] ciflow/rocm-navi31/168275 -> ciflow/rocm-navi31/168275 2025-12-04T08:57:44.1628985Z * [new tag] ciflow/rocm-navi31/169425 -> ciflow/rocm-navi31/169425 2025-12-04T08:57:44.1629942Z * [new tag] ciflow/rocm/115316 -> ciflow/rocm/115316 2025-12-04T08:57:44.1630608Z * [new tag] ciflow/rocm/148492 -> ciflow/rocm/148492 2025-12-04T08:57:44.1631291Z * [new tag] ciflow/rocm/160685 -> ciflow/rocm/160685 2025-12-04T08:57:44.1632015Z * [new tag] ciflow/rocm/161607 -> ciflow/rocm/161607 2025-12-04T08:57:44.1632725Z * [new tag] ciflow/rocm/162052 -> ciflow/rocm/162052 2025-12-04T08:57:44.1633641Z * [new tag] ciflow/rocm/165997 -> ciflow/rocm/165997 2025-12-04T08:57:44.1634332Z * [new tag] ciflow/rocm/166165 -> ciflow/rocm/166165 2025-12-04T08:57:44.1635012Z * [new tag] ciflow/rocm/166517 -> ciflow/rocm/166517 2025-12-04T08:57:44.1635674Z * [new tag] ciflow/rocm/167207 -> ciflow/rocm/167207 2025-12-04T08:57:44.1636474Z * [new tag] ciflow/rocm/167536 -> ciflow/rocm/167536 2025-12-04T08:57:44.1637093Z * [new tag] ciflow/rocm/167781 -> ciflow/rocm/167781 2025-12-04T08:57:44.1638059Z * [new tag] ciflow/rocm/167989 -> ciflow/rocm/167989 2025-12-04T08:57:44.1639042Z * [new tag] ciflow/rocm/168073 -> ciflow/rocm/168073 
2025-12-04T08:57:44.1639906Z * [new tag] ciflow/rocm/168195 -> ciflow/rocm/168195 2025-12-04T08:57:44.1640622Z * [new tag] ciflow/rocm/168939 -> ciflow/rocm/168939 2025-12-04T08:57:44.1641321Z * [new tag] ciflow/rocm/168971 -> ciflow/rocm/168971 2025-12-04T08:57:44.1642025Z * [new tag] ciflow/rocm/169024 -> ciflow/rocm/169024 2025-12-04T08:57:44.1642757Z * [new tag] ciflow/rocm/169200 -> ciflow/rocm/169200 2025-12-04T08:57:44.1643475Z * [new tag] ciflow/rocm/169216 -> ciflow/rocm/169216 2025-12-04T08:57:44.1644186Z * [new tag] ciflow/rocm/169312 -> ciflow/rocm/169312 2025-12-04T08:57:44.1644890Z * [new tag] ciflow/rocm/169380 -> ciflow/rocm/169380 2025-12-04T08:57:44.1645611Z * [new tag] ciflow/rocm/169427 -> ciflow/rocm/169427 2025-12-04T08:57:44.1646343Z * [new tag] ciflow/rocm/169455 -> ciflow/rocm/169455 2025-12-04T08:57:44.1647047Z * [new tag] ciflow/rocm/169470 -> ciflow/rocm/169470 2025-12-04T08:57:44.1647780Z * [new tag] ciflow/rocm/169471 -> ciflow/rocm/169471 2025-12-04T08:57:44.1648485Z * [new tag] ciflow/rocm/169472 -> ciflow/rocm/169472 2025-12-04T08:57:44.1649206Z * [new tag] ciflow/rocm/169514 -> ciflow/rocm/169514 2025-12-04T08:57:44.1650349Z * [new tag] ciflow/slow/01c7106 -> ciflow/slow/01c7106 2025-12-04T08:57:44.1651131Z * [new tag] ciflow/slow/0577043 -> ciflow/slow/0577043 2025-12-04T08:57:44.1652533Z * [new tag] ciflow/slow/0d5b74da0cab798fbfdb9caa53fad816999c8386-sdym -> ciflow/slow/0d5b74da0cab798fbfdb9caa53fad816999c8386-sdym 2025-12-04T08:57:44.1652996Z * [new tag] ciflow/slow/0e81104 -> ciflow/slow/0e81104 2025-12-04T08:57:44.1654189Z * [new tag] ciflow/slow/167207 -> ciflow/slow/167207 2025-12-04T08:57:44.1654814Z * [new tag] ciflow/slow/168050 -> ciflow/slow/168050 2025-12-04T08:57:44.1655805Z * [new tag] ciflow/slow/1732077 -> ciflow/slow/1732077 2025-12-04T08:57:44.1657004Z * [new tag] ciflow/slow/187eb7c -> ciflow/slow/187eb7c 2025-12-04T08:57:44.1658157Z * [new tag] ciflow/slow/1faef89 -> ciflow/slow/1faef89 2025-12-04T08:57:44.1659450Z * [new tag] ciflow/slow/3920ec1 -> ciflow/slow/3920ec1 2025-12-04T08:57:44.1660502Z * [new tag] ciflow/slow/3b7c6b2 -> ciflow/slow/3b7c6b2 2025-12-04T08:57:44.1661531Z * [new tag] ciflow/slow/59a3759 -> ciflow/slow/59a3759 2025-12-04T08:57:44.1662444Z * [new tag] ciflow/slow/70ef0bb -> ciflow/slow/70ef0bb 2025-12-04T08:57:44.1663263Z * [new tag] ciflow/slow/788ff06 -> ciflow/slow/788ff06 2025-12-04T08:57:44.1664776Z * [new tag] ciflow/slow/8751002215790a3a88750faa8f4366933e296693-sdym -> ciflow/slow/8751002215790a3a88750faa8f4366933e296693-sdym 2025-12-04T08:57:44.1665246Z * [new tag] ciflow/slow/9d85864 -> ciflow/slow/9d85864 2025-12-04T08:57:44.1666203Z * [new tag] ciflow/slow/9ffad5b -> ciflow/slow/9ffad5b 2025-12-04T08:57:44.1667132Z * [new tag] ciflow/slow/a206e8b -> ciflow/slow/a206e8b 2025-12-04T08:57:44.1668146Z * [new tag] ciflow/slow/a837609 -> ciflow/slow/a837609 2025-12-04T08:57:44.1669073Z * [new tag] ciflow/slow/af841f3 -> ciflow/slow/af841f3 2025-12-04T08:57:44.1670566Z * [new tag] ciflow/slow/da3aba1e46157c4df504b067477cdf2b3c96b194-sdym -> ciflow/slow/da3aba1e46157c4df504b067477cdf2b3c96b194-sdym 2025-12-04T08:57:44.1671026Z * [new tag] ciflow/torchbench/168175 -> ciflow/torchbench/168175 2025-12-04T08:57:44.1671823Z * [new tag] ciflow/trunk/148492 -> ciflow/trunk/148492 2025-12-04T08:57:44.1672588Z * [new tag] ciflow/trunk/157149 -> ciflow/trunk/157149 2025-12-04T08:57:44.1673282Z * [new tag] ciflow/trunk/157994 -> ciflow/trunk/157994 2025-12-04T08:57:44.1673997Z * [new tag] ciflow/trunk/159718 -> 
ciflow/trunk/159718 2025-12-04T08:57:44.1674679Z * [new tag] ciflow/trunk/160685 -> ciflow/trunk/160685 2025-12-04T08:57:44.1675366Z * [new tag] ciflow/trunk/160729 -> ciflow/trunk/160729 2025-12-04T08:57:44.1676068Z * [new tag] ciflow/trunk/162275 -> ciflow/trunk/162275 2025-12-04T08:57:44.1676745Z * [new tag] ciflow/trunk/162795 -> ciflow/trunk/162795 2025-12-04T08:57:44.1677447Z * [new tag] ciflow/trunk/163245 -> ciflow/trunk/163245 2025-12-04T08:57:44.1678199Z * [new tag] ciflow/trunk/163942 -> ciflow/trunk/163942 2025-12-04T08:57:44.1678896Z * [new tag] ciflow/trunk/165274 -> ciflow/trunk/165274 2025-12-04T08:57:44.1680053Z * [new tag] ciflow/trunk/165483 -> ciflow/trunk/165483 2025-12-04T08:57:44.1681073Z * [new tag] ciflow/trunk/165728 -> ciflow/trunk/165728 2025-12-04T08:57:44.1681874Z * [new tag] ciflow/trunk/165922 -> ciflow/trunk/165922 2025-12-04T08:57:44.1682592Z * [new tag] ciflow/trunk/166075 -> ciflow/trunk/166075 2025-12-04T08:57:44.1683333Z * [new tag] ciflow/trunk/166165 -> ciflow/trunk/166165 2025-12-04T08:57:44.1684081Z * [new tag] ciflow/trunk/166829 -> ciflow/trunk/166829 2025-12-04T08:57:44.1685081Z * [new tag] ciflow/trunk/166843 -> ciflow/trunk/166843 2025-12-04T08:57:44.1685753Z * [new tag] ciflow/trunk/166876 -> ciflow/trunk/166876 2025-12-04T08:57:44.1686472Z * [new tag] ciflow/trunk/167207 -> ciflow/trunk/167207 2025-12-04T08:57:44.1687404Z * [new tag] ciflow/trunk/167536 -> ciflow/trunk/167536 2025-12-04T08:57:44.1688057Z * [new tag] ciflow/trunk/167552 -> ciflow/trunk/167552 2025-12-04T08:57:44.1688818Z * [new tag] ciflow/trunk/167555 -> ciflow/trunk/167555 2025-12-04T08:57:44.1689515Z * [new tag] ciflow/trunk/167599 -> ciflow/trunk/167599 2025-12-04T08:57:44.1690318Z * [new tag] ciflow/trunk/167659 -> ciflow/trunk/167659 2025-12-04T08:57:44.1691123Z * [new tag] ciflow/trunk/167672 -> ciflow/trunk/167672 2025-12-04T08:57:44.1691832Z * [new tag] ciflow/trunk/167742 -> ciflow/trunk/167742 2025-12-04T08:57:44.1692565Z * [new tag] ciflow/trunk/167781 -> ciflow/trunk/167781 2025-12-04T08:57:44.1693539Z * [new tag] ciflow/trunk/167837 -> ciflow/trunk/167837 2025-12-04T08:57:44.1694193Z * [new tag] ciflow/trunk/167887 -> ciflow/trunk/167887 2025-12-04T08:57:44.1694908Z * [new tag] ciflow/trunk/167978 -> ciflow/trunk/167978 2025-12-04T08:57:44.1695624Z * [new tag] ciflow/trunk/168050 -> ciflow/trunk/168050 2025-12-04T08:57:44.1697519Z * [new tag] ciflow/trunk/168051 -> ciflow/trunk/168051 2025-12-04T08:57:44.1698255Z * [new tag] ciflow/trunk/168096 -> ciflow/trunk/168096 2025-12-04T08:57:44.1698930Z * [new tag] ciflow/trunk/168127 -> ciflow/trunk/168127 2025-12-04T08:57:44.1699671Z * [new tag] ciflow/trunk/168157 -> ciflow/trunk/168157 2025-12-04T08:57:44.1700440Z * [new tag] ciflow/trunk/168175 -> ciflow/trunk/168175 2025-12-04T08:57:44.1701155Z * [new tag] ciflow/trunk/168209 -> ciflow/trunk/168209 2025-12-04T08:57:44.1702097Z * [new tag] ciflow/trunk/168213 -> ciflow/trunk/168213 2025-12-04T08:57:44.1703037Z * [new tag] ciflow/trunk/168226 -> ciflow/trunk/168226 2025-12-04T08:57:44.1703764Z * [new tag] ciflow/trunk/168262 -> ciflow/trunk/168262 2025-12-04T08:57:44.1704518Z * [new tag] ciflow/trunk/168275 -> ciflow/trunk/168275 2025-12-04T08:57:44.1705415Z * [new tag] ciflow/trunk/168328 -> ciflow/trunk/168328 2025-12-04T08:57:44.1706119Z * [new tag] ciflow/trunk/168368 -> ciflow/trunk/168368 2025-12-04T08:57:44.1706920Z * [new tag] ciflow/trunk/168917 -> ciflow/trunk/168917 2025-12-04T08:57:44.1707640Z * [new tag] ciflow/trunk/168933 -> ciflow/trunk/168933 
2025-12-04T08:57:44.1708581Z * [new tag] ciflow/trunk/168941 -> ciflow/trunk/168941 2025-12-04T08:57:44.1709359Z * [new tag] ciflow/trunk/168955 -> ciflow/trunk/168955 2025-12-04T08:57:44.1710131Z * [new tag] ciflow/trunk/168980 -> ciflow/trunk/168980 2025-12-04T08:57:44.1711161Z * [new tag] ciflow/trunk/169004 -> ciflow/trunk/169004 2025-12-04T08:57:44.1711848Z * [new tag] ciflow/trunk/169006 -> ciflow/trunk/169006 2025-12-04T08:57:44.1712586Z * [new tag] ciflow/trunk/169023 -> ciflow/trunk/169023 2025-12-04T08:57:44.1713325Z * [new tag] ciflow/trunk/169025 -> ciflow/trunk/169025 2025-12-04T08:57:44.1732301Z * [new tag] ciflow/trunk/169048 -> ciflow/trunk/169048 2025-12-04T08:57:44.1732651Z * [new tag] ciflow/trunk/169066 -> ciflow/trunk/169066 2025-12-04T08:57:44.1732855Z * [new tag] ciflow/trunk/169091 -> ciflow/trunk/169091 2025-12-04T08:57:44.1733063Z * [new tag] ciflow/trunk/169102 -> ciflow/trunk/169102 2025-12-04T08:57:44.1733362Z * [new tag] ciflow/trunk/169103 -> ciflow/trunk/169103 2025-12-04T08:57:44.1733545Z * [new tag] ciflow/trunk/169125 -> ciflow/trunk/169125 2025-12-04T08:57:44.1733740Z * [new tag] ciflow/trunk/169139 -> ciflow/trunk/169139 2025-12-04T08:57:44.1733925Z * [new tag] ciflow/trunk/169148 -> ciflow/trunk/169148 2025-12-04T08:57:44.1734135Z * [new tag] ciflow/trunk/169151 -> ciflow/trunk/169151 2025-12-04T08:57:44.1734327Z * [new tag] ciflow/trunk/169156 -> ciflow/trunk/169156 2025-12-04T08:57:44.1734509Z * [new tag] ciflow/trunk/169176 -> ciflow/trunk/169176 2025-12-04T08:57:44.1734701Z * [new tag] ciflow/trunk/169204 -> ciflow/trunk/169204 2025-12-04T08:57:44.1734886Z * [new tag] ciflow/trunk/169207 -> ciflow/trunk/169207 2025-12-04T08:57:44.1735079Z * [new tag] ciflow/trunk/169211 -> ciflow/trunk/169211 2025-12-04T08:57:44.1735262Z * [new tag] ciflow/trunk/169229 -> ciflow/trunk/169229 2025-12-04T08:57:44.1735447Z * [new tag] ciflow/trunk/169231 -> ciflow/trunk/169231 2025-12-04T08:57:44.1735641Z * [new tag] ciflow/trunk/169260 -> ciflow/trunk/169260 2025-12-04T08:57:44.1735827Z * [new tag] ciflow/trunk/169271 -> ciflow/trunk/169271 2025-12-04T08:57:44.1736516Z * [new tag] ciflow/trunk/169280 -> ciflow/trunk/169280 2025-12-04T08:57:44.1736979Z * [new tag] ciflow/trunk/169281 -> ciflow/trunk/169281 2025-12-04T08:57:44.1737170Z * [new tag] ciflow/trunk/169286 -> ciflow/trunk/169286 2025-12-04T08:57:44.1737372Z * [new tag] ciflow/trunk/169293 -> ciflow/trunk/169293 2025-12-04T08:57:44.1737560Z * [new tag] ciflow/trunk/169296 -> ciflow/trunk/169296 2025-12-04T08:57:44.1737759Z * [new tag] ciflow/trunk/169304 -> ciflow/trunk/169304 2025-12-04T08:57:44.1737961Z * [new tag] ciflow/trunk/169305 -> ciflow/trunk/169305 2025-12-04T08:57:44.1738152Z * [new tag] ciflow/trunk/169312 -> ciflow/trunk/169312 2025-12-04T08:57:44.1738353Z * [new tag] ciflow/trunk/169328 -> ciflow/trunk/169328 2025-12-04T08:57:44.1738546Z * [new tag] ciflow/trunk/169343 -> ciflow/trunk/169343 2025-12-04T08:57:44.1738742Z * [new tag] ciflow/trunk/169355 -> ciflow/trunk/169355 2025-12-04T08:57:44.1738946Z * [new tag] ciflow/trunk/169370 -> ciflow/trunk/169370 2025-12-04T08:57:44.1739203Z * [new tag] ciflow/trunk/169379 -> ciflow/trunk/169379 2025-12-04T08:57:44.1739986Z * [new tag] ciflow/trunk/169380 -> ciflow/trunk/169380 2025-12-04T08:57:44.1740718Z * [new tag] ciflow/trunk/169385 -> ciflow/trunk/169385 2025-12-04T08:57:44.1741493Z * [new tag] ciflow/trunk/169387 -> ciflow/trunk/169387 2025-12-04T08:57:44.1742473Z * [new tag] ciflow/trunk/169410 -> ciflow/trunk/169410 
2025-12-04T08:57:44.1743152Z * [new tag] ciflow/trunk/169412 -> ciflow/trunk/169412 2025-12-04T08:57:44.1743916Z * [new tag] ciflow/trunk/169418 -> ciflow/trunk/169418 2025-12-04T08:57:44.1744653Z * [new tag] ciflow/trunk/169423 -> ciflow/trunk/169423 2025-12-04T08:57:44.1745414Z * [new tag] ciflow/trunk/169427 -> ciflow/trunk/169427 2025-12-04T08:57:44.1746157Z * [new tag] ciflow/trunk/169430 -> ciflow/trunk/169430 2025-12-04T08:57:44.1746905Z * [new tag] ciflow/trunk/169437 -> ciflow/trunk/169437 2025-12-04T08:57:44.1747669Z * [new tag] ciflow/trunk/169442 -> ciflow/trunk/169442 2025-12-04T08:57:44.1748577Z * [new tag] ciflow/trunk/169452 -> ciflow/trunk/169452 2025-12-04T08:57:44.1749798Z * [new tag] ciflow/trunk/169454 -> ciflow/trunk/169454 2025-12-04T08:57:44.1750454Z * [new tag] ciflow/trunk/169459 -> ciflow/trunk/169459 2025-12-04T08:57:44.1751380Z * [new tag] ciflow/trunk/169474 -> ciflow/trunk/169474 2025-12-04T08:57:44.1752069Z * [new tag] ciflow/trunk/169475 -> ciflow/trunk/169475 2025-12-04T08:57:44.1752795Z * [new tag] ciflow/trunk/169476 -> ciflow/trunk/169476 2025-12-04T08:57:44.1753688Z * [new tag] ciflow/trunk/169487 -> ciflow/trunk/169487 2025-12-04T08:57:44.1754386Z * [new tag] ciflow/trunk/169497 -> ciflow/trunk/169497 2025-12-04T08:57:44.1755167Z * [new tag] ciflow/trunk/169503 -> ciflow/trunk/169503 2025-12-04T08:57:44.1755888Z * [new tag] ciflow/trunk/169505 -> ciflow/trunk/169505 2025-12-04T08:57:44.1756627Z * [new tag] ciflow/trunk/169507 -> ciflow/trunk/169507 2025-12-04T08:57:44.1757374Z * [new tag] ciflow/trunk/169514 -> ciflow/trunk/169514 2025-12-04T08:57:44.1758086Z * [new tag] ciflow/trunk/169517 -> ciflow/trunk/169517 2025-12-04T08:57:44.1758903Z * [new tag] ciflow/trunk/169519 -> ciflow/trunk/169519 2025-12-04T08:57:44.1759566Z * [new tag] ciflow/trunk/169528 -> ciflow/trunk/169528 2025-12-04T08:57:44.1760294Z * [new tag] ciflow/trunk/169541 -> ciflow/trunk/169541 2025-12-04T08:57:44.1761211Z * [new tag] ciflow/trunk/169555 -> ciflow/trunk/169555 2025-12-04T08:57:44.1762447Z * [new tag] ciflow/unstable/123 -> ciflow/unstable/123 2025-12-04T08:57:44.1763262Z * [new tag] ciflow/vllm/165270 -> ciflow/vllm/165270 2025-12-04T08:57:44.1763941Z * [new tag] ciflow/vllm/165274 -> ciflow/vllm/165274 2025-12-04T08:57:44.1764656Z * [new tag] ciflow/vllm/166494 -> ciflow/vllm/166494 2025-12-04T08:57:44.1765330Z * [new tag] ciflow/vllm/169219 -> ciflow/vllm/169219 2025-12-04T08:57:44.1766009Z * [new tag] ciflow/vllm/169220 -> ciflow/vllm/169220 2025-12-04T08:57:44.1766885Z * [new tag] ciflow/xpu/157994 -> ciflow/xpu/157994 2025-12-04T08:57:44.1767531Z * [new tag] ciflow/xpu/159718 -> ciflow/xpu/159718 2025-12-04T08:57:44.1768470Z * [new tag] ciflow/xpu/161940 -> ciflow/xpu/161940 2025-12-04T08:57:44.1769184Z * [new tag] ciflow/xpu/163251 -> ciflow/xpu/163251 2025-12-04T08:57:44.1769885Z * [new tag] ciflow/xpu/166829 -> ciflow/xpu/166829 2025-12-04T08:57:44.1770563Z * [new tag] ciflow/xpu/166843 -> ciflow/xpu/166843 2025-12-04T08:57:44.1771254Z * [new tag] ciflow/xpu/167972 -> ciflow/xpu/167972 2025-12-04T08:57:44.1771964Z * [new tag] ciflow/xpu/167981 -> ciflow/xpu/167981 2025-12-04T08:57:44.1772651Z * [new tag] ciflow/xpu/168213 -> ciflow/xpu/168213 2025-12-04T08:57:44.1773427Z * [new tag] ciflow/xpu/168262 -> ciflow/xpu/168262 2025-12-04T08:57:44.1774097Z * [new tag] ciflow/xpu/168328 -> ciflow/xpu/168328 2025-12-04T08:57:44.1775126Z * [new tag] ciflow/xpu/168950 -> ciflow/xpu/168950 2025-12-04T08:57:44.1776238Z * [new tag] ciflow/xpu/169039 -> ciflow/xpu/169039 
2025-12-04T08:57:44.1777484Z * [new tag] ciflow/xpu/169200 -> ciflow/xpu/169200 2025-12-04T08:57:44.1778203Z * [new tag] ciflow/xpu/169203 -> ciflow/xpu/169203 2025-12-04T08:57:44.1778933Z * [new tag] ciflow/xpu/169229 -> ciflow/xpu/169229 2025-12-04T08:57:44.1779696Z * [new tag] ciflow/xpu/169230 -> ciflow/xpu/169230 2025-12-04T08:57:44.1780420Z * [new tag] ciflow/xpu/169231 -> ciflow/xpu/169231 2025-12-04T08:57:44.1781352Z * [new tag] ciflow/xpu/169241 -> ciflow/xpu/169241 2025-12-04T08:57:44.1782050Z * [new tag] ciflow/xpu/169280 -> ciflow/xpu/169280 2025-12-04T08:57:44.1782781Z * [new tag] ciflow/xpu/169296 -> ciflow/xpu/169296 2025-12-04T08:57:44.1783775Z * [new tag] ciflow/xpu/169353 -> ciflow/xpu/169353 2025-12-04T08:57:44.1784454Z * [new tag] ciflow/xpu/169410 -> ciflow/xpu/169410 2025-12-04T08:57:44.1785207Z * [new tag] ciflow/xpu/169442 -> ciflow/xpu/169442 2025-12-04T08:57:44.1786140Z * [new tag] ciflow/xpu/169555 -> ciflow/xpu/169555 2025-12-04T08:57:44.1787002Z * [new tag] cslpull75 -> cslpull75 2025-12-04T08:57:44.1787765Z * [new tag] cslpull76 -> cslpull76 2025-12-04T08:57:44.1788561Z * [new tag] cslpull77 -> cslpull77 2025-12-04T08:57:44.1789538Z * [new tag] cslpull78 -> cslpull78 2025-12-04T08:57:44.1790667Z * [new tag] cslpull79 -> cslpull79 2025-12-04T08:57:44.1791673Z * [new tag] cslpull80 -> cslpull80 2025-12-04T08:57:44.1792539Z * [new tag] cslpull81 -> cslpull81 2025-12-04T08:57:44.1793273Z * [new tag] cslpull82 -> cslpull82 2025-12-04T08:57:44.1794242Z * [new tag] cslpull83 -> cslpull83 2025-12-04T08:57:44.1794998Z * [new tag] cslpull84 -> cslpull84 2025-12-04T08:57:44.1795842Z * [new tag] cslpull85 -> cslpull85 2025-12-04T08:57:44.1796729Z * [new tag] cslpull86 -> cslpull86 2025-12-04T08:57:44.1797580Z * [new tag] cslpull87 -> cslpull87 2025-12-04T08:57:44.1798458Z * [new tag] cslpull88 -> cslpull88 2025-12-04T08:57:44.1799336Z * [new tag] cslpull89 -> cslpull89 2025-12-04T08:57:44.1799910Z * [new tag] cslpull90 -> cslpull90 2025-12-04T08:57:44.1801235Z * [new tag] cslpull91 -> cslpull91 2025-12-04T08:57:44.1801976Z * [new tag] cslpull92 -> cslpull92 2025-12-04T08:57:44.1802886Z * [new tag] flight_5 -> flight_5 2025-12-04T08:57:44.1803938Z * [new tag] flight_5.1 -> flight_5.1 2025-12-04T08:57:44.1804848Z * [new tag] flight_5.2 -> flight_5.2 2025-12-04T08:57:44.1805708Z * [new tag] flight_5.3 -> flight_5.3 2025-12-04T08:57:44.1806573Z * [new tag] forpull1 -> forpull1 2025-12-04T08:57:44.1807605Z * [new tag] malfet/tag-2ef5611 -> malfet/tag-2ef5611 2025-12-04T08:57:44.1808495Z * [new tag] malfet/tag-317b1a0 -> malfet/tag-317b1a0 2025-12-04T08:57:44.1809244Z * [new tag] malfet/tag-ec6f767 -> malfet/tag-ec6f767 2025-12-04T08:57:44.1810205Z * [new tag] nightly-binary -> nightly-binary 2025-12-04T08:57:44.1811095Z * [new tag] sqzhang_flight4_plus -> sqzhang_flight4_plus 2025-12-04T08:57:44.1812062Z * [new tag] sqzhang_flight_3 -> sqzhang_flight_3 2025-12-04T08:57:44.1813397Z * [new tag] trunk/02d8bd6974cf84b721680d773dbdb1b6f40ce272 -> trunk/02d8bd6974cf84b721680d773dbdb1b6f40ce272 2025-12-04T08:57:44.1814162Z * [new tag] trunk/066997fb38ade71e00d78e9d572e380b5f02bd3e -> trunk/066997fb38ade71e00d78e9d572e380b5f02bd3e 2025-12-04T08:57:44.1815491Z * [new tag] trunk/076e7b19fa1d481ad778d06d2b49ba57d3ce8c88 -> trunk/076e7b19fa1d481ad778d06d2b49ba57d3ce8c88 2025-12-04T08:57:44.1816589Z * [new tag] trunk/07dcc0b83db3211653a38565a24e15acdba75654 -> trunk/07dcc0b83db3211653a38565a24e15acdba75654 2025-12-04T08:57:44.1817827Z * [new tag] 
trunk/082e96b68dfcd16cab7cfafc4d3d055767dab3eb -> trunk/082e96b68dfcd16cab7cfafc4d3d055767dab3eb 2025-12-04T08:57:44.1818636Z * [new tag] trunk/088048f2fea28ff7d450f65c72419ca45780d30b -> trunk/088048f2fea28ff7d450f65c72419ca45780d30b 2025-12-04T08:57:44.1819537Z * [new tag] trunk/09076941a95c76f4d9ad189d064dfd8baa39e672 -> trunk/09076941a95c76f4d9ad189d064dfd8baa39e672 2025-12-04T08:57:44.1820470Z * [new tag] trunk/0b80a4c62b94402844bf221791c096b0035c6d75 -> trunk/0b80a4c62b94402844bf221791c096b0035c6d75 2025-12-04T08:57:44.1821906Z * [new tag] trunk/0bbbdf1750567a980634ad907a325357ba8ba8f2 -> trunk/0bbbdf1750567a980634ad907a325357ba8ba8f2 2025-12-04T08:57:44.1822830Z * [new tag] trunk/0c281dd78773b2bc17c58ead0e4cd4ac46e775c5 -> trunk/0c281dd78773b2bc17c58ead0e4cd4ac46e775c5 2025-12-04T08:57:44.1823863Z * [new tag] trunk/135f3753c418a6879b1954904184937b67e61688 -> trunk/135f3753c418a6879b1954904184937b67e61688 2025-12-04T08:57:44.1824724Z * [new tag] trunk/15da21026cb13cd20257dc9e96830db108743c10 -> trunk/15da21026cb13cd20257dc9e96830db108743c10 2025-12-04T08:57:44.1825675Z * [new tag] trunk/166efdad2ac827f30fb02504c6017520257f88ec -> trunk/166efdad2ac827f30fb02504c6017520257f88ec 2025-12-04T08:57:44.1826568Z * [new tag] trunk/174272c15fae553d8488140af931f7d8050a313f -> trunk/174272c15fae553d8488140af931f7d8050a313f 2025-12-04T08:57:44.1827802Z * [new tag] trunk/18f3ca08f13b8de61307f5e8cd7d4cccb67e9d11 -> trunk/18f3ca08f13b8de61307f5e8cd7d4cccb67e9d11 2025-12-04T08:57:44.1828642Z * [new tag] trunk/1902eddfe655a15ebcf2c72bd81ade110fdeef63 -> trunk/1902eddfe655a15ebcf2c72bd81ade110fdeef63 2025-12-04T08:57:44.1829542Z * [new tag] trunk/195f92e98d3d66738577f11f22c4b5c8a1c76dd5 -> trunk/195f92e98d3d66738577f11f22c4b5c8a1c76dd5 2025-12-04T08:57:44.1830416Z * [new tag] trunk/1aa13e17de39e3c768ea7aebaad166ce72a06676 -> trunk/1aa13e17de39e3c768ea7aebaad166ce72a06676 2025-12-04T08:57:44.1831318Z * [new tag] trunk/1afe2832f58e24e54a5bfda5a5afa9b96fdea40e -> trunk/1afe2832f58e24e54a5bfda5a5afa9b96fdea40e 2025-12-04T08:57:44.1832153Z * [new tag] trunk/1c87554d74140eaee964ca8b1832cede67f5f520 -> trunk/1c87554d74140eaee964ca8b1832cede67f5f520 2025-12-04T08:57:44.1833167Z * [new tag] trunk/1ccb743b7b5be955f49736c162c4f5004b8a0dd8 -> trunk/1ccb743b7b5be955f49736c162c4f5004b8a0dd8 2025-12-04T08:57:44.1834144Z * [new tag] trunk/1cee47d6ce0a02227185b566593f002dd639ca0c -> trunk/1cee47d6ce0a02227185b566593f002dd639ca0c 2025-12-04T08:57:44.1834929Z * [new tag] trunk/1d21b4df2babe322e5d085ceb6de884eb260a62d -> trunk/1d21b4df2babe322e5d085ceb6de884eb260a62d 2025-12-04T08:57:44.1835845Z * [new tag] trunk/1e34fb2550e4aa650314f7a6d9f6daf4da7478a8 -> trunk/1e34fb2550e4aa650314f7a6d9f6daf4da7478a8 2025-12-04T08:57:44.1836816Z * [new tag] trunk/1e526fb5b1d93bfc70691c5c3955fdffc1b7b7de -> trunk/1e526fb5b1d93bfc70691c5c3955fdffc1b7b7de 2025-12-04T08:57:44.1837705Z * [new tag] trunk/1ee32a8b1f554a312d79bad01ded24f38cd95543 -> trunk/1ee32a8b1f554a312d79bad01ded24f38cd95543 2025-12-04T08:57:44.1838546Z * [new tag] trunk/201e2c4117eb9744594dad6a5c18213d7b4705d7 -> trunk/201e2c4117eb9744594dad6a5c18213d7b4705d7 2025-12-04T08:57:44.1839420Z * [new tag] trunk/2353a0f60eb4b4cb6675907a7fa9fbedc1c02e7f -> trunk/2353a0f60eb4b4cb6675907a7fa9fbedc1c02e7f 2025-12-04T08:57:44.1840432Z * [new tag] trunk/285779b1621cf9f073a062b0889a642d200308d9 -> trunk/285779b1621cf9f073a062b0889a642d200308d9 2025-12-04T08:57:44.1841190Z * [new tag] trunk/2887faaec6295d081580d09fce161201826c6d87 -> trunk/2887faaec6295d081580d09fce161201826c6d87 
2025-12-04T08:57:44.1842136Z * [new tag] trunk/296e67c92635443c67b11c0ae1bd045f03ebb7bc -> trunk/296e67c92635443c67b11c0ae1bd045f03ebb7bc 2025-12-04T08:57:44.1842992Z * [new tag] trunk/29856679769b3dede478767e2fe6cfb51197cb25 -> trunk/29856679769b3dede478767e2fe6cfb51197cb25 2025-12-04T08:57:44.1844044Z * [new tag] trunk/29e5455a4740c326ab187c7aa7b5ef98034ea563 -> trunk/29e5455a4740c326ab187c7aa7b5ef98034ea563 2025-12-04T08:57:44.1844909Z * [new tag] trunk/2ac3ef882afb23136adc188975f0a8802fc68adf -> trunk/2ac3ef882afb23136adc188975f0a8802fc68adf 2025-12-04T08:57:44.1845660Z * [new tag] trunk/2bec68e73b64715354af076ad309335f943e36cd -> trunk/2bec68e73b64715354af076ad309335f943e36cd 2025-12-04T08:57:44.1846548Z * [new tag] trunk/2c87367e6f88662cd5cedbd1537748b7948c38e1 -> trunk/2c87367e6f88662cd5cedbd1537748b7948c38e1 2025-12-04T08:57:44.1847481Z * [new tag] trunk/2d1f78fe3ec13820f136a2e0336da12a25f41708 -> trunk/2d1f78fe3ec13820f136a2e0336da12a25f41708 2025-12-04T08:57:44.1848424Z * [new tag] trunk/2df6058f116a65722a0e03073402feb242572d35 -> trunk/2df6058f116a65722a0e03073402feb242572d35 2025-12-04T08:57:44.1849289Z * [new tag] trunk/2e0c2e170fe658c440775c8e5c44228aafcc47ec -> trunk/2e0c2e170fe658c440775c8e5c44228aafcc47ec 2025-12-04T08:57:44.1850214Z * [new tag] trunk/2f9b7dad7b5419b063bd0f2e204de192720ebb94 -> trunk/2f9b7dad7b5419b063bd0f2e204de192720ebb94 2025-12-04T08:57:44.1851074Z * [new tag] trunk/305168768a95d69c444df5cd334bb774edfe06f1 -> trunk/305168768a95d69c444df5cd334bb774edfe06f1 2025-12-04T08:57:44.1852454Z * [new tag] trunk/31fc12773026e8e00f054dd79ad9b2491e693b48 -> trunk/31fc12773026e8e00f054dd79ad9b2491e693b48 2025-12-04T08:57:44.1853247Z * [new tag] trunk/320de0c6b0a3e7c6d2693ea5c28d5d0156ba7991 -> trunk/320de0c6b0a3e7c6d2693ea5c28d5d0156ba7991 2025-12-04T08:57:44.1854203Z * [new tag] trunk/3418bd29475dff06695045fcdf93e7d0dac67da8 -> trunk/3418bd29475dff06695045fcdf93e7d0dac67da8 2025-12-04T08:57:44.1855060Z * [new tag] trunk/34a98608afa0cb5b48f0d6d30432fdd0a2614ddf -> trunk/34a98608afa0cb5b48f0d6d30432fdd0a2614ddf 2025-12-04T08:57:44.1855977Z * [new tag] trunk/35b7a9a26c5923d98aebaa41a031dae21788a9ee -> trunk/35b7a9a26c5923d98aebaa41a031dae21788a9ee 2025-12-04T08:57:44.1857211Z * [new tag] trunk/39d07dbf03a911bdd45d1af78d8638dc92074938 -> trunk/39d07dbf03a911bdd45d1af78d8638dc92074938 2025-12-04T08:57:44.1858018Z * [new tag] trunk/3cd98b4205ada151042cc7ff097a82d4a4b18725 -> trunk/3cd98b4205ada151042cc7ff097a82d4a4b18725 2025-12-04T08:57:44.1859195Z * [new tag] trunk/3d35fd20a78ff4d016fa80f4e5fad37191d7bcae -> trunk/3d35fd20a78ff4d016fa80f4e5fad37191d7bcae 2025-12-04T08:57:44.1860067Z * [new tag] trunk/409a5fee945c46a3edaf5df162812f201bfd7b2f -> trunk/409a5fee945c46a3edaf5df162812f201bfd7b2f 2025-12-04T08:57:44.1860961Z * [new tag] trunk/42e9005cda22da3f1c559c3649218cebd671027c -> trunk/42e9005cda22da3f1c559c3649218cebd671027c 2025-12-04T08:57:44.1861885Z * [new tag] trunk/43b94713bbf340d3c124fde02d0f73add4021247 -> trunk/43b94713bbf340d3c124fde02d0f73add4021247 2025-12-04T08:57:44.1862770Z * [new tag] trunk/44ac69388a4a5eb463dbd2a13f00d1e3b924566c -> trunk/44ac69388a4a5eb463dbd2a13f00d1e3b924566c 2025-12-04T08:57:44.1863647Z * [new tag] trunk/45d14e2497292be06ad36eaa1aaaf7c630a2586a -> trunk/45d14e2497292be06ad36eaa1aaaf7c630a2586a 2025-12-04T08:57:44.1864480Z * [new tag] trunk/45d310ad84854dff730c0b12e577d7998d978686 -> trunk/45d310ad84854dff730c0b12e577d7998d978686 2025-12-04T08:57:44.1865743Z * [new tag] trunk/47b28ddf7bd74b50fa93b307a7d3b183a6d77f54 -> 
trunk/47b28ddf7bd74b50fa93b307a7d3b183a6d77f54 2025-12-04T08:57:44.1866468Z * [new tag] trunk/481e5ab336275bd3acd5fa8a611b05b4469012af -> trunk/481e5ab336275bd3acd5fa8a611b05b4469012af 2025-12-04T08:57:44.1867475Z * [new tag] trunk/491731647f6b8a9345dcfb3bc9416aea254a7d96 -> trunk/491731647f6b8a9345dcfb3bc9416aea254a7d96 2025-12-04T08:57:44.1868380Z * [new tag] trunk/49a04d26088acc17d948ddd66920f3e16371e873 -> trunk/49a04d26088acc17d948ddd66920f3e16371e873 2025-12-04T08:57:44.1869391Z * [new tag] trunk/4bebc827c47d2f1f0fa1a417a5201a97aef3d985 -> trunk/4bebc827c47d2f1f0fa1a417a5201a97aef3d985 2025-12-04T08:57:44.1870108Z * [new tag] trunk/4c246677784c6a14bc2dbb9ff8773ef0a3a3222f -> trunk/4c246677784c6a14bc2dbb9ff8773ef0a3a3222f 2025-12-04T08:57:44.1871256Z * [new tag] trunk/4cfb47ff548b6d996641058cf04a70e311a4c3aa -> trunk/4cfb47ff548b6d996641058cf04a70e311a4c3aa 2025-12-04T08:57:44.1872058Z * [new tag] trunk/4e0061c1aa52f606dda8cfab0bd7591e588faf2c -> trunk/4e0061c1aa52f606dda8cfab0bd7591e588faf2c 2025-12-04T08:57:44.1873622Z * [new tag] trunk/4fefb8e7e942386ffac764a41b232241f82bea3a -> trunk/4fefb8e7e942386ffac764a41b232241f82bea3a 2025-12-04T08:57:44.1874495Z * [new tag] trunk/503b2640023521f5a35cd9a52fc8033d73a95d0d -> trunk/503b2640023521f5a35cd9a52fc8033d73a95d0d 2025-12-04T08:57:44.1875396Z * [new tag] trunk/518c2b1b3dab9a2ef2849e04b3bc2f20c1c41db9 -> trunk/518c2b1b3dab9a2ef2849e04b3bc2f20c1c41db9 2025-12-04T08:57:44.1876308Z * [new tag] trunk/5191b2fa68ba19960912bfd7fd721c79d76bb1f3 -> trunk/5191b2fa68ba19960912bfd7fd721c79d76bb1f3 2025-12-04T08:57:44.1877295Z * [new tag] trunk/52ac0f0dc4acacd219f1317fbc28ec631c01e07a -> trunk/52ac0f0dc4acacd219f1317fbc28ec631c01e07a 2025-12-04T08:57:44.1878184Z * [new tag] trunk/539ba711b029de9f191070f4f0d12f18f5b7f292 -> trunk/539ba711b029de9f191070f4f0d12f18f5b7f292 2025-12-04T08:57:44.1879064Z * [new tag] trunk/556375b55deebebbc56cb7aef81f4d52f031ba28 -> trunk/556375b55deebebbc56cb7aef81f4d52f031ba28 2025-12-04T08:57:44.1880047Z * [new tag] trunk/55c4ab554845481d0a69a3811937575fe8bb1a66 -> trunk/55c4ab554845481d0a69a3811937575fe8bb1a66 2025-12-04T08:57:44.1880922Z * [new tag] trunk/5634469fda9e5d98869c82c7d03bb08914245f96 -> trunk/5634469fda9e5d98869c82c7d03bb08914245f96 2025-12-04T08:57:44.1881687Z * [new tag] trunk/5778f6ff894686a975a9a23645178ae4c87ad5dc -> trunk/5778f6ff894686a975a9a23645178ae4c87ad5dc 2025-12-04T08:57:44.1882594Z * [new tag] trunk/587d63a3e07de5dc91065f9ef70bcacda9989068 -> trunk/587d63a3e07de5dc91065f9ef70bcacda9989068 2025-12-04T08:57:44.1883480Z * [new tag] trunk/597930f6b568852356ca9795dac76f9e4653adbd -> trunk/597930f6b568852356ca9795dac76f9e4653adbd 2025-12-04T08:57:44.1884230Z * [new tag] trunk/597df3a4e2a67b9fdbe1a89b2f4d74f822274db6 -> trunk/597df3a4e2a67b9fdbe1a89b2f4d74f822274db6 2025-12-04T08:57:44.1885241Z * [new tag] trunk/59abd50e931f4efb21b053f7a2911f5d8a49d883 -> trunk/59abd50e931f4efb21b053f7a2911f5d8a49d883 2025-12-04T08:57:44.1886184Z * [new tag] trunk/5a607febc04c3a2b5824c75f3f60307867439a2c -> trunk/5a607febc04c3a2b5824c75f3f60307867439a2c 2025-12-04T08:57:44.1887116Z * [new tag] trunk/5bf1cdf4755c54ef462b44cb8041b0a57311556b -> trunk/5bf1cdf4755c54ef462b44cb8041b0a57311556b 2025-12-04T08:57:44.1887869Z * [new tag] trunk/5f0030ba63d334d7e8c93a09e41403b89e4c573c -> trunk/5f0030ba63d334d7e8c93a09e41403b89e4c573c 2025-12-04T08:57:44.1888696Z * [new tag] trunk/5f21d27e71268464d362a96c9ac09ea475f7f202 -> trunk/5f21d27e71268464d362a96c9ac09ea475f7f202 2025-12-04T08:57:44.1889640Z * [new tag] 
trunk/5fafc13038c9988d9ac21fa793fbd5890604b447 -> trunk/5fafc13038c9988d9ac21fa793fbd5890604b447 2025-12-04T08:57:44.1890611Z * [new tag] trunk/61be54a31dc09b59d99b62176fb935aee0b924ef -> trunk/61be54a31dc09b59d99b62176fb935aee0b924ef 2025-12-04T08:57:44.1891474Z * [new tag] trunk/62d3ccd71484ed6a760d909b41487101bbc65719 -> trunk/62d3ccd71484ed6a760d909b41487101bbc65719 2025-12-04T08:57:44.1892340Z * [new tag] trunk/641cdb68ae27668eb441d0e49c87a0602c120c2b -> trunk/641cdb68ae27668eb441d0e49c87a0602c120c2b 2025-12-04T08:57:44.1893235Z * [new tag] trunk/65c4620d6bb0c6029f69762c22b91dda2294da9a -> trunk/65c4620d6bb0c6029f69762c22b91dda2294da9a 2025-12-04T08:57:44.1894203Z * [new tag] trunk/66004b993744b4106bf8afaba71f3c228a804206 -> trunk/66004b993744b4106bf8afaba71f3c228a804206 2025-12-04T08:57:44.1895092Z * [new tag] trunk/6658a04c7ca67acb64512341342e7b3ee13ee386 -> trunk/6658a04c7ca67acb64512341342e7b3ee13ee386 2025-12-04T08:57:44.1895977Z * [new tag] trunk/6864e309092a71f8ab0ca6a4dc7f8a4073fd31c4 -> trunk/6864e309092a71f8ab0ca6a4dc7f8a4073fd31c4 2025-12-04T08:57:44.1897168Z * [new tag] trunk/6c261c6cb07892c90ca19ed51c9705b1659a3f7d -> trunk/6c261c6cb07892c90ca19ed51c9705b1659a3f7d 2025-12-04T08:57:44.1898101Z * [new tag] trunk/6c8b6a043f1628188b6396b3a2a6e000ca68362b -> trunk/6c8b6a043f1628188b6396b3a2a6e000ca68362b 2025-12-04T08:57:44.1899002Z * [new tag] trunk/6ceb4a32f92ae67ce5d7d97931d17401ebf5ffa5 -> trunk/6ceb4a32f92ae67ce5d7d97931d17401ebf5ffa5 2025-12-04T08:57:44.1899910Z * [new tag] trunk/6e404e9b7d6f5fb0de86aa73888c3038248c17f8 -> trunk/6e404e9b7d6f5fb0de86aa73888c3038248c17f8 2025-12-04T08:57:44.1900909Z * [new tag] trunk/6ec30b490aee1db6bcdc7340abddef25784f08ec -> trunk/6ec30b490aee1db6bcdc7340abddef25784f08ec 2025-12-04T08:57:44.1901792Z * [new tag] trunk/6f2783a6c08e1db34275ff25176ffe9aebc30a71 -> trunk/6f2783a6c08e1db34275ff25176ffe9aebc30a71 2025-12-04T08:57:44.1902717Z * [new tag] trunk/6f53fefeb90ad3281119b5cfc4aa9ffd8a066e3d -> trunk/6f53fefeb90ad3281119b5cfc4aa9ffd8a066e3d 2025-12-04T08:57:44.1903753Z * [new tag] trunk/6f7dcf51e46d0c880db1a2f5c70de57adb576f4a -> trunk/6f7dcf51e46d0c880db1a2f5c70de57adb576f4a 2025-12-04T08:57:44.1904729Z * [new tag] trunk/6ff831180d2fa436c7f1c1af3adac641fce9d60e -> trunk/6ff831180d2fa436c7f1c1af3adac641fce9d60e 2025-12-04T08:57:44.1905623Z * [new tag] trunk/70076464a63ab218a7ceefb0e76ccd7131deb8f8 -> trunk/70076464a63ab218a7ceefb0e76ccd7131deb8f8 2025-12-04T08:57:44.1906508Z * [new tag] trunk/70d797a5fc109b20a517646fcaa819477cd0d485 -> trunk/70d797a5fc109b20a517646fcaa819477cd0d485 2025-12-04T08:57:44.1907386Z * [new tag] trunk/7348cb355ff0a6f79cd4871215aea72185748734 -> trunk/7348cb355ff0a6f79cd4871215aea72185748734 2025-12-04T08:57:44.1908406Z * [new tag] trunk/74fe26a1ebe32931783569f2e762e3c2c974901f -> trunk/74fe26a1ebe32931783569f2e762e3c2c974901f 2025-12-04T08:57:44.1909413Z * [new tag] trunk/76aeb8c7e0f795b3fddca134cbea9a69da3ee696 -> trunk/76aeb8c7e0f795b3fddca134cbea9a69da3ee696 2025-12-04T08:57:44.1910333Z * [new tag] trunk/7741edd4ed665f3988052e260863efb508d61a03 -> trunk/7741edd4ed665f3988052e260863efb508d61a03 2025-12-04T08:57:44.1911270Z * [new tag] trunk/78adb3b3df41b45d2368b67226d2f864b78939a6 -> trunk/78adb3b3df41b45d2368b67226d2f864b78939a6 2025-12-04T08:57:44.1912677Z * [new tag] trunk/79d7b178225e5ed24d4e1db74e5abbff848f5fb7 -> trunk/79d7b178225e5ed24d4e1db74e5abbff848f5fb7 2025-12-04T08:57:44.1914434Z * [new tag] trunk/7a1e316115fc6996b3f2336822ba5d5f6179f0c3 -> trunk/7a1e316115fc6996b3f2336822ba5d5f6179f0c3 
2025-12-04T08:57:44.1914863Z * [new tag] trunk/7a41b66367c38d0af3e8a90f7be48d6b281e7bca -> trunk/7a41b66367c38d0af3e8a90f7be48d6b281e7bca 2025-12-04T08:57:44.1915595Z * [new tag] trunk/7b7af390ea8541c611d1ce2018a6934188fc197b -> trunk/7b7af390ea8541c611d1ce2018a6934188fc197b 2025-12-04T08:57:44.1916247Z * [new tag] trunk/7ba4680f3755a560af81aa0f688791e367aa3609 -> trunk/7ba4680f3755a560af81aa0f688791e367aa3609 2025-12-04T08:57:44.1917120Z * [new tag] trunk/7bc2a66ded06a0b2549aa51d807edc5dc3e73d1b -> trunk/7bc2a66ded06a0b2549aa51d807edc5dc3e73d1b 2025-12-04T08:57:44.1917807Z * [new tag] trunk/7c648509a7470ace9fb2bae960dd4790f7e943e9 -> trunk/7c648509a7470ace9fb2bae960dd4790f7e943e9 2025-12-04T08:57:44.1918616Z * [new tag] trunk/7cbc2d034cecd21ab5c9707d0a9c525c17143fb8 -> trunk/7cbc2d034cecd21ab5c9707d0a9c525c17143fb8 2025-12-04T08:57:44.1919519Z * [new tag] trunk/7d1bbaf4ba301ea3fba6f3c7bc02d58f6417aaed -> trunk/7d1bbaf4ba301ea3fba6f3c7bc02d58f6417aaed 2025-12-04T08:57:44.1920392Z * [new tag] trunk/7d2a33e4ebf60b217a3cd77feae19231eb996fc8 -> trunk/7d2a33e4ebf60b217a3cd77feae19231eb996fc8 2025-12-04T08:57:44.1921837Z * [new tag] trunk/7eb625920054b1126a7d2d99818aaa188c6ba95e -> trunk/7eb625920054b1126a7d2d99818aaa188c6ba95e 2025-12-04T08:57:44.1922460Z * [new tag] trunk/7f55ba19c456a3d6cc443dd9edb6bb7cca677ead -> trunk/7f55ba19c456a3d6cc443dd9edb6bb7cca677ead 2025-12-04T08:57:44.1923437Z * [new tag] trunk/81af382128efa094d8702e18f2c133760904c718 -> trunk/81af382128efa094d8702e18f2c133760904c718 2025-12-04T08:57:44.1924694Z * [new tag] trunk/84149583d483e9c973c9a0feda70e4f3964947b0 -> trunk/84149583d483e9c973c9a0feda70e4f3964947b0 2025-12-04T08:57:44.1925915Z * [new tag] trunk/85a315917efe82c24306be805c584ec044951c75 -> trunk/85a315917efe82c24306be805c584ec044951c75 2025-12-04T08:57:44.1926770Z * [new tag] trunk/87329491c82a5f8c1cc4ec11d8f55a5de2551ece -> trunk/87329491c82a5f8c1cc4ec11d8f55a5de2551ece 2025-12-04T08:57:44.1927537Z * [new tag] trunk/892640e25aeefa8007c5af837214b4502b6b62a6 -> trunk/892640e25aeefa8007c5af837214b4502b6b62a6 2025-12-04T08:57:44.1928741Z * [new tag] trunk/89e3bbcb5b5321dc8b9520b4d5a8ee60cea1d0b4 -> trunk/89e3bbcb5b5321dc8b9520b4d5a8ee60cea1d0b4 2025-12-04T08:57:44.1929545Z * [new tag] trunk/8c73bbbb02159223c0c97d268a0a74cb78158a1c -> trunk/8c73bbbb02159223c0c97d268a0a74cb78158a1c 2025-12-04T08:57:44.1930490Z * [new tag] trunk/8d56e98c8db988a22cb2dfaeefb30bc7d2a3cc43 -> trunk/8d56e98c8db988a22cb2dfaeefb30bc7d2a3cc43 2025-12-04T08:57:44.1931456Z * [new tag] trunk/8d9dd9603e5ee26c01007f0cd4f018e584840922 -> trunk/8d9dd9603e5ee26c01007f0cd4f018e584840922 2025-12-04T08:57:44.1932457Z * [new tag] trunk/8ef0c0b02b062d75e7c9be2594914a3e784d23ca -> trunk/8ef0c0b02b062d75e7c9be2594914a3e784d23ca 2025-12-04T08:57:44.1933357Z * [new tag] trunk/90b27e7e8352cde97d32ddad24740ef819633f38 -> trunk/90b27e7e8352cde97d32ddad24740ef819633f38 2025-12-04T08:57:44.1934335Z * [new tag] trunk/90f0139e64b2951815d524b6a373bed20c4fbf90 -> trunk/90f0139e64b2951815d524b6a373bed20c4fbf90 2025-12-04T08:57:44.1935148Z * [new tag] trunk/93d0d6838c56af59b0dba794e6aa08f0c1c7799c -> trunk/93d0d6838c56af59b0dba794e6aa08f0c1c7799c 2025-12-04T08:57:44.1936001Z * [new tag] trunk/94ca8d5f1e81fea3ae488650a0fb6795049a9f87 -> trunk/94ca8d5f1e81fea3ae488650a0fb6795049a9f87 2025-12-04T08:57:44.1937404Z * [new tag] trunk/9844fbeadd5cebdf1281d6fbf79164139c352693 -> trunk/9844fbeadd5cebdf1281d6fbf79164139c352693 2025-12-04T08:57:44.1938336Z * [new tag] trunk/99024dec888ec1e50b546822a32b6fb2f35e5eaa -> 
trunk/99024dec888ec1e50b546822a32b6fb2f35e5eaa 2025-12-04T08:57:44.1939259Z * [new tag] trunk/9a296e640fc88aa44d275b48cd9cc30c573b169d -> trunk/9a296e640fc88aa44d275b48cd9cc30c573b169d 2025-12-04T08:57:44.1940182Z * [new tag] trunk/9b3e34d8589b29f7b4e7fab6f78711b7ca6e4639 -> trunk/9b3e34d8589b29f7b4e7fab6f78711b7ca6e4639 2025-12-04T08:57:44.1941118Z * [new tag] trunk/9cd055e547e9b67a5f9827f8999c38d7eda1bcb8 -> trunk/9cd055e547e9b67a5f9827f8999c38d7eda1bcb8 2025-12-04T08:57:44.1942059Z * [new tag] trunk/9f0df5686cb4ada94f94620acba2e3c3f363b11d -> trunk/9f0df5686cb4ada94f94620acba2e3c3f363b11d 2025-12-04T08:57:44.1942996Z * [new tag] trunk/9f7fceb887d0cfa0326a59b887821c63ff11340a -> trunk/9f7fceb887d0cfa0326a59b887821c63ff11340a 2025-12-04T08:57:44.1943990Z * [new tag] trunk/9f8ef8855d3078d70f7b782540ff2aaf158d6742 -> trunk/9f8ef8855d3078d70f7b782540ff2aaf158d6742 2025-12-04T08:57:44.1945010Z * [new tag] trunk/9fb52efc797b47a1f425a03aa5e47b866d8b1098 -> trunk/9fb52efc797b47a1f425a03aa5e47b866d8b1098 2025-12-04T08:57:44.1945933Z * [new tag] trunk/9ff4a2ebc5762d46c73e46b1b523d7ff349fedfa -> trunk/9ff4a2ebc5762d46c73e46b1b523d7ff349fedfa 2025-12-04T08:57:44.1946812Z * [new tag] trunk/a0f3937b94422354538ebbd47202d5b0e8a3fd0d -> trunk/a0f3937b94422354538ebbd47202d5b0e8a3fd0d 2025-12-04T08:57:44.1948076Z * [new tag] trunk/a15066c28b3145e6edbfc88359d0411d14cfc70c -> trunk/a15066c28b3145e6edbfc88359d0411d14cfc70c 2025-12-04T08:57:44.1949002Z * [new tag] trunk/a20f775e82564d2a9979221ed7f3b8d7cf54ce90 -> trunk/a20f775e82564d2a9979221ed7f3b8d7cf54ce90 2025-12-04T08:57:44.1949934Z * [new tag] trunk/a2973fb00ec002dd4b6bbf07385f066efb259b8c -> trunk/a2973fb00ec002dd4b6bbf07385f066efb259b8c 2025-12-04T08:57:44.1950654Z * [new tag] trunk/a7dc6dab9ad911259d4801c502907e531594db45 -> trunk/a7dc6dab9ad911259d4801c502907e531594db45 2025-12-04T08:57:44.1951622Z * [new tag] trunk/a951a9cee65c01660bbc6e6fded90ecb10fa6109 -> trunk/a951a9cee65c01660bbc6e6fded90ecb10fa6109 2025-12-04T08:57:44.1952522Z * [new tag] trunk/abfa1a6d65c7c159e35c72c25979b9da4971689e -> trunk/abfa1a6d65c7c159e35c72c25979b9da4971689e 2025-12-04T08:57:44.1953660Z * [new tag] trunk/ae3a2395bf66151078e2d201716f7d63ce1c6f3e -> trunk/ae3a2395bf66151078e2d201716f7d63ce1c6f3e 2025-12-04T08:57:44.1954389Z * [new tag] trunk/afdff7f0325080dedac44d080cb5a3b0e65e6c5e -> trunk/afdff7f0325080dedac44d080cb5a3b0e65e6c5e 2025-12-04T08:57:44.1955155Z * [new tag] trunk/b1aed4e7a72c03a38f44543aaea0dae2e9b76d48 -> trunk/b1aed4e7a72c03a38f44543aaea0dae2e9b76d48 2025-12-04T08:57:44.1956064Z * [new tag] trunk/b1decff555cd50e2123c8c6e25cc0d447c411f62 -> trunk/b1decff555cd50e2123c8c6e25cc0d447c411f62 2025-12-04T08:57:44.1957045Z * [new tag] trunk/b2b6b034c9fd08672c40e63ef243556ad4c49bd2 -> trunk/b2b6b034c9fd08672c40e63ef243556ad4c49bd2 2025-12-04T08:57:44.1957950Z * [new tag] trunk/b39813b4a04931682b0491adba2138d01d716d99 -> trunk/b39813b4a04931682b0491adba2138d01d716d99 2025-12-04T08:57:44.1958867Z * [new tag] trunk/b3a7edb2311367974cc7cd764cfb11a5d6758b24 -> trunk/b3a7edb2311367974cc7cd764cfb11a5d6758b24 2025-12-04T08:57:44.1959795Z * [new tag] trunk/b4cc1329c86acaef6d42c1fac7169b8d870ab0d7 -> trunk/b4cc1329c86acaef6d42c1fac7169b8d870ab0d7 2025-12-04T08:57:44.1960773Z * [new tag] trunk/b555c39217f765759954a4f9f9bd1e9b87bed11a -> trunk/b555c39217f765759954a4f9f9bd1e9b87bed11a 2025-12-04T08:57:44.1961783Z * [new tag] trunk/b6b6c80379388b7f9932c3e6a0f9907bf430e417 -> trunk/b6b6c80379388b7f9932c3e6a0f9907bf430e417 2025-12-04T08:57:44.1962705Z * [new tag] 
trunk/b6b6d912df0b6f4082f8e50b18bd1de1dd7325f4 -> trunk/b6b6d912df0b6f4082f8e50b18bd1de1dd7325f4 2025-12-04T08:57:44.1963638Z * [new tag] trunk/b7d60685f8cbc939b68a20871e90db67e729329b -> trunk/b7d60685f8cbc939b68a20871e90db67e729329b 2025-12-04T08:57:44.1964754Z * [new tag] trunk/b7f6b9a4fc6259f7af068f31868b3119bb1bac3e -> trunk/b7f6b9a4fc6259f7af068f31868b3119bb1bac3e 2025-12-04T08:57:44.1965672Z * [new tag] trunk/b8c4ba3593761e7b2a3ebd86f040fb07b47c02cf -> trunk/b8c4ba3593761e7b2a3ebd86f040fb07b47c02cf 2025-12-04T08:57:44.1966556Z * [new tag] trunk/b9c8f3a4884befb965ff42620ce44a71b04887f5 -> trunk/b9c8f3a4884befb965ff42620ce44a71b04887f5 2025-12-04T08:57:44.1967500Z * [new tag] trunk/ba1412546f3082c0958c077acc2025e4dbc33f1f -> trunk/ba1412546f3082c0958c077acc2025e4dbc33f1f 2025-12-04T08:57:44.1968432Z * [new tag] trunk/bac403c0b38c63bdbcc0c31f1c2b0bc0260f610f -> trunk/bac403c0b38c63bdbcc0c31f1c2b0bc0260f610f 2025-12-04T08:57:44.1969342Z * [new tag] trunk/bb3034198b459401fabeab254e1b99f0115046e2 -> trunk/bb3034198b459401fabeab254e1b99f0115046e2 2025-12-04T08:57:44.1970241Z * [new tag] trunk/bc39b2b3bc7a6e19a42e62bd576974035086fe55 -> trunk/bc39b2b3bc7a6e19a42e62bd576974035086fe55 2025-12-04T08:57:44.1971248Z * [new tag] trunk/bc43d5b297f207a11d83d77ddf0152bdaabe15a8 -> trunk/bc43d5b297f207a11d83d77ddf0152bdaabe15a8 2025-12-04T08:57:44.1972467Z * [new tag] trunk/bc6a4863c7246a6493d16d4ea6eee71ec07c6a09 -> trunk/bc6a4863c7246a6493d16d4ea6eee71ec07c6a09 2025-12-04T08:57:44.1973707Z * [new tag] trunk/bea4912944defdbcb8b061800caab6cbbbd01df5 -> trunk/bea4912944defdbcb8b061800caab6cbbbd01df5 2025-12-04T08:57:44.1974889Z * [new tag] trunk/c04e2c656f48d82d1521b867bbbf03967b9b7564 -> trunk/c04e2c656f48d82d1521b867bbbf03967b9b7564 2025-12-04T08:57:44.1975767Z * [new tag] trunk/c0660bcee27e7d7731634e274576a7081882bede -> trunk/c0660bcee27e7d7731634e274576a7081882bede 2025-12-04T08:57:44.1976950Z * [new tag] trunk/c178ed43d3d99cbefe84fbfb21d6f282b20d62ac -> trunk/c178ed43d3d99cbefe84fbfb21d6f282b20d62ac 2025-12-04T08:57:44.1978153Z * [new tag] trunk/c55b1e8f61d041ee436d697449eb028931d574fb -> trunk/c55b1e8f61d041ee436d697449eb028931d574fb 2025-12-04T08:57:44.1978910Z * [new tag] trunk/c6ae7579fe12fe75f1a8f7043a494c90567273f1 -> trunk/c6ae7579fe12fe75f1a8f7043a494c90567273f1 2025-12-04T08:57:44.1980142Z * [new tag] trunk/c8210e7d94bad5ae21ac389fa4ba8a463c76c4d0 -> trunk/c8210e7d94bad5ae21ac389fa4ba8a463c76c4d0 2025-12-04T08:57:44.1981112Z * [new tag] trunk/cc0853af42122f8185321f542616f4474e717f09 -> trunk/cc0853af42122f8185321f542616f4474e717f09 2025-12-04T08:57:44.1981958Z * [new tag] trunk/cddec6562eabfa390d014fa3741a5659cf9c94c9 -> trunk/cddec6562eabfa390d014fa3741a5659cf9c94c9 2025-12-04T08:57:44.1983009Z * [new tag] trunk/ce5e7e3bf1f4b69a4f4f93d288ba75b906df492a -> trunk/ce5e7e3bf1f4b69a4f4f93d288ba75b906df492a 2025-12-04T08:57:44.1983928Z * [new tag] trunk/d038b0130ec7c20ebcac219301292fd8e98a1ace -> trunk/d038b0130ec7c20ebcac219301292fd8e98a1ace 2025-12-04T08:57:44.1984828Z * [new tag] trunk/d16447dacaf2420ea175f0c275c75da951f57d39 -> trunk/d16447dacaf2420ea175f0c275c75da951f57d39 2025-12-04T08:57:44.1985745Z * [new tag] trunk/d19f1e8cab6810bb2e99141f9976665954c67a50 -> trunk/d19f1e8cab6810bb2e99141f9976665954c67a50 2025-12-04T08:57:44.1986710Z * [new tag] trunk/d1c9f03b2a5af4104721712f8cdffe9b4f340c01 -> trunk/d1c9f03b2a5af4104721712f8cdffe9b4f340c01 2025-12-04T08:57:44.1987734Z * [new tag] trunk/d40f4950f2b7f7aa380a22fe0f6166e71680fbcf -> trunk/d40f4950f2b7f7aa380a22fe0f6166e71680fbcf 
2025-12-04T08:57:44.1988786Z * [new tag] trunk/d5038950bacfe36bbf24a47a455fe76901deb8e8 -> trunk/d5038950bacfe36bbf24a47a455fe76901deb8e8 2025-12-04T08:57:44.1989624Z * [new tag] trunk/d54ff42903c2ae0533931ff11d23b35f875bdb3d -> trunk/d54ff42903c2ae0533931ff11d23b35f875bdb3d 2025-12-04T08:57:44.1990544Z * [new tag] trunk/d76697633a2d2b9cced1ae21161849b33bfe7e47 -> trunk/d76697633a2d2b9cced1ae21161849b33bfe7e47 2025-12-04T08:57:44.1991442Z * [new tag] trunk/d78f52b199c547106d4cd9d2856dd0805c118bf1 -> trunk/d78f52b199c547106d4cd9d2856dd0805c118bf1 2025-12-04T08:57:44.1992340Z * [new tag] trunk/d8fd5c6eed28e5004150691d048a3f6785e19a8e -> trunk/d8fd5c6eed28e5004150691d048a3f6785e19a8e 2025-12-04T08:57:44.1993276Z * [new tag] trunk/d900f5e86745dec76713f4b0ef07005ef36b2f5a -> trunk/d900f5e86745dec76713f4b0ef07005ef36b2f5a 2025-12-04T08:57:44.1994187Z * [new tag] trunk/d973dc6b87d763859fe1c5bd1287e3b6b1c49d1b -> trunk/d973dc6b87d763859fe1c5bd1287e3b6b1c49d1b 2025-12-04T08:57:44.1995106Z * [new tag] trunk/d998c03304cb6ede76e1ed535b4ddeb6c2bf40ec -> trunk/d998c03304cb6ede76e1ed535b4ddeb6c2bf40ec 2025-12-04T08:57:44.1996124Z * [new tag] trunk/d9cb8a70833101dbbe16b99520cfbdd70d0a87bf -> trunk/d9cb8a70833101dbbe16b99520cfbdd70d0a87bf 2025-12-04T08:57:44.1996990Z * [new tag] trunk/d9d5e91b43f70eb8637af55db6856d49be391ffd -> trunk/d9d5e91b43f70eb8637af55db6856d49be391ffd 2025-12-04T08:57:44.1997972Z * [new tag] trunk/dd18a75336a4fbd7497955cc5665904724fce889 -> trunk/dd18a75336a4fbd7497955cc5665904724fce889 2025-12-04T08:57:44.1998956Z * [new tag] trunk/ded9bcd61a059bf723e6e84689552962b480ea77 -> trunk/ded9bcd61a059bf723e6e84689552962b480ea77 2025-12-04T08:57:44.2000200Z * [new tag] trunk/dfbd3714d15c37a7b83b322a6b60f997fc00f50c -> trunk/dfbd3714d15c37a7b83b322a6b60f997fc00f50c 2025-12-04T08:57:44.2001176Z * [new tag] trunk/e115f9f4e4b039f8e9a642aaa2bd8254a920541b -> trunk/e115f9f4e4b039f8e9a642aaa2bd8254a920541b 2025-12-04T08:57:44.2001949Z * [new tag] trunk/e3f24fd73ad74c6e7176687986436956c7c18235 -> trunk/e3f24fd73ad74c6e7176687986436956c7c18235 2025-12-04T08:57:44.2002917Z * [new tag] trunk/e7d24d3ff93d1503ba63860b7057438ad93f918e -> trunk/e7d24d3ff93d1503ba63860b7057438ad93f918e 2025-12-04T08:57:44.2003923Z * [new tag] trunk/ea7035f462a0d2830865ee86c832bd101e1427fc -> trunk/ea7035f462a0d2830865ee86c832bd101e1427fc 2025-12-04T08:57:44.2004734Z * [new tag] trunk/eabb7ad2128580ef674446027b95bcf4e21e8df3 -> trunk/eabb7ad2128580ef674446027b95bcf4e21e8df3 2025-12-04T08:57:44.2005667Z * [new tag] trunk/eb5c63652a33da42e7018c23df5f20a3eb4c6ccf -> trunk/eb5c63652a33da42e7018c23df5f20a3eb4c6ccf 2025-12-04T08:57:44.2006586Z * [new tag] trunk/ec2c71f5c85021b8938cdafadce24c15a36fd93e -> trunk/ec2c71f5c85021b8938cdafadce24c15a36fd93e 2025-12-04T08:57:44.2007493Z * [new tag] trunk/ecbcc3f6bf327856b435b259ac63cc2f328c4b4e -> trunk/ecbcc3f6bf327856b435b259ac63cc2f328c4b4e 2025-12-04T08:57:44.2009003Z * [new tag] trunk/ee87bbe876c42575e961b32a0827d76bc9782ca2 -> trunk/ee87bbe876c42575e961b32a0827d76bc9782ca2 2025-12-04T08:57:44.2009864Z * [new tag] trunk/ef019d1d431c4c5a95b594cb90d40a50cd00f5e4 -> trunk/ef019d1d431c4c5a95b594cb90d40a50cd00f5e4 2025-12-04T08:57:44.2010793Z * [new tag] trunk/ef8ecc13830a86c4b231f1aad9aba7851db61b53 -> trunk/ef8ecc13830a86c4b231f1aad9aba7851db61b53 2025-12-04T08:57:44.2011680Z * [new tag] trunk/f1076f5510920044912247b1abb8760cb820f598 -> trunk/f1076f5510920044912247b1abb8760cb820f598 2025-12-04T08:57:44.2012587Z * [new tag] trunk/f2d6a75a00a1d648ca9a0abc6a33e14c3dea6c40 -> 
trunk/f2d6a75a00a1d648ca9a0abc6a33e14c3dea6c40 2025-12-04T08:57:44.2013483Z * [new tag] trunk/f47dd0ddef1359e5b43e4b962412f67b30ecde56 -> trunk/f47dd0ddef1359e5b43e4b962412f67b30ecde56 2025-12-04T08:57:44.2014406Z * [new tag] trunk/f49d32dfa4730dcfb1b60eeeb369b5889da983c8 -> trunk/f49d32dfa4730dcfb1b60eeeb369b5889da983c8 2025-12-04T08:57:44.2015227Z * [new tag] trunk/f4dedf78fc30fd4b93975787ca6074ee89db9467 -> trunk/f4dedf78fc30fd4b93975787ca6074ee89db9467 2025-12-04T08:57:44.2016164Z * [new tag] trunk/f7c0d03819ebed05c4038f095d66d1b8c54aca17 -> trunk/f7c0d03819ebed05c4038f095d66d1b8c54aca17 2025-12-04T08:57:44.2017433Z * [new tag] trunk/f7e1bd80a063e17453c361837ba6ea2570920a73 -> trunk/f7e1bd80a063e17453c361837ba6ea2570920a73 2025-12-04T08:57:44.2018239Z * [new tag] trunk/f9bd6c53624c7c0ea3772de78498326e84c2f0e7 -> trunk/f9bd6c53624c7c0ea3772de78498326e84c2f0e7 2025-12-04T08:57:44.2019319Z * [new tag] trunk/fb5be221a46b51bfc9509013b0d85bc5a9d4f15b -> trunk/fb5be221a46b51bfc9509013b0d85bc5a9d4f15b 2025-12-04T08:57:44.2020224Z * [new tag] trunk/fdf863d5e1de3b2688c9511e96876e34581dbfd7 -> trunk/fdf863d5e1de3b2688c9511e96876e34581dbfd7 2025-12-04T08:57:44.2024960Z * [new tag] trunk/fe0e65adfc0e7ca6e5f57e6ea8b16bd5cc967307 -> trunk/fe0e65adfc0e7ca6e5f57e6ea8b16bd5cc967307 2025-12-04T08:57:44.2025999Z * [new tag] trunk/fec710bf89173f5355468a7ce1afe9157c3d9009 -> trunk/fec710bf89173f5355468a7ce1afe9157c3d9009 2025-12-04T08:57:44.2027199Z * [new tag] trunk/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 -> trunk/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T08:57:44.2027847Z * [new tag] v0.1.1 -> v0.1.1 2025-12-04T08:57:44.2028595Z * [new tag] v0.1.10 -> v0.1.10 2025-12-04T08:57:44.2029555Z * [new tag] v0.1.11 -> v0.1.11 2025-12-04T08:57:44.2030493Z * [new tag] v0.1.12 -> v0.1.12 2025-12-04T08:57:44.2031782Z * [new tag] v0.1.2 -> v0.1.2 2025-12-04T08:57:44.2032761Z * [new tag] v0.1.3 -> v0.1.3 2025-12-04T08:57:44.2033457Z * [new tag] v0.1.4 -> v0.1.4 2025-12-04T08:57:44.2034342Z * [new tag] v0.1.5 -> v0.1.5 2025-12-04T08:57:44.2035174Z * [new tag] v0.1.6 -> v0.1.6 2025-12-04T08:57:44.2035899Z * [new tag] v0.1.7 -> v0.1.7 2025-12-04T08:57:44.2036659Z * [new tag] v0.1.8 -> v0.1.8 2025-12-04T08:57:44.2037515Z * [new tag] v0.1.9 -> v0.1.9 2025-12-04T08:57:44.2038374Z * [new tag] v0.2.0 -> v0.2.0 2025-12-04T08:57:44.2039275Z * [new tag] v0.3.0 -> v0.3.0 2025-12-04T08:57:44.2040215Z * [new tag] v0.3.1 -> v0.3.1 2025-12-04T08:57:44.2041045Z * [new tag] v0.4.0 -> v0.4.0 2025-12-04T08:57:44.2041794Z * [new tag] v0.4.1 -> v0.4.1 2025-12-04T08:57:44.2042669Z * [new tag] v1.0.0 -> v1.0.0 2025-12-04T08:57:44.2043561Z * [new tag] v1.0.0a0 -> v1.0.0a0 2025-12-04T08:57:44.2044331Z * [new tag] v1.0.1 -> v1.0.1 2025-12-04T08:57:44.2045233Z * [new tag] v1.0rc0 -> v1.0rc0 2025-12-04T08:57:44.2045947Z * [new tag] v1.0rc1 -> v1.0rc1 2025-12-04T08:57:44.2046717Z * [new tag] v1.1.0 -> v1.1.0 2025-12-04T08:57:44.2047615Z * [new tag] v1.1.0a0 -> v1.1.0a0 2025-12-04T08:57:44.2048692Z * [new tag] v1.10.0 -> v1.10.0 2025-12-04T08:57:44.2049627Z * [new tag] v1.10.0-rc1 -> v1.10.0-rc1 2025-12-04T08:57:44.2050495Z * [new tag] v1.10.0-rc2 -> v1.10.0-rc2 2025-12-04T08:57:44.2051268Z * [new tag] v1.10.0-rc3 -> v1.10.0-rc3 2025-12-04T08:57:44.2052251Z * [new tag] v1.10.1 -> v1.10.1 2025-12-04T08:57:44.2052951Z * [new tag] v1.10.1-rc1 -> v1.10.1-rc1 2025-12-04T08:57:44.2053561Z * [new tag] v1.10.2 -> v1.10.2 2025-12-04T08:57:44.2054223Z * [new tag] v1.10.2-rc1 -> v1.10.2-rc1 2025-12-04T08:57:44.2055158Z * [new tag] v1.11.0 -> v1.11.0 
2025-12-04T08:57:44.2056090Z * [new tag] v1.11.0-rc1 -> v1.11.0-rc1 2025-12-04T08:57:44.2057411Z * [new tag] v1.11.0-rc2 -> v1.11.0-rc2 2025-12-04T08:57:44.2058512Z * [new tag] v1.11.0-rc3 -> v1.11.0-rc3 2025-12-04T08:57:44.2059457Z * [new tag] v1.11.0-rc4 -> v1.11.0-rc4 2025-12-04T08:57:44.2060404Z * [new tag] v1.11.0-rc5 -> v1.11.0-rc5 2025-12-04T08:57:44.2061074Z * [new tag] v1.11.0-rc6 -> v1.11.0-rc6 2025-12-04T08:57:44.2061728Z * [new tag] v1.11.0-rc7 -> v1.11.0-rc7 2025-12-04T08:57:44.2062833Z * [new tag] v1.12.0 -> v1.12.0 2025-12-04T08:57:44.2063722Z * [new tag] v1.12.0-rc1 -> v1.12.0-rc1 2025-12-04T08:57:44.2064719Z * [new tag] v1.12.0-rc2 -> v1.12.0-rc2 2025-12-04T08:57:44.2065614Z * [new tag] v1.12.0-rc3 -> v1.12.0-rc3 2025-12-04T08:57:44.2066491Z * [new tag] v1.12.0-rc4 -> v1.12.0-rc4 2025-12-04T08:57:44.2067419Z * [new tag] v1.12.0-rc5 -> v1.12.0-rc5 2025-12-04T08:57:44.2068475Z * [new tag] v1.12.0-rc6 -> v1.12.0-rc6 2025-12-04T08:57:44.2069242Z * [new tag] v1.12.0-rc7 -> v1.12.0-rc7 2025-12-04T08:57:44.2069926Z * [new tag] v1.12.0-rc8 -> v1.12.0-rc8 2025-12-04T08:57:44.2070590Z * [new tag] v1.12.1 -> v1.12.1 2025-12-04T08:57:44.2071624Z * [new tag] v1.12.1-rc1 -> v1.12.1-rc1 2025-12-04T08:57:44.2072540Z * [new tag] v1.12.1-rc2 -> v1.12.1-rc2 2025-12-04T08:57:44.2073517Z * [new tag] v1.12.1-rc3 -> v1.12.1-rc3 2025-12-04T08:57:44.2074434Z * [new tag] v1.12.1-rc4 -> v1.12.1-rc4 2025-12-04T08:57:44.2075018Z * [new tag] v1.12.1-rc5 -> v1.12.1-rc5 2025-12-04T08:57:44.2075998Z * [new tag] v1.13.0 -> v1.13.0 2025-12-04T08:57:44.2076865Z * [new tag] v1.13.0-rc1 -> v1.13.0-rc1 2025-12-04T08:57:44.2077737Z * [new tag] v1.13.0-rc2 -> v1.13.0-rc2 2025-12-04T08:57:44.2078581Z * [new tag] v1.13.0-rc3 -> v1.13.0-rc3 2025-12-04T08:57:44.2079607Z * [new tag] v1.13.0-rc4 -> v1.13.0-rc4 2025-12-04T08:57:44.2080261Z * [new tag] v1.13.0-rc5 -> v1.13.0-rc5 2025-12-04T08:57:44.2080921Z * [new tag] v1.13.0-rc6 -> v1.13.0-rc6 2025-12-04T08:57:44.2081908Z * [new tag] v1.13.1 -> v1.13.1 2025-12-04T08:57:44.2082594Z * [new tag] v1.13.1-rc1 -> v1.13.1-rc1 2025-12-04T08:57:44.2083455Z * [new tag] v1.2.0 -> v1.2.0 2025-12-04T08:57:44.2084308Z * [new tag] v1.2.0a0 -> v1.2.0a0 2025-12-04T08:57:44.2085146Z * [new tag] v1.3.0 -> v1.3.0 2025-12-04T08:57:44.2086098Z * [new tag] v1.3.0a0 -> v1.3.0a0 2025-12-04T08:57:44.2087194Z * [new tag] v1.3.1 -> v1.3.1 2025-12-04T08:57:44.2088026Z * [new tag] v1.4.0 -> v1.4.0 2025-12-04T08:57:44.2088882Z * [new tag] v1.4.0a0 -> v1.4.0a0 2025-12-04T08:57:44.2089524Z * [new tag] v1.4.1 -> v1.4.1 2025-12-04T08:57:44.2090544Z * [new tag] v1.5.0 -> v1.5.0 2025-12-04T08:57:44.2091463Z * [new tag] v1.5.0-rc1 -> v1.5.0-rc1 2025-12-04T08:57:44.2092334Z * [new tag] v1.5.0-rc2 -> v1.5.0-rc2 2025-12-04T08:57:44.2093315Z * [new tag] v1.5.0-rc3 -> v1.5.0-rc3 2025-12-04T08:57:44.2094034Z * [new tag] v1.5.0-rc4 -> v1.5.0-rc4 2025-12-04T08:57:44.2094713Z * [new tag] v1.5.0-rc5 -> v1.5.0-rc5 2025-12-04T08:57:44.2095716Z * [new tag] v1.5.1 -> v1.5.1 2025-12-04T08:57:44.2096459Z * [new tag] v1.5.1-rc1 -> v1.5.1-rc1 2025-12-04T08:57:44.2097393Z * [new tag] v1.6.0 -> v1.6.0 2025-12-04T08:57:44.2098419Z * [new tag] v1.6.0-rc1 -> v1.6.0-rc1 2025-12-04T08:57:44.2099436Z * [new tag] v1.6.0-rc2 -> v1.6.0-rc2 2025-12-04T08:57:44.2100364Z * [new tag] v1.6.0-rc3 -> v1.6.0-rc3 2025-12-04T08:57:44.2101328Z * [new tag] v1.6.0-rc4 -> v1.6.0-rc4 2025-12-04T08:57:44.2102200Z * [new tag] v1.6.0-rc5 -> v1.6.0-rc5 2025-12-04T08:57:44.2103141Z * [new tag] v1.6.0-rc6 -> v1.6.0-rc6 2025-12-04T08:57:44.2103831Z * [new 
tag] v1.6.0-rc7 -> v1.6.0-rc7 2025-12-04T08:57:44.2104759Z * [new tag] v1.7.0 -> v1.7.0 2025-12-04T08:57:44.2105683Z * [new tag] v1.7.0-rc1 -> v1.7.0-rc1 2025-12-04T08:57:44.2106756Z * [new tag] v1.7.0-rc2 -> v1.7.0-rc2 2025-12-04T08:57:44.2107683Z * [new tag] v1.7.0-rc3 -> v1.7.0-rc3 2025-12-04T08:57:44.2108298Z * [new tag] v1.7.0-rc4 -> v1.7.0-rc4 2025-12-04T08:57:44.2109384Z * [new tag] v1.7.1 -> v1.7.1 2025-12-04T08:57:44.2110433Z * [new tag] v1.7.1-rc1 -> v1.7.1-rc1 2025-12-04T08:57:44.2111374Z * [new tag] v1.7.1-rc2 -> v1.7.1-rc2 2025-12-04T08:57:44.2112010Z * [new tag] v1.7.1-rc3 -> v1.7.1-rc3 2025-12-04T08:57:44.2112973Z * [new tag] v1.8.0 -> v1.8.0 2025-12-04T08:57:44.2113659Z * [new tag] v1.8.0-rc1 -> v1.8.0-rc1 2025-12-04T08:57:44.2114662Z * [new tag] v1.8.0-rc2 -> v1.8.0-rc2 2025-12-04T08:57:44.2115587Z * [new tag] v1.8.0-rc3 -> v1.8.0-rc3 2025-12-04T08:57:44.2116361Z * [new tag] v1.8.0-rc4 -> v1.8.0-rc4 2025-12-04T08:57:44.2117055Z * [new tag] v1.8.0-rc5 -> v1.8.0-rc5 2025-12-04T08:57:44.2117748Z * [new tag] v1.8.1 -> v1.8.1 2025-12-04T08:57:44.2118696Z * [new tag] v1.8.1-rc1 -> v1.8.1-rc1 2025-12-04T08:57:44.2119381Z * [new tag] v1.8.1-rc2 -> v1.8.1-rc2 2025-12-04T08:57:44.2120007Z * [new tag] v1.8.1-rc3 -> v1.8.1-rc3 2025-12-04T08:57:44.2122066Z * [new tag] v1.8.2 -> v1.8.2 2025-12-04T08:57:44.2122667Z * [new tag] v1.8.2-rc1 -> v1.8.2-rc1 2025-12-04T08:57:44.2123601Z * [new tag] v1.9.0 -> v1.9.0 2025-12-04T08:57:44.2124530Z * [new tag] v1.9.0-rc1 -> v1.9.0-rc1 2025-12-04T08:57:44.2125496Z * [new tag] v1.9.0-rc2 -> v1.9.0-rc2 2025-12-04T08:57:44.2126474Z * [new tag] v1.9.0-rc3 -> v1.9.0-rc3 2025-12-04T08:57:44.2127120Z * [new tag] v1.9.0-rc4 -> v1.9.0-rc4 2025-12-04T08:57:44.2128089Z * [new tag] v1.9.1 -> v1.9.1 2025-12-04T08:57:44.2129249Z * [new tag] v1.9.1-rc1 -> v1.9.1-rc1 2025-12-04T08:57:44.2129852Z * [new tag] v1.9.1-rc2 -> v1.9.1-rc2 2025-12-04T08:57:44.2130881Z * [new tag] v2.0.0 -> v2.0.0 2025-12-04T08:57:44.2131780Z * [new tag] v2.0.0-rc1 -> v2.0.0-rc1 2025-12-04T08:57:44.2132715Z * [new tag] v2.0.0-rc2 -> v2.0.0-rc2 2025-12-04T08:57:44.2133799Z * [new tag] v2.0.0-rc3 -> v2.0.0-rc3 2025-12-04T08:57:44.2134681Z * [new tag] v2.0.0-rc4 -> v2.0.0-rc4 2025-12-04T08:57:44.2135604Z * [new tag] v2.0.0-rc5 -> v2.0.0-rc5 2025-12-04T08:57:44.2136254Z * [new tag] v2.0.0-rc6 -> v2.0.0-rc6 2025-12-04T08:57:44.2137615Z * [new tag] v2.0.1 -> v2.0.1 2025-12-04T08:57:44.2138644Z * [new tag] v2.0.1-rc1 -> v2.0.1-rc1 2025-12-04T08:57:44.2139182Z * [new tag] v2.0.1-rc2 -> v2.0.1-rc2 2025-12-04T08:57:44.2140075Z * [new tag] v2.0.1-rc3 -> v2.0.1-rc3 2025-12-04T08:57:44.2140821Z * [new tag] v2.0.1-rc4 -> v2.0.1-rc4 2025-12-04T08:57:44.2142344Z * [new tag] v2.1.0 -> v2.1.0 2025-12-04T08:57:44.2143271Z * [new tag] v2.1.0-rc1 -> v2.1.0-rc1 2025-12-04T08:57:44.2144290Z * [new tag] v2.1.0-rc2 -> v2.1.0-rc2 2025-12-04T08:57:44.2145743Z * [new tag] v2.1.0-rc3 -> v2.1.0-rc3 2025-12-04T08:57:44.2146761Z * [new tag] v2.1.0-rc4 -> v2.1.0-rc4 2025-12-04T08:57:44.2147785Z * [new tag] v2.1.0-rc5 -> v2.1.0-rc5 2025-12-04T08:57:44.2148409Z * [new tag] v2.1.0-rc6 -> v2.1.0-rc6 2025-12-04T08:57:44.2149479Z * [new tag] v2.1.1 -> v2.1.1 2025-12-04T08:57:44.2150555Z * [new tag] v2.1.1-rc1 -> v2.1.1-rc1 2025-12-04T08:57:44.2151505Z * [new tag] v2.1.1-rc2 -> v2.1.1-rc2 2025-12-04T08:57:44.2152511Z * [new tag] v2.1.1-rc3 -> v2.1.1-rc3 2025-12-04T08:57:44.2153433Z * [new tag] v2.1.1-rc4 -> v2.1.1-rc4 2025-12-04T08:57:44.2154303Z * [new tag] v2.1.1-rc5 -> v2.1.1-rc5 2025-12-04T08:57:44.2154892Z * [new tag] 
v2.1.1-rc6 -> v2.1.1-rc6 2025-12-04T08:57:44.2155805Z * [new tag] v2.1.2 -> v2.1.2 2025-12-04T08:57:44.2156780Z * [new tag] v2.1.2-rc1 -> v2.1.2-rc1 2025-12-04T08:57:44.2157734Z * [new tag] v2.1.2-rc2 -> v2.1.2-rc2 2025-12-04T08:57:44.2158340Z * [new tag] v2.1.2-rc3 -> v2.1.2-rc3 2025-12-04T08:57:44.2159330Z * [new tag] v2.2.0 -> v2.2.0 2025-12-04T08:57:44.2160215Z * [new tag] v2.2.0-rc1 -> v2.2.0-rc1 2025-12-04T08:57:44.2161098Z * [new tag] v2.2.0-rc2 -> v2.2.0-rc2 2025-12-04T08:57:44.2161961Z * [new tag] v2.2.0-rc3 -> v2.2.0-rc3 2025-12-04T08:57:44.2162798Z * [new tag] v2.2.0-rc4 -> v2.2.0-rc4 2025-12-04T08:57:44.2163648Z * [new tag] v2.2.0-rc5 -> v2.2.0-rc5 2025-12-04T08:57:44.2164522Z * [new tag] v2.2.0-rc6 -> v2.2.0-rc6 2025-12-04T08:57:44.2165182Z * [new tag] v2.2.0-rc7 -> v2.2.0-rc7 2025-12-04T08:57:44.2165824Z * [new tag] v2.2.0-rc8 -> v2.2.0-rc8 2025-12-04T08:57:44.2166812Z * [new tag] v2.2.1 -> v2.2.1 2025-12-04T08:57:44.2167763Z * [new tag] v2.2.1-rc1 -> v2.2.1-rc1 2025-12-04T08:57:44.2168375Z * [new tag] v2.2.1-rc2 -> v2.2.1-rc2 2025-12-04T08:57:44.2169080Z * [new tag] v2.2.1-rc3 -> v2.2.1-rc3 2025-12-04T08:57:44.2169733Z * [new tag] v2.2.2 -> v2.2.2 2025-12-04T08:57:44.2170846Z * [new tag] v2.2.2-rc1 -> v2.2.2-rc1 2025-12-04T08:57:44.2171442Z * [new tag] v2.2.2-rc2 -> v2.2.2-rc2 2025-12-04T08:57:44.2172312Z * [new tag] v2.2.2-rc3 -> v2.2.2-rc3 2025-12-04T08:57:44.2173295Z * [new tag] v2.3.0 -> v2.3.0 2025-12-04T08:57:44.2174167Z * [new tag] v2.3.0-rc1 -> v2.3.0-rc1 2025-12-04T08:57:44.2175238Z * [new tag] v2.3.0-rc10 -> v2.3.0-rc10 2025-12-04T08:57:44.2176058Z * [new tag] v2.3.0-rc11 -> v2.3.0-rc11 2025-12-04T08:57:44.2177047Z * [new tag] v2.3.0-rc12 -> v2.3.0-rc12 2025-12-04T08:57:44.2178115Z * [new tag] v2.3.0-rc2 -> v2.3.0-rc2 2025-12-04T08:57:44.2179091Z * [new tag] v2.3.0-rc3 -> v2.3.0-rc3 2025-12-04T08:57:44.2180027Z * [new tag] v2.3.0-rc4 -> v2.3.0-rc4 2025-12-04T08:57:44.2180898Z * [new tag] v2.3.0-rc5 -> v2.3.0-rc5 2025-12-04T08:57:44.2181586Z * [new tag] v2.3.0-rc6 -> v2.3.0-rc6 2025-12-04T08:57:44.2182541Z * [new tag] v2.3.0-rc7 -> v2.3.0-rc7 2025-12-04T08:57:44.2183505Z * [new tag] v2.3.0-rc8 -> v2.3.0-rc8 2025-12-04T08:57:44.2184180Z * [new tag] v2.3.0-rc9 -> v2.3.0-rc9 2025-12-04T08:57:44.2184827Z * [new tag] v2.3.1 -> v2.3.1 2025-12-04T08:57:44.2185832Z * [new tag] v2.3.1-rc1 -> v2.3.1-rc1 2025-12-04T08:57:44.2186788Z * [new tag] v2.3.1-rc2 -> v2.3.1-rc2 2025-12-04T08:57:44.2187729Z * [new tag] v2.3.1-rc3 -> v2.3.1-rc3 2025-12-04T08:57:44.2188786Z * [new tag] v2.4.0 -> v2.4.0 2025-12-04T08:57:44.2189760Z * [new tag] v2.4.0-rc1 -> v2.4.0-rc1 2025-12-04T08:57:44.2190631Z * [new tag] v2.4.0-rc2 -> v2.4.0-rc2 2025-12-04T08:57:44.2191512Z * [new tag] v2.4.0-rc3 -> v2.4.0-rc3 2025-12-04T08:57:44.2192410Z * [new tag] v2.4.0-rc4 -> v2.4.0-rc4 2025-12-04T08:57:44.2193380Z * [new tag] v2.4.0-rc5 -> v2.4.0-rc5 2025-12-04T08:57:44.2194288Z * [new tag] v2.4.0-rc6 -> v2.4.0-rc6 2025-12-04T08:57:44.2195220Z * [new tag] v2.4.0-rc7 -> v2.4.0-rc7 2025-12-04T08:57:44.2196067Z * [new tag] v2.4.0-rc8 -> v2.4.0-rc8 2025-12-04T08:57:44.2197042Z * [new tag] v2.4.0-rc9 -> v2.4.0-rc9 2025-12-04T08:57:44.2197695Z * [new tag] v2.4.1 -> v2.4.1 2025-12-04T08:57:44.2198711Z * [new tag] v2.4.1-rc1 -> v2.4.1-rc1 2025-12-04T08:57:44.2199665Z * [new tag] v2.4.1-rc2 -> v2.4.1-rc2 2025-12-04T08:57:44.2200579Z * [new tag] v2.4.1-rc3 -> v2.4.1-rc3 2025-12-04T08:57:44.2201584Z * [new tag] v2.5.0 -> v2.5.0 2025-12-04T08:57:44.2202880Z * [new tag] v2.5.0-rc1 -> v2.5.0-rc1 2025-12-04T08:57:44.2203543Z * 
[new tag] v2.5.0-rc10 -> v2.5.0-rc10 2025-12-04T08:57:44.2204502Z * [new tag] v2.5.0-rc2 -> v2.5.0-rc2 2025-12-04T08:57:44.2205349Z * [new tag] v2.5.0-rc3 -> v2.5.0-rc3 2025-12-04T08:57:44.2206262Z * [new tag] v2.5.0-rc4 -> v2.5.0-rc4 2025-12-04T08:57:44.2207141Z * [new tag] v2.5.0-rc5 -> v2.5.0-rc5 2025-12-04T08:57:44.2208170Z * [new tag] v2.5.0-rc6 -> v2.5.0-rc6 2025-12-04T08:57:44.2209073Z * [new tag] v2.5.0-rc7 -> v2.5.0-rc7 2025-12-04T08:57:44.2209976Z * [new tag] v2.5.0-rc8 -> v2.5.0-rc8 2025-12-04T08:57:44.2210934Z * [new tag] v2.5.0-rc9 -> v2.5.0-rc9 2025-12-04T08:57:44.2211512Z * [new tag] v2.5.1 -> v2.5.1 2025-12-04T08:57:44.2212304Z * [new tag] v2.5.1-rc1 -> v2.5.1-rc1 2025-12-04T08:57:44.2212899Z * [new tag] v2.6.0 -> v2.6.0 2025-12-04T08:57:44.2213933Z * [new tag] v2.6.0-rc1 -> v2.6.0-rc1 2025-12-04T08:57:44.2214927Z * [new tag] v2.6.0-rc2 -> v2.6.0-rc2 2025-12-04T08:57:44.2215834Z * [new tag] v2.6.0-rc3 -> v2.6.0-rc3 2025-12-04T08:57:44.2217005Z * [new tag] v2.6.0-rc4 -> v2.6.0-rc4 2025-12-04T08:57:44.2218224Z * [new tag] v2.6.0-rc5 -> v2.6.0-rc5 2025-12-04T08:57:44.2219276Z * [new tag] v2.6.0-rc6 -> v2.6.0-rc6 2025-12-04T08:57:44.2220266Z * [new tag] v2.6.0-rc7 -> v2.6.0-rc7 2025-12-04T08:57:44.2221539Z * [new tag] v2.6.0-rc8 -> v2.6.0-rc8 2025-12-04T08:57:44.2222560Z * [new tag] v2.6.0-rc9 -> v2.6.0-rc9 2025-12-04T08:57:44.2223759Z * [new tag] v2.7.0 -> v2.7.0 2025-12-04T08:57:44.2224698Z * [new tag] v2.7.0-rc1 -> v2.7.0-rc1 2025-12-04T08:57:44.2225363Z * [new tag] v2.7.0-rc10 -> v2.7.0-rc10 2025-12-04T08:57:44.2226470Z * [new tag] v2.7.0-rc2 -> v2.7.0-rc2 2025-12-04T08:57:44.2227474Z * [new tag] v2.7.0-rc3 -> v2.7.0-rc3 2025-12-04T08:57:44.2228419Z * [new tag] v2.7.0-rc4 -> v2.7.0-rc4 2025-12-04T08:57:44.2229360Z * [new tag] v2.7.0-rc5 -> v2.7.0-rc5 2025-12-04T08:57:44.2230253Z * [new tag] v2.7.0-rc6 -> v2.7.0-rc6 2025-12-04T08:57:44.2231219Z * [new tag] v2.7.0-rc7 -> v2.7.0-rc7 2025-12-04T08:57:44.2232397Z * [new tag] v2.7.0-rc8 -> v2.7.0-rc8 2025-12-04T08:57:44.2233570Z * [new tag] v2.7.0-rc9 -> v2.7.0-rc9 2025-12-04T08:57:44.2234242Z * [new tag] v2.7.1 -> v2.7.1 2025-12-04T08:57:44.2235300Z * [new tag] v2.7.1-rc1 -> v2.7.1-rc1 2025-12-04T08:57:44.2236263Z * [new tag] v2.7.1-rc2 -> v2.7.1-rc2 2025-12-04T08:57:44.2237258Z * [new tag] v2.7.1-rc3 -> v2.7.1-rc3 2025-12-04T08:57:44.2238225Z * [new tag] v2.7.1-rc4 -> v2.7.1-rc4 2025-12-04T08:57:44.2239172Z * [new tag] v2.7.1-rc5 -> v2.7.1-rc5 2025-12-04T08:57:44.2239832Z * [new tag] v2.8.0 -> v2.8.0 2025-12-04T08:57:44.2240813Z * [new tag] v2.8.0-rc1 -> v2.8.0-rc1 2025-12-04T08:57:44.2241719Z * [new tag] v2.8.0-rc2 -> v2.8.0-rc2 2025-12-04T08:57:44.2242698Z * [new tag] v2.8.0-rc3 -> v2.8.0-rc3 2025-12-04T08:57:44.2243766Z * [new tag] v2.8.0-rc4 -> v2.8.0-rc4 2025-12-04T08:57:44.2244705Z * [new tag] v2.8.0-rc5 -> v2.8.0-rc5 2025-12-04T08:57:44.2245659Z * [new tag] v2.8.0-rc6 -> v2.8.0-rc6 2025-12-04T08:57:44.2246606Z * [new tag] v2.8.0-rc7 -> v2.8.0-rc7 2025-12-04T08:57:44.2247505Z * [new tag] v2.8.0-rc8 -> v2.8.0-rc8 2025-12-04T08:57:44.2248501Z * [new tag] v2.9.0 -> v2.9.0 2025-12-04T08:57:44.2249461Z * [new tag] v2.9.0-rc1 -> v2.9.0-rc1 2025-12-04T08:57:44.2250419Z * [new tag] v2.9.0-rc10 -> v2.9.0-rc10 2025-12-04T08:57:44.2251393Z * [new tag] v2.9.0-rc11 -> v2.9.0-rc11 2025-12-04T08:57:44.2252710Z * [new tag] v2.9.0-rc2 -> v2.9.0-rc2 2025-12-04T08:57:44.2253664Z * [new tag] v2.9.0-rc3 -> v2.9.0-rc3 2025-12-04T08:57:44.2254481Z * [new tag] v2.9.0-rc4 -> v2.9.0-rc4 2025-12-04T08:57:44.2255443Z * [new tag] v2.9.0-rc5 -> 
v2.9.0-rc5 2025-12-04T08:57:44.2256912Z * [new tag] v2.9.0-rc6 -> v2.9.0-rc6 2025-12-04T08:57:44.2257972Z * [new tag] v2.9.0-rc7 -> v2.9.0-rc7 2025-12-04T08:57:44.2259164Z * [new tag] v2.9.0-rc8 -> v2.9.0-rc8 2025-12-04T08:57:44.2259864Z * [new tag] v2.9.0-rc9 -> v2.9.0-rc9 2025-12-04T08:57:44.2260571Z * [new tag] v2.9.1 -> v2.9.1 2025-12-04T08:57:44.2261638Z * [new tag] v2.9.1-rc1 -> v2.9.1-rc1 2025-12-04T08:57:44.2262605Z * [new tag] v2.9.1-rc2 -> v2.9.1-rc2 2025-12-04T08:57:44.2263978Z * [new tag] viable/strict/1759343184 -> viable/strict/1759343184 2025-12-04T08:57:44.2264852Z * [new tag] viable/strict/1759346540 -> viable/strict/1759346540 2025-12-04T08:57:44.2265700Z * [new tag] viable/strict/1759348181 -> viable/strict/1759348181 2025-12-04T08:57:44.2266646Z * [new tag] viable/strict/1759350324 -> viable/strict/1759350324 2025-12-04T08:57:44.2267476Z * [new tag] viable/strict/1759351793 -> viable/strict/1759351793 2025-12-04T08:57:44.2268427Z * [new tag] viable/strict/1759353844 -> viable/strict/1759353844 2025-12-04T08:57:44.2269317Z * [new tag] viable/strict/1759355374 -> viable/strict/1759355374 2025-12-04T08:57:44.2270216Z * [new tag] viable/strict/1759357472 -> viable/strict/1759357472 2025-12-04T08:57:44.2270991Z * [new tag] viable/strict/1759361002 -> viable/strict/1759361002 2025-12-04T08:57:44.2272146Z * [new tag] viable/strict/1759362585 -> viable/strict/1759362585 2025-12-04T08:57:44.2273192Z * [new tag] viable/strict/1759365359 -> viable/strict/1759365359 2025-12-04T08:57:44.2274508Z * [new tag] viable/strict/1759370089 -> viable/strict/1759370089 2025-12-04T08:57:44.2275439Z * [new tag] viable/strict/1759377554 -> viable/strict/1759377554 2025-12-04T08:57:44.2276444Z * [new tag] viable/strict/1759379133 -> viable/strict/1759379133 2025-12-04T08:57:44.2277360Z * [new tag] viable/strict/1759389871 -> viable/strict/1759389871 2025-12-04T08:57:44.2278195Z * [new tag] viable/strict/1759393562 -> viable/strict/1759393562 2025-12-04T08:57:44.2279131Z * [new tag] viable/strict/1759395076 -> viable/strict/1759395076 2025-12-04T08:57:44.2280124Z * [new tag] viable/strict/1759398579 -> viable/strict/1759398579 2025-12-04T08:57:44.2280950Z * [new tag] viable/strict/1759404142 -> viable/strict/1759404142 2025-12-04T08:57:44.2281886Z * [new tag] viable/strict/1759405773 -> viable/strict/1759405773 2025-12-04T08:57:44.2282706Z * [new tag] viable/strict/1759408041 -> viable/strict/1759408041 2025-12-04T08:57:44.2283615Z * [new tag] viable/strict/1759411593 -> viable/strict/1759411593 2025-12-04T08:57:44.2284425Z * [new tag] viable/strict/1759427395 -> viable/strict/1759427395 2025-12-04T08:57:44.2285399Z * [new tag] viable/strict/1759434582 -> viable/strict/1759434582 2025-12-04T08:57:44.2286322Z * [new tag] viable/strict/1759436720 -> viable/strict/1759436720 2025-12-04T08:57:44.2287233Z * [new tag] viable/strict/1759440219 -> viable/strict/1759440219 2025-12-04T08:57:44.2288044Z * [new tag] viable/strict/1759441948 -> viable/strict/1759441948 2025-12-04T08:57:44.2289068Z * [new tag] viable/strict/1759443860 -> viable/strict/1759443860 2025-12-04T08:57:44.2289789Z * [new tag] viable/strict/1759445377 -> viable/strict/1759445377 2025-12-04T08:57:44.2290808Z * [new tag] viable/strict/1759447415 -> viable/strict/1759447415 2025-12-04T08:57:44.2291766Z * [new tag] viable/strict/1759451750 -> viable/strict/1759451750 2025-12-04T08:57:44.2292725Z * [new tag] viable/strict/1759453910 -> viable/strict/1759453910 2025-12-04T08:57:44.2293548Z * [new tag] viable/strict/1759456483 -> 
viable/strict/1759456483 2025-12-04T08:57:44.2294529Z * [new tag] viable/strict/1759459279 -> viable/strict/1759459279 2025-12-04T08:57:44.2295449Z * [new tag] viable/strict/1759460742 -> viable/strict/1759460742 2025-12-04T08:57:44.2296406Z * [new tag] viable/strict/1759462025 -> viable/strict/1759462025 2025-12-04T08:57:44.2297663Z * [new tag] viable/strict/1759469086 -> viable/strict/1759469086 2025-12-04T08:57:44.2298407Z * [new tag] viable/strict/1759470581 -> viable/strict/1759470581 2025-12-04T08:57:44.2299388Z * [new tag] viable/strict/1759472786 -> viable/strict/1759472786 2025-12-04T08:57:44.2300345Z * [new tag] viable/strict/1759476294 -> viable/strict/1759476294 2025-12-04T08:57:44.2301171Z * [new tag] viable/strict/1759479963 -> viable/strict/1759479963 2025-12-04T08:57:44.2302116Z * [new tag] viable/strict/1759492177 -> viable/strict/1759492177 2025-12-04T08:57:44.2303049Z * [new tag] viable/strict/1759519278 -> viable/strict/1759519278 2025-12-04T08:57:44.2303997Z * [new tag] viable/strict/1759524580 -> viable/strict/1759524580 2025-12-04T08:57:44.2304802Z * [new tag] viable/strict/1759528193 -> viable/strict/1759528193 2025-12-04T08:57:44.2306007Z * [new tag] viable/strict/1759533797 -> viable/strict/1759533797 2025-12-04T08:57:44.2306858Z * [new tag] viable/strict/1759542780 -> viable/strict/1759542780 2025-12-04T08:57:44.2307848Z * [new tag] viable/strict/1759549779 -> viable/strict/1759549779 2025-12-04T08:57:44.2308895Z * [new tag] viable/strict/1759555455 -> viable/strict/1759555455 2025-12-04T08:57:44.2309805Z * [new tag] viable/strict/1759559176 -> viable/strict/1759559176 2025-12-04T08:57:44.2310744Z * [new tag] viable/strict/1759560629 -> viable/strict/1759560629 2025-12-04T08:57:44.2311534Z * [new tag] viable/strict/1759569848 -> viable/strict/1759569848 2025-12-04T08:57:44.2312646Z * [new tag] viable/strict/1759571382 -> viable/strict/1759571382 2025-12-04T08:57:44.2313461Z * [new tag] viable/strict/1759573474 -> viable/strict/1759573474 2025-12-04T08:57:44.2314408Z * [new tag] viable/strict/1759618187 -> viable/strict/1759618187 2025-12-04T08:57:44.2315344Z * [new tag] viable/strict/1759626742 -> viable/strict/1759626742 2025-12-04T08:57:44.2316168Z * [new tag] viable/strict/1759632427 -> viable/strict/1759632427 2025-12-04T08:57:44.2317073Z * [new tag] viable/strict/1759634971 -> viable/strict/1759634971 2025-12-04T08:57:44.2318010Z * [new tag] viable/strict/1759661382 -> viable/strict/1759661382 2025-12-04T08:57:44.2318966Z * [new tag] viable/strict/1759663294 -> viable/strict/1759663294 2025-12-04T08:57:44.2319657Z * [new tag] viable/strict/1759708178 -> viable/strict/1759708178 2025-12-04T08:57:44.2320607Z * [new tag] viable/strict/1759715695 -> viable/strict/1759715695 2025-12-04T08:57:44.2322217Z * [new tag] viable/strict/1759728293 -> viable/strict/1759728293 2025-12-04T08:57:44.2322957Z * [new tag] viable/strict/1759735513 -> viable/strict/1759735513 2025-12-04T08:57:44.2324016Z * [new tag] viable/strict/1759739177 -> viable/strict/1759739177 2025-12-04T08:57:44.2324949Z * [new tag] viable/strict/1759758635 -> viable/strict/1759758635 2025-12-04T08:57:44.2325887Z * [new tag] viable/strict/1759765784 -> viable/strict/1759765784 2025-12-04T08:57:44.2326742Z * [new tag] viable/strict/1759767948 -> viable/strict/1759767948 2025-12-04T08:57:44.2327736Z * [new tag] viable/strict/1759771461 -> viable/strict/1759771461 2025-12-04T08:57:44.2328465Z * [new tag] viable/strict/1759776706 -> viable/strict/1759776706 2025-12-04T08:57:44.2329556Z * [new tag] 
viable/strict/1759782317 -> viable/strict/1759782317 2025-12-04T08:57:44.2330607Z * [new tag] viable/strict/1759783777 -> viable/strict/1759783777 2025-12-04T08:57:44.2331582Z * [new tag] viable/strict/1759785815 -> viable/strict/1759785815 2025-12-04T08:57:44.2332431Z * [new tag] viable/strict/1759789459 -> viable/strict/1759789459 2025-12-04T08:57:44.2333483Z * [new tag] viable/strict/1759790974 -> viable/strict/1759790974 2025-12-04T08:57:44.2334683Z * [new tag] viable/strict/1759794583 -> viable/strict/1759794583 2025-12-04T08:57:44.2335620Z * [new tag] viable/strict/1759797408 -> viable/strict/1759797408 2025-12-04T08:57:44.2336602Z * [new tag] viable/strict/1759799518 -> viable/strict/1759799518 2025-12-04T08:57:44.2337761Z * [new tag] viable/strict/1759804909 -> viable/strict/1759804909 2025-12-04T08:57:44.2338695Z * [new tag] viable/strict/1759807643 -> viable/strict/1759807643 2025-12-04T08:57:44.2339626Z * [new tag] viable/strict/1759809089 -> viable/strict/1759809089 2025-12-04T08:57:44.2340572Z * [new tag] viable/strict/1759811145 -> viable/strict/1759811145 2025-12-04T08:57:44.2341514Z * [new tag] viable/strict/1759812581 -> viable/strict/1759812581 2025-12-04T08:57:44.2342349Z * [new tag] viable/strict/1759814683 -> viable/strict/1759814683 2025-12-04T08:57:44.2343330Z * [new tag] viable/strict/1759821889 -> viable/strict/1759821889 2025-12-04T08:57:44.2344281Z * [new tag] viable/strict/1759823376 -> viable/strict/1759823376 2025-12-04T08:57:44.2345252Z * [new tag] viable/strict/1759827107 -> viable/strict/1759827107 2025-12-04T08:57:44.2346052Z * [new tag] viable/strict/1759830577 -> viable/strict/1759830577 2025-12-04T08:57:44.2347113Z * [new tag] viable/strict/1759832720 -> viable/strict/1759832720 2025-12-04T08:57:44.2347955Z * [new tag] viable/strict/1759842063 -> viable/strict/1759842063 2025-12-04T08:57:44.2349034Z * [new tag] viable/strict/1759847121 -> viable/strict/1759847121 2025-12-04T08:57:44.2350233Z * [new tag] viable/strict/1759850721 -> viable/strict/1759850721 2025-12-04T08:57:44.2351045Z * [new tag] viable/strict/1759857870 -> viable/strict/1759857870 2025-12-04T08:57:44.2352058Z * [new tag] viable/strict/1759863143 -> viable/strict/1759863143 2025-12-04T08:57:44.2353025Z * [new tag] viable/strict/1759875874 -> viable/strict/1759875874 2025-12-04T08:57:44.2353701Z * [new tag] viable/strict/1759877385 -> viable/strict/1759877385 2025-12-04T08:57:44.2354644Z * [new tag] viable/strict/1759883801 -> viable/strict/1759883801 2025-12-04T08:57:44.2355463Z * [new tag] viable/strict/1759885922 -> viable/strict/1759885922 2025-12-04T08:57:44.2356480Z * [new tag] viable/strict/1759888488 -> viable/strict/1759888488 2025-12-04T08:57:44.2357210Z * [new tag] viable/strict/1759895471 -> viable/strict/1759895471 2025-12-04T08:57:44.2358092Z * [new tag] viable/strict/1759904803 -> viable/strict/1759904803 2025-12-04T08:57:44.2359243Z * [new tag] viable/strict/1759908300 -> viable/strict/1759908300 2025-12-04T08:57:44.2360299Z * [new tag] viable/strict/1759915520 -> viable/strict/1759915520 2025-12-04T08:57:44.2361086Z * [new tag] viable/strict/1759916978 -> viable/strict/1759916978 2025-12-04T08:57:44.2361828Z * [new tag] viable/strict/1759930024 -> viable/strict/1759930024 2025-12-04T08:57:44.2362816Z * [new tag] viable/strict/1759948122 -> viable/strict/1759948122 2025-12-04T08:57:44.2363723Z * [new tag] viable/strict/1759952983 -> viable/strict/1759952983 2025-12-04T08:57:44.2364753Z * [new tag] viable/strict/1759955121 -> viable/strict/1759955121 
2025-12-04T08:57:44.2365526Z * [new tag] viable/strict/1759962298 -> viable/strict/1759962298 2025-12-04T08:57:44.2366472Z * [new tag] viable/strict/1759965837 -> viable/strict/1759965837 2025-12-04T08:57:44.2367291Z * [new tag] viable/strict/1759970213 -> viable/strict/1759970213 2025-12-04T08:57:44.2368260Z * [new tag] viable/strict/1759974894 -> viable/strict/1759974894 2025-12-04T08:57:44.2369069Z * [new tag] viable/strict/1759977763 -> viable/strict/1759977763 2025-12-04T08:57:44.2370175Z * [new tag] viable/strict/1759979241 -> viable/strict/1759979241 2025-12-04T08:57:44.2371073Z * [new tag] viable/strict/1759985417 -> viable/strict/1759985417 2025-12-04T08:57:44.2371900Z * [new tag] viable/strict/1759987490 -> viable/strict/1759987490 2025-12-04T08:57:44.2372851Z * [new tag] viable/strict/1759996180 -> viable/strict/1759996180 2025-12-04T08:57:44.2373730Z * [new tag] viable/strict/1760065682 -> viable/strict/1760065682 2025-12-04T08:57:44.2374652Z * [new tag] viable/strict/1760066894 -> viable/strict/1760066894 2025-12-04T08:57:44.2375572Z * [new tag] viable/strict/1760070345 -> viable/strict/1760070345 2025-12-04T08:57:44.2376454Z * [new tag] viable/strict/1760089782 -> viable/strict/1760089782 2025-12-04T08:57:44.2377728Z * [new tag] viable/strict/1760091921 -> viable/strict/1760091921 2025-12-04T08:57:44.2378689Z * [new tag] viable/strict/1760127924 -> viable/strict/1760127924 2025-12-04T08:57:44.2379619Z * [new tag] viable/strict/1760129489 -> viable/strict/1760129489 2025-12-04T08:57:44.2380611Z * [new tag] viable/strict/1760132980 -> viable/strict/1760132980 2025-12-04T08:57:44.2381602Z * [new tag] viable/strict/1760135060 -> viable/strict/1760135060 2025-12-04T08:57:44.2382623Z * [new tag] viable/strict/1760215782 -> viable/strict/1760215782 2025-12-04T08:57:44.2383586Z * [new tag] viable/strict/1760273849 -> viable/strict/1760273849 2025-12-04T08:57:44.2384625Z * [new tag] viable/strict/1760275517 -> viable/strict/1760275517 2025-12-04T08:57:44.2385375Z * [new tag] viable/strict/1760276979 -> viable/strict/1760276979 2025-12-04T08:57:44.2386331Z * [new tag] viable/strict/1760279007 -> viable/strict/1760279007 2025-12-04T08:57:44.2387046Z * [new tag] viable/strict/1760286328 -> viable/strict/1760286328 2025-12-04T08:57:44.2387812Z * [new tag] viable/strict/1760493304 -> viable/strict/1760493304 2025-12-04T08:57:44.2388911Z * [new tag] viable/strict/1760496298 -> viable/strict/1760496298 2025-12-04T08:57:44.2389938Z * [new tag] viable/strict/1760518396 -> viable/strict/1760518396 2025-12-04T08:57:44.2390636Z * [new tag] viable/strict/1760534864 -> viable/strict/1760534864 2025-12-04T08:57:44.2391612Z * [new tag] viable/strict/1760549062 -> viable/strict/1760549062 2025-12-04T08:57:44.2392606Z * [new tag] viable/strict/1760552799 -> viable/strict/1760552799 2025-12-04T08:57:44.2393535Z * [new tag] viable/strict/1760554355 -> viable/strict/1760554355 2025-12-04T08:57:44.2394855Z * [new tag] viable/strict/1760556275 -> viable/strict/1760556275 2025-12-04T08:57:44.2395741Z * [new tag] viable/strict/1760564979 -> viable/strict/1760564979 2025-12-04T08:57:44.2396744Z * [new tag] viable/strict/1760567049 -> viable/strict/1760567049 2025-12-04T08:57:44.2398047Z * [new tag] viable/strict/1760568585 -> viable/strict/1760568585 2025-12-04T08:57:44.2398957Z * [new tag] viable/strict/1760570630 -> viable/strict/1760570630 2025-12-04T08:57:44.2399927Z * [new tag] viable/strict/1760572180 -> viable/strict/1760572180 2025-12-04T08:57:44.2400717Z * [new tag] viable/strict/1760575094 -> 
viable/strict/1760575094 2025-12-04T08:57:44.2401804Z * [new tag] viable/strict/1760579709 -> viable/strict/1760579709 2025-12-04T08:57:44.2403132Z * [new tag] viable/strict/1760582614 -> viable/strict/1760582614 2025-12-04T08:57:44.2404073Z * [new tag] viable/strict/1760586815 -> viable/strict/1760586815 2025-12-04T08:57:44.2404769Z * [new tag] viable/strict/1760588829 -> viable/strict/1760588829 2025-12-04T08:57:44.2405714Z * [new tag] viable/strict/1760590200 -> viable/strict/1760590200 2025-12-04T08:57:44.2406737Z * [new tag] viable/strict/1760592311 -> viable/strict/1760592311 2025-12-04T08:57:44.2407560Z * [new tag] viable/strict/1760619733 -> viable/strict/1760619733 2025-12-04T08:57:44.2408318Z * [new tag] viable/strict/1760628335 -> viable/strict/1760628335 2025-12-04T08:57:44.2409221Z * [new tag] viable/strict/1760635490 -> viable/strict/1760635490 2025-12-04T08:57:44.2410034Z * [new tag] viable/strict/1760640743 -> viable/strict/1760640743 2025-12-04T08:57:44.2410974Z * [new tag] viable/strict/1760642528 -> viable/strict/1760642528 2025-12-04T08:57:44.2411792Z * [new tag] viable/strict/1760646330 -> viable/strict/1760646330 2025-12-04T08:57:44.2412819Z * [new tag] viable/strict/1760666101 -> viable/strict/1760666101 2025-12-04T08:57:44.2413762Z * [new tag] viable/strict/1760668990 -> viable/strict/1760668990 2025-12-04T08:57:44.2414584Z * [new tag] viable/strict/1760670600 -> viable/strict/1760670600 2025-12-04T08:57:44.2415585Z * [new tag] viable/strict/1760671704 -> viable/strict/1760671704 2025-12-04T08:57:44.2416598Z * [new tag] viable/strict/1760673121 -> viable/strict/1760673121 2025-12-04T08:57:44.2417789Z * [new tag] viable/strict/1760675352 -> viable/strict/1760675352 2025-12-04T08:57:44.2418722Z * [new tag] viable/strict/1760696731 -> viable/strict/1760696731 2025-12-04T08:57:44.2421537Z * [new tag] viable/strict/1760723515 -> viable/strict/1760723515 2025-12-04T08:57:44.2422460Z * [new tag] viable/strict/1760727234 -> viable/strict/1760727234 2025-12-04T08:57:44.2423472Z * [new tag] viable/strict/1760730578 -> viable/strict/1760730578 2025-12-04T08:57:44.2424395Z * [new tag] viable/strict/1760732726 -> viable/strict/1760732726 2025-12-04T08:57:44.2425325Z * [new tag] viable/strict/1760734180 -> viable/strict/1760734180 2025-12-04T08:57:44.2426466Z * [new tag] viable/strict/1760736251 -> viable/strict/1760736251 2025-12-04T08:57:44.2427289Z * [new tag] viable/strict/1760737772 -> viable/strict/1760737772 2025-12-04T08:57:44.2428244Z * [new tag] viable/strict/1760758005 -> viable/strict/1760758005 2025-12-04T08:57:44.2429200Z * [new tag] viable/strict/1760761532 -> viable/strict/1760761532 2025-12-04T08:57:44.2430151Z * [new tag] viable/strict/1760802581 -> viable/strict/1760802581 2025-12-04T08:57:44.2430989Z * [new tag] viable/strict/1760827772 -> viable/strict/1760827772 2025-12-04T08:57:44.2431950Z * [new tag] viable/strict/1760834524 -> viable/strict/1760834524 2025-12-04T08:57:44.2433089Z * [new tag] viable/strict/1760845009 -> viable/strict/1760845009 2025-12-04T08:57:44.2434005Z * [new tag] viable/strict/1760876836 -> viable/strict/1760876836 2025-12-04T08:57:44.2434975Z * [new tag] viable/strict/1760880329 -> viable/strict/1760880329 2025-12-04T08:57:44.2435708Z * [new tag] viable/strict/1760888987 -> viable/strict/1760888987 2025-12-04T08:57:44.2436643Z * [new tag] viable/strict/1760912664 -> viable/strict/1760912664 2025-12-04T08:57:44.2437472Z * [new tag] viable/strict/1760925321 -> viable/strict/1760925321 2025-12-04T08:57:44.2438379Z * [new tag] 
viable/strict/1760931488 -> viable/strict/1760931488 2025-12-04T08:57:44.2439288Z * [new tag] viable/strict/1760932693 -> viable/strict/1760932693 2025-12-04T08:57:44.2440283Z * [new tag] viable/strict/1761004184 -> viable/strict/1761004184 2025-12-04T08:57:44.2441217Z * [new tag] viable/strict/1761014748 -> viable/strict/1761014748 2025-12-04T08:57:44.2442041Z * [new tag] viable/strict/1761017491 -> viable/strict/1761017491 2025-12-04T08:57:44.2443019Z * [new tag] viable/strict/1761018806 -> viable/strict/1761018806 2025-12-04T08:57:44.2444002Z * [new tag] viable/strict/1761020754 -> viable/strict/1761020754 2025-12-04T08:57:44.2444978Z * [new tag] viable/strict/1761024303 -> viable/strict/1761024303 2025-12-04T08:57:44.2445801Z * [new tag] viable/strict/1761029582 -> viable/strict/1761029582 2025-12-04T08:57:44.2446733Z * [new tag] viable/strict/1761031535 -> viable/strict/1761031535 2025-12-04T08:57:44.2447553Z * [new tag] viable/strict/1761035196 -> viable/strict/1761035196 2025-12-04T08:57:44.2448604Z * [new tag] viable/strict/1761045825 -> viable/strict/1761045825 2025-12-04T08:57:44.2449543Z * [new tag] viable/strict/1761054796 -> viable/strict/1761054796 2025-12-04T08:57:44.2450451Z * [new tag] viable/strict/1761060314 -> viable/strict/1761060314 2025-12-04T08:57:44.2451364Z * [new tag] viable/strict/1761071198 -> viable/strict/1761071198 2025-12-04T08:57:44.2452339Z * [new tag] viable/strict/1761074628 -> viable/strict/1761074628 2025-12-04T08:57:44.2453252Z * [new tag] viable/strict/1761078351 -> viable/strict/1761078351 2025-12-04T08:57:44.2454173Z * [new tag] viable/strict/1761079822 -> viable/strict/1761079822 2025-12-04T08:57:44.2454957Z * [new tag] viable/strict/1761081873 -> viable/strict/1761081873 2025-12-04T08:57:44.2455938Z * [new tag] viable/strict/1761083392 -> viable/strict/1761083392 2025-12-04T08:57:44.2457663Z * [new tag] viable/strict/1761085465 -> viable/strict/1761085465 2025-12-04T08:57:44.2458624Z * [new tag] viable/strict/1761089099 -> viable/strict/1761089099 2025-12-04T08:57:44.2459448Z * [new tag] viable/strict/1761095535 -> viable/strict/1761095535 2025-12-04T08:57:44.2460487Z * [new tag] viable/strict/1761098119 -> viable/strict/1761098119 2025-12-04T08:57:44.2461927Z * [new tag] viable/strict/1761101330 -> viable/strict/1761101330 2025-12-04T08:57:44.2462879Z * [new tag] viable/strict/1761114425 -> viable/strict/1761114425 2025-12-04T08:57:44.2463793Z * [new tag] viable/strict/1761116036 -> viable/strict/1761116036 2025-12-04T08:57:44.2464756Z * [new tag] viable/strict/1761119379 -> viable/strict/1761119379 2025-12-04T08:57:44.2465692Z * [new tag] viable/strict/1761121601 -> viable/strict/1761121601 2025-12-04T08:57:44.2466539Z * [new tag] viable/strict/1761123234 -> viable/strict/1761123234 2025-12-04T08:57:44.2467466Z * [new tag] viable/strict/1761126621 -> viable/strict/1761126621 2025-12-04T08:57:44.2468445Z * [new tag] viable/strict/1761132259 -> viable/strict/1761132259 2025-12-04T08:57:44.2469385Z * [new tag] viable/strict/1761146746 -> viable/strict/1761146746 2025-12-04T08:57:44.2470320Z * [new tag] viable/strict/1761164752 -> viable/strict/1761164752 2025-12-04T08:57:44.2471242Z * [new tag] viable/strict/1761166198 -> viable/strict/1761166198 2025-12-04T08:57:44.2472147Z * [new tag] viable/strict/1761175424 -> viable/strict/1761175424 2025-12-04T08:57:44.2473072Z * [new tag] viable/strict/1761176983 -> viable/strict/1761176983 2025-12-04T08:57:44.2474179Z * [new tag] viable/strict/1761179891 -> viable/strict/1761179891 
2025-12-04T08:57:44.2475070Z * [new tag] viable/strict/1761181930 -> viable/strict/1761181930 2025-12-04T08:57:44.2476076Z * [new tag] viable/strict/1761184516 -> viable/strict/1761184516 2025-12-04T08:57:44.2477039Z * [new tag] viable/strict/1761190179 -> viable/strict/1761190179 2025-12-04T08:57:44.2477869Z * [new tag] viable/strict/1761193558 -> viable/strict/1761193558 2025-12-04T08:57:44.2478816Z * [new tag] viable/strict/1761207990 -> viable/strict/1761207990 2025-12-04T08:57:44.2479756Z * [new tag] viable/strict/1761229539 -> viable/strict/1761229539 2025-12-04T08:57:44.2480922Z * [new tag] viable/strict/1761244031 -> viable/strict/1761244031 2025-12-04T08:57:44.2481844Z * [new tag] viable/strict/1761248986 -> viable/strict/1761248986 2025-12-04T08:57:44.2482735Z * [new tag] viable/strict/1761259791 -> viable/strict/1761259791 2025-12-04T08:57:44.2483566Z * [new tag] viable/strict/1761266139 -> viable/strict/1761266139 2025-12-04T08:57:44.2484606Z * [new tag] viable/strict/1761268316 -> viable/strict/1761268316 2025-12-04T08:57:44.2485410Z * [new tag] viable/strict/1761273805 -> viable/strict/1761273805 2025-12-04T08:57:44.2486340Z * [new tag] viable/strict/1761275261 -> viable/strict/1761275261 2025-12-04T08:57:44.2487300Z * [new tag] viable/strict/1761277913 -> viable/strict/1761277913 2025-12-04T08:57:44.2488276Z * [new tag] viable/strict/1761290701 -> viable/strict/1761290701 2025-12-04T08:57:44.2489228Z * [new tag] viable/strict/1761294396 -> viable/strict/1761294396 2025-12-04T08:57:44.2490132Z * [new tag] viable/strict/1761303047 -> viable/strict/1761303047 2025-12-04T08:57:44.2491051Z * [new tag] viable/strict/1761335388 -> viable/strict/1761335388 2025-12-04T08:57:44.2491950Z * [new tag] viable/strict/1761337551 -> viable/strict/1761337551 2025-12-04T08:57:44.2492855Z * [new tag] viable/strict/1761339007 -> viable/strict/1761339007 2025-12-04T08:57:44.2493662Z * [new tag] viable/strict/1761341050 -> viable/strict/1761341050 2025-12-04T08:57:44.2494696Z * [new tag] viable/strict/1761346188 -> viable/strict/1761346188 2025-12-04T08:57:44.2495782Z * [new tag] viable/strict/1761349792 -> viable/strict/1761349792 2025-12-04T08:57:44.2496930Z * [new tag] viable/strict/1761352620 -> viable/strict/1761352620 2025-12-04T08:57:44.2497899Z * [new tag] viable/strict/1761354730 -> viable/strict/1761354730 2025-12-04T08:57:44.2498869Z * [new tag] viable/strict/1761357298 -> viable/strict/1761357298 2025-12-04T08:57:44.2499787Z * [new tag] viable/strict/1761360201 -> viable/strict/1761360201 2025-12-04T08:57:44.2500783Z * [new tag] viable/strict/1761361753 -> viable/strict/1761361753 2025-12-04T08:57:44.2501734Z * [new tag] viable/strict/1761364351 -> viable/strict/1761364351 2025-12-04T08:57:44.2502570Z * [new tag] viable/strict/1761366338 -> viable/strict/1761366338 2025-12-04T08:57:44.2503703Z * [new tag] viable/strict/1761367802 -> viable/strict/1761367802 2025-12-04T08:57:44.2504657Z * [new tag] viable/strict/1761369889 -> viable/strict/1761369889 2025-12-04T08:57:44.2505628Z * [new tag] viable/strict/1761371385 -> viable/strict/1761371385 2025-12-04T08:57:44.2506688Z * [new tag] viable/strict/1761373581 -> viable/strict/1761373581 2025-12-04T08:57:44.2507781Z * [new tag] viable/strict/1761375054 -> viable/strict/1761375054 2025-12-04T08:57:44.2508914Z * [new tag] viable/strict/1761421785 -> viable/strict/1761421785 2025-12-04T08:57:44.2509911Z * [new tag] viable/strict/1761434614 -> viable/strict/1761434614 2025-12-04T08:57:44.2511138Z * [new tag] viable/strict/1761439254 -> 
viable/strict/1761439254 2025-12-04T08:57:44.2512176Z * [new tag] viable/strict/1761454187 -> viable/strict/1761454187 2025-12-04T08:57:44.2513126Z * [new tag] viable/strict/1761459991 -> viable/strict/1761459991 2025-12-04T08:57:44.2514186Z * [new tag] viable/strict/1761470668 -> viable/strict/1761470668 2025-12-04T08:57:44.2515534Z * [new tag] viable/strict/1761472188 -> viable/strict/1761472188 2025-12-04T08:57:44.2516497Z * [new tag] viable/strict/1761503178 -> viable/strict/1761503178 2025-12-04T08:57:44.2517423Z * [new tag] viable/strict/1761517492 -> viable/strict/1761517492 2025-12-04T08:57:44.2518324Z * [new tag] viable/strict/1761518981 -> viable/strict/1761518981 2025-12-04T08:57:44.2519333Z * [new tag] viable/strict/1761533609 -> viable/strict/1761533609 2025-12-04T08:57:44.2520540Z * [new tag] viable/strict/1761546438 -> viable/strict/1761546438 2025-12-04T08:57:44.2522109Z * [new tag] viable/strict/1761548133 -> viable/strict/1761548133 2025-12-04T08:57:44.2523243Z * [new tag] viable/strict/1761555186 -> viable/strict/1761555186 2025-12-04T08:57:44.2524429Z * [new tag] viable/strict/1761557178 -> viable/strict/1761557178 2025-12-04T08:57:44.2525439Z * [new tag] viable/strict/1761560772 -> viable/strict/1761560772 2025-12-04T08:57:44.2526356Z * [new tag] viable/strict/1761562266 -> viable/strict/1761562266 2025-12-04T08:57:44.2527365Z * [new tag] viable/strict/1761564260 -> viable/strict/1761564260 2025-12-04T08:57:44.2528315Z * [new tag] viable/strict/1761568072 -> viable/strict/1761568072 2025-12-04T08:57:44.2529249Z * [new tag] viable/strict/1761571683 -> viable/strict/1761571683 2025-12-04T08:57:44.2530012Z * [new tag] viable/strict/1761580199 -> viable/strict/1761580199 2025-12-04T08:57:44.2531049Z * [new tag] viable/strict/1761587383 -> viable/strict/1761587383 2025-12-04T08:57:44.2532186Z * [new tag] viable/strict/1761591165 -> viable/strict/1761591165 2025-12-04T08:57:44.2532933Z * [new tag] viable/strict/1761594575 -> viable/strict/1761594575 2025-12-04T08:57:44.2534024Z * [new tag] viable/strict/1761596710 -> viable/strict/1761596710 2025-12-04T08:57:44.2534944Z * [new tag] viable/strict/1761598189 -> viable/strict/1761598189 2025-12-04T08:57:44.2535870Z * [new tag] viable/strict/1761600254 -> viable/strict/1761600254 2025-12-04T08:57:44.2537088Z * [new tag] viable/strict/1761603879 -> viable/strict/1761603879 2025-12-04T08:57:44.2538082Z * [new tag] viable/strict/1761605429 -> viable/strict/1761605429 2025-12-04T08:57:44.2539138Z * [new tag] viable/strict/1761607468 -> viable/strict/1761607468 2025-12-04T08:57:44.2540164Z * [new tag] viable/strict/1761608983 -> viable/strict/1761608983 2025-12-04T08:57:44.2541184Z * [new tag] viable/strict/1761611846 -> viable/strict/1761611846 2025-12-04T08:57:44.2542176Z * [new tag] viable/strict/1761613922 -> viable/strict/1761613922 2025-12-04T08:57:44.2542935Z * [new tag] viable/strict/1761616504 -> viable/strict/1761616504 2025-12-04T08:57:44.2543754Z * [new tag] viable/strict/1761619599 -> viable/strict/1761619599 2025-12-04T08:57:44.2544745Z * [new tag] viable/strict/1761686693 -> viable/strict/1761686693 2025-12-04T08:57:44.2545691Z * [new tag] viable/strict/1761688179 -> viable/strict/1761688179 2025-12-04T08:57:44.2546545Z * [new tag] viable/strict/1761691973 -> viable/strict/1761691973 2025-12-04T08:57:44.2547695Z * [new tag] viable/strict/1761693884 -> viable/strict/1761693884 2025-12-04T08:57:44.2548649Z * [new tag] viable/strict/1761695389 -> viable/strict/1761695389 2025-12-04T08:57:44.2549713Z * [new tag] 
viable/strict/1761698408 -> viable/strict/1761698408 2025-12-04T08:57:44.2550662Z * [new tag] viable/strict/1761702931 -> viable/strict/1761702931 2025-12-04T08:57:44.2551570Z * [new tag] viable/strict/1761706307 -> viable/strict/1761706307 2025-12-04T08:57:44.2552497Z * [new tag] viable/strict/1761709065 -> viable/strict/1761709065 2025-12-04T08:57:44.2553511Z * [new tag] viable/strict/1761710285 -> viable/strict/1761710285 2025-12-04T08:57:44.2554510Z * [new tag] viable/strict/1761711983 -> viable/strict/1761711983 2025-12-04T08:57:44.2555490Z * [new tag] viable/strict/1761713514 -> viable/strict/1761713514 2025-12-04T08:57:44.2556511Z * [new tag] viable/strict/1761715523 -> viable/strict/1761715523 2025-12-04T08:57:44.2557565Z * [new tag] viable/strict/1761727973 -> viable/strict/1761727973 2025-12-04T08:57:44.2558544Z * [new tag] viable/strict/1761751558 -> viable/strict/1761751558 2025-12-04T08:57:44.2559530Z * [new tag] viable/strict/1761755187 -> viable/strict/1761755187 2025-12-04T08:57:44.2560512Z * [new tag] viable/strict/1761756826 -> viable/strict/1761756826 2025-12-04T08:57:44.2561519Z * [new tag] viable/strict/1761769551 -> viable/strict/1761769551 2025-12-04T08:57:44.2562555Z * [new tag] viable/strict/1761771032 -> viable/strict/1761771032 2025-12-04T08:57:44.2563297Z * [new tag] viable/strict/1761773101 -> viable/strict/1761773101 2025-12-04T08:57:44.2564331Z * [new tag] viable/strict/1761781792 -> viable/strict/1761781792 2025-12-04T08:57:44.2565303Z * [new tag] viable/strict/1761784788 -> viable/strict/1761784788 2025-12-04T08:57:44.2566249Z * [new tag] viable/strict/1761786740 -> viable/strict/1761786740 2025-12-04T08:57:44.2567336Z * [new tag] viable/strict/1761789332 -> viable/strict/1761789332 2025-12-04T08:57:44.2568701Z * [new tag] viable/strict/1761792569 -> viable/strict/1761792569 2025-12-04T08:57:44.2569674Z * [new tag] viable/strict/1761795289 -> viable/strict/1761795289 2025-12-04T08:57:44.2570605Z * [new tag] viable/strict/1761798345 -> viable/strict/1761798345 2025-12-04T08:57:44.2571677Z * [new tag] viable/strict/1761799827 -> viable/strict/1761799827 2025-12-04T08:57:44.2572658Z * [new tag] viable/strict/1761805604 -> viable/strict/1761805604 2025-12-04T08:57:44.2573587Z * [new tag] viable/strict/1761807202 -> viable/strict/1761807202 2025-12-04T08:57:44.2574549Z * [new tag] viable/strict/1761809094 -> viable/strict/1761809094 2025-12-04T08:57:44.2575500Z * [new tag] viable/strict/1761810576 -> viable/strict/1761810576 2025-12-04T08:57:44.2576609Z * [new tag] viable/strict/1761812771 -> viable/strict/1761812771 2025-12-04T08:57:44.2577838Z * [new tag] viable/strict/1761814363 -> viable/strict/1761814363 2025-12-04T08:57:44.2578795Z * [new tag] viable/strict/1761857410 -> viable/strict/1761857410 2025-12-04T08:57:44.2579810Z * [new tag] viable/strict/1761860985 -> viable/strict/1761860985 2025-12-04T08:57:44.2580775Z * [new tag] viable/strict/1761863094 -> viable/strict/1761863094 2025-12-04T08:57:44.2581736Z * [new tag] viable/strict/1761864590 -> viable/strict/1761864590 2025-12-04T08:57:44.2582706Z * [new tag] viable/strict/1761866675 -> viable/strict/1761866675 2025-12-04T08:57:44.2583956Z * [new tag] viable/strict/1761868178 -> viable/strict/1761868178 2025-12-04T08:57:44.2585382Z * [new tag] viable/strict/1761871111 -> viable/strict/1761871111 2025-12-04T08:57:44.2586395Z * [new tag] viable/strict/1761873126 -> viable/strict/1761873126 2025-12-04T08:57:44.2587416Z * [new tag] viable/strict/1761875714 -> viable/strict/1761875714 
2025-12-04T08:57:44.2588439Z * [new tag] viable/strict/1761878924 -> viable/strict/1761878924 2025-12-04T08:57:44.2589590Z * [new tag] viable/strict/1761881727 -> viable/strict/1761881727 2025-12-04T08:57:44.2590540Z * [new tag] viable/strict/1761882959 -> viable/strict/1761882959 2025-12-04T08:57:44.2591501Z * [new tag] viable/strict/1761886268 -> viable/strict/1761886268 2025-12-04T08:57:44.2592460Z * [new tag] viable/strict/1761893641 -> viable/strict/1761893641 2025-12-04T08:57:44.2593434Z * [new tag] viable/strict/1761931517 -> viable/strict/1761931517 2025-12-04T08:57:44.2594377Z * [new tag] viable/strict/1761933080 -> viable/strict/1761933080 2025-12-04T08:57:44.2595351Z * [new tag] viable/strict/1761935217 -> viable/strict/1761935217 2025-12-04T08:57:44.2596351Z * [new tag] viable/strict/1761938533 -> viable/strict/1761938533 2025-12-04T08:57:44.2597322Z * [new tag] viable/strict/1761940184 -> viable/strict/1761940184 2025-12-04T08:57:44.2598288Z * [new tag] viable/strict/1761942338 -> viable/strict/1761942338 2025-12-04T08:57:44.2599199Z * [new tag] viable/strict/1761946100 -> viable/strict/1761946100 2025-12-04T08:57:44.2600195Z * [new tag] viable/strict/1761947374 -> viable/strict/1761947374 2025-12-04T08:57:44.2601152Z * [new tag] viable/strict/1761950978 -> viable/strict/1761950978 2025-12-04T08:57:44.2602087Z * [new tag] viable/strict/1761957727 -> viable/strict/1761957727 2025-12-04T08:57:44.2603048Z * [new tag] viable/strict/1761959532 -> viable/strict/1761959532 2025-12-04T08:57:44.2604190Z * [new tag] viable/strict/1761965366 -> viable/strict/1761965366 2025-12-04T08:57:44.2605262Z * [new tag] viable/strict/1761968066 -> viable/strict/1761968066 2025-12-04T08:57:44.2606189Z * [new tag] viable/strict/1761969322 -> viable/strict/1761969322 2025-12-04T08:57:44.2607196Z * [new tag] viable/strict/1761974723 -> viable/strict/1761974723 2025-12-04T08:57:44.2608247Z * [new tag] viable/strict/1761981837 -> viable/strict/1761981837 2025-12-04T08:57:44.2609274Z * [new tag] viable/strict/1761985546 -> viable/strict/1761985546 2025-12-04T08:57:44.2610241Z * [new tag] viable/strict/1761987030 -> viable/strict/1761987030 2025-12-04T08:57:44.2611231Z * [new tag] viable/strict/1762003554 -> viable/strict/1762003554 2025-12-04T08:57:44.2612194Z * [new tag] viable/strict/1762021560 -> viable/strict/1762021560 2025-12-04T08:57:44.2613141Z * [new tag] viable/strict/1762032190 -> viable/strict/1762032190 2025-12-04T08:57:44.2614152Z * [new tag] viable/strict/1762040981 -> viable/strict/1762040981 2025-12-04T08:57:44.2615143Z * [new tag] viable/strict/1762048525 -> viable/strict/1762048525 2025-12-04T08:57:44.2616130Z * [new tag] viable/strict/1762104223 -> viable/strict/1762104223 2025-12-04T08:57:44.2617432Z * [new tag] viable/strict/1762105778 -> viable/strict/1762105778 2025-12-04T08:57:44.2618408Z * [new tag] viable/strict/1762115109 -> viable/strict/1762115109 2025-12-04T08:57:44.2619370Z * [new tag] viable/strict/1762125840 -> viable/strict/1762125840 2025-12-04T08:57:44.2620154Z * [new tag] viable/strict/1762127377 -> viable/strict/1762127377 2025-12-04T08:57:44.2625413Z * [new tag] viable/strict/1762134925 -> viable/strict/1762134925 2025-12-04T08:57:44.2626275Z * [new tag] viable/strict/1762138338 -> viable/strict/1762138338 2025-12-04T08:57:44.2627387Z * [new tag] viable/strict/1762148993 -> viable/strict/1762148993 2025-12-04T08:57:44.2628583Z * [new tag] viable/strict/1762152871 -> viable/strict/1762152871 2025-12-04T08:57:44.2629607Z * [new tag] viable/strict/1762156183 -> 
viable/strict/1762156183 2025-12-04T08:57:44.2630591Z * [new tag] viable/strict/1762163457 -> viable/strict/1762163457 2025-12-04T08:57:44.2631635Z * [new tag] viable/strict/1762165569 -> viable/strict/1762165569 2025-12-04T08:57:44.2632587Z * [new tag] viable/strict/1762169035 -> viable/strict/1762169035 2025-12-04T08:57:44.2633667Z * [new tag] viable/strict/1762174936 -> viable/strict/1762174936 2025-12-04T08:57:44.2634633Z * [new tag] viable/strict/1762194412 -> viable/strict/1762194412 2025-12-04T08:57:44.2635602Z * [new tag] viable/strict/1762195876 -> viable/strict/1762195876 2025-12-04T08:57:44.2636521Z * [new tag] viable/strict/1762197788 -> viable/strict/1762197788 2025-12-04T08:57:44.2637545Z * [new tag] viable/strict/1762199389 -> viable/strict/1762199389 2025-12-04T08:57:44.2638718Z * [new tag] viable/strict/1762206585 -> viable/strict/1762206585 2025-12-04T08:57:44.2639788Z * [new tag] viable/strict/1762210184 -> viable/strict/1762210184 2025-12-04T08:57:44.2640735Z * [new tag] viable/strict/1762218736 -> viable/strict/1762218736 2025-12-04T08:57:44.2641714Z * [new tag] viable/strict/1762224529 -> viable/strict/1762224529 2025-12-04T08:57:44.2642748Z * [new tag] viable/strict/1762227253 -> viable/strict/1762227253 2025-12-04T08:57:44.2643501Z * [new tag] viable/strict/1762228515 -> viable/strict/1762228515 2025-12-04T08:57:44.2644793Z * [new tag] viable/strict/1762230349 -> viable/strict/1762230349 2025-12-04T08:57:44.2645554Z * [new tag] viable/strict/1762231859 -> viable/strict/1762231859 2025-12-04T08:57:44.2646607Z * [new tag] viable/strict/1762233925 -> viable/strict/1762233925 2025-12-04T08:57:44.2647697Z * [new tag] viable/strict/1762237630 -> viable/strict/1762237630 2025-12-04T08:57:44.2648449Z * [new tag] viable/strict/1762253522 -> viable/strict/1762253522 2025-12-04T08:57:44.2649657Z * [new tag] viable/strict/1762278588 -> viable/strict/1762278588 2025-12-04T08:57:44.2650616Z * [new tag] viable/strict/1762284203 -> viable/strict/1762284203 2025-12-04T08:57:44.2651604Z * [new tag] viable/strict/1762289446 -> viable/strict/1762289446 2025-12-04T08:57:44.2652583Z * [new tag] viable/strict/1762291515 -> viable/strict/1762291515 2025-12-04T08:57:44.2653979Z * [new tag] viable/strict/1762295100 -> viable/strict/1762295100 2025-12-04T08:57:44.2654748Z * [new tag] viable/strict/1762296590 -> viable/strict/1762296590 2025-12-04T08:57:44.2655570Z * [new tag] viable/strict/1762300179 -> viable/strict/1762300179 2025-12-04T08:57:44.2656564Z * [new tag] viable/strict/1762303207 -> viable/strict/1762303207 2025-12-04T08:57:44.2657863Z * [new tag] viable/strict/1762386584 -> viable/strict/1762386584 2025-12-04T08:57:44.2658825Z * [new tag] viable/strict/1762391537 -> viable/strict/1762391537 2025-12-04T08:57:44.2659620Z * [new tag] viable/strict/1762394119 -> viable/strict/1762394119 2025-12-04T08:57:44.2661109Z * [new tag] viable/strict/1762397437 -> viable/strict/1762397437 2025-12-04T08:57:44.2662122Z * [new tag] viable/strict/1762400256 -> viable/strict/1762400256 2025-12-04T08:57:44.2663105Z * [new tag] viable/strict/1762401469 -> viable/strict/1762401469 2025-12-04T08:57:44.2664114Z * [new tag] viable/strict/1762408195 -> viable/strict/1762408195 2025-12-04T08:57:44.2665196Z * [new tag] viable/strict/1762410411 -> viable/strict/1762410411 2025-12-04T08:57:44.2666218Z * [new tag] viable/strict/1762417613 -> viable/strict/1762417613 2025-12-04T08:57:44.2667220Z * [new tag] viable/strict/1762419198 -> viable/strict/1762419198 2025-12-04T08:57:44.2668221Z * [new tag] 
viable/strict/1762422656 -> viable/strict/1762422656 2025-12-04T08:57:44.2669726Z * [new tag] viable/strict/1762424746 -> viable/strict/1762424746 2025-12-04T08:57:44.2670730Z * [new tag] viable/strict/1762446386 -> viable/strict/1762446386 2025-12-04T08:57:44.2671707Z * [new tag] viable/strict/1762449912 -> viable/strict/1762449912 2025-12-04T08:57:44.2672698Z * [new tag] viable/strict/1762457031 -> viable/strict/1762457031 2025-12-04T08:57:44.2673756Z * [new tag] viable/strict/1762462441 -> viable/strict/1762462441 2025-12-04T08:57:44.2674735Z * [new tag] viable/strict/1762467909 -> viable/strict/1762467909 2025-12-04T08:57:44.2675733Z * [new tag] viable/strict/1762471493 -> viable/strict/1762471493 2025-12-04T08:57:44.2676783Z * [new tag] viable/strict/1762475990 -> viable/strict/1762475990 2025-12-04T08:57:44.2677823Z * [new tag] viable/strict/1762477933 -> viable/strict/1762477933 2025-12-04T08:57:44.2678788Z * [new tag] viable/strict/1762491053 -> viable/strict/1762491053 2025-12-04T08:57:44.2679769Z * [new tag] viable/strict/1762493118 -> viable/strict/1762493118 2025-12-04T08:57:44.2680690Z * [new tag] viable/strict/1762498442 -> viable/strict/1762498442 2025-12-04T08:57:44.2681733Z * [new tag] viable/strict/1762501778 -> viable/strict/1762501778 2025-12-04T08:57:44.2682687Z * [new tag] viable/strict/1762504001 -> viable/strict/1762504001 2025-12-04T08:57:44.2683776Z * [new tag] viable/strict/1762505583 -> viable/strict/1762505583 2025-12-04T08:57:44.2684823Z * [new tag] viable/strict/1762507523 -> viable/strict/1762507523 2025-12-04T08:57:44.2685841Z * [new tag] viable/strict/1762511140 -> viable/strict/1762511140 2025-12-04T08:57:44.2686961Z * [new tag] viable/strict/1762512632 -> viable/strict/1762512632 2025-12-04T08:57:44.2687970Z * [new tag] viable/strict/1762520467 -> viable/strict/1762520467 2025-12-04T08:57:44.2688935Z * [new tag] viable/strict/1762522016 -> viable/strict/1762522016 2025-12-04T08:57:44.2689887Z * [new tag] viable/strict/1762530591 -> viable/strict/1762530591 2025-12-04T08:57:44.2690843Z * [new tag] viable/strict/1762543405 -> viable/strict/1762543405 2025-12-04T08:57:44.2691614Z * [new tag] viable/strict/1762544998 -> viable/strict/1762544998 2025-12-04T08:57:44.2693272Z * [new tag] viable/strict/1762552182 -> viable/strict/1762552182 2025-12-04T08:57:44.2693680Z * [new tag] viable/strict/1762554297 -> viable/strict/1762554297 2025-12-04T08:57:44.2695255Z * [new tag] viable/strict/1762559381 -> viable/strict/1762559381 2025-12-04T08:57:44.2695475Z * [new tag] viable/strict/1762562222 -> viable/strict/1762562222 2025-12-04T08:57:44.2696494Z * [new tag] viable/strict/1762564319 -> viable/strict/1762564319 2025-12-04T08:57:44.2697612Z * [new tag] viable/strict/1762566904 -> viable/strict/1762566904 2025-12-04T08:57:44.2698604Z * [new tag] viable/strict/1762569781 -> viable/strict/1762569781 2025-12-04T08:57:44.2699610Z * [new tag] viable/strict/1762575940 -> viable/strict/1762575940 2025-12-04T08:57:44.2700591Z * [new tag] viable/strict/1762580974 -> viable/strict/1762580974 2025-12-04T08:57:44.2701582Z * [new tag] viable/strict/1762583185 -> viable/strict/1762583185 2025-12-04T08:57:44.2702587Z * [new tag] viable/strict/1762586647 -> viable/strict/1762586647 2025-12-04T08:57:44.2703645Z * [new tag] viable/strict/1762588183 -> viable/strict/1762588183 2025-12-04T08:57:44.2704653Z * [new tag] viable/strict/1762593886 -> viable/strict/1762593886 2025-12-04T08:57:44.2705740Z * [new tag] viable/strict/1762650743 -> viable/strict/1762650743 
2025-12-04T08:57:44.2706824Z * [new tag] viable/strict/1762653328 -> viable/strict/1762653328 2025-12-04T08:57:44.2707846Z * [new tag] viable/strict/1762659342 -> viable/strict/1762659342 2025-12-04T08:57:44.2708935Z * [new tag] viable/strict/1762662360 -> viable/strict/1762662360 2025-12-04T08:57:44.2709910Z * [new tag] viable/strict/1762667377 -> viable/strict/1762667377 2025-12-04T08:57:44.2710868Z * [new tag] viable/strict/1762671090 -> viable/strict/1762671090 2025-12-04T08:57:44.2711856Z * [new tag] viable/strict/1762680284 -> viable/strict/1762680284 2025-12-04T08:57:44.2712829Z * [new tag] viable/strict/1762683900 -> viable/strict/1762683900 2025-12-04T08:57:44.2713801Z * [new tag] viable/strict/1762705541 -> viable/strict/1762705541 2025-12-04T08:57:44.2714760Z * [new tag] viable/strict/1762709004 -> viable/strict/1762709004 2025-12-04T08:57:44.2715785Z * [new tag] viable/strict/1762746004 -> viable/strict/1762746004 2025-12-04T08:57:44.2716848Z * [new tag] viable/strict/1762748799 -> viable/strict/1762748799 2025-12-04T08:57:44.2717908Z * [new tag] viable/strict/1762759504 -> viable/strict/1762759504 2025-12-04T08:57:44.2719354Z * [new tag] viable/strict/1762760973 -> viable/strict/1762760973 2025-12-04T08:57:44.2720354Z * [new tag] viable/strict/1762775374 -> viable/strict/1762775374 2025-12-04T08:57:44.2721756Z * [new tag] viable/strict/1762777661 -> viable/strict/1762777661 2025-12-04T08:57:44.2722777Z * [new tag] viable/strict/1762779774 -> viable/strict/1762779774 2025-12-04T08:57:44.2723973Z * [new tag] viable/strict/1762781259 -> viable/strict/1762781259 2025-12-04T08:57:44.2725113Z * [new tag] viable/strict/1762793628 -> viable/strict/1762793628 2025-12-04T08:57:44.2726201Z * [new tag] viable/strict/1762800711 -> viable/strict/1762800711 2025-12-04T08:57:44.2727197Z * [new tag] viable/strict/1762809894 -> viable/strict/1762809894 2025-12-04T08:57:44.2728180Z * [new tag] viable/strict/1762811384 -> viable/strict/1762811384 2025-12-04T08:57:44.2729250Z * [new tag] viable/strict/1762813841 -> viable/strict/1762813841 2025-12-04T08:57:44.2730271Z * [new tag] viable/strict/1762815047 -> viable/strict/1762815047 2025-12-04T08:57:44.2731452Z * [new tag] viable/strict/1762817094 -> viable/strict/1762817094 2025-12-04T08:57:44.2732656Z * [new tag] viable/strict/1762818582 -> viable/strict/1762818582 2025-12-04T08:57:44.2733784Z * [new tag] viable/strict/1762821623 -> viable/strict/1762821623 2025-12-04T08:57:44.2734550Z * [new tag] viable/strict/1762823531 -> viable/strict/1762823531 2025-12-04T08:57:44.2735647Z * [new tag] viable/strict/1762849583 -> viable/strict/1762849583 2025-12-04T08:57:44.2736661Z * [new tag] viable/strict/1762851200 -> viable/strict/1762851200 2025-12-04T08:57:44.2737916Z * [new tag] viable/strict/1762854603 -> viable/strict/1762854603 2025-12-04T08:57:44.2738975Z * [new tag] viable/strict/1762858276 -> viable/strict/1762858276 2025-12-04T08:57:44.2740166Z * [new tag] viable/strict/1762860891 -> viable/strict/1762860891 2025-12-04T08:57:44.2741769Z * [new tag] viable/strict/1762866174 -> viable/strict/1762866174 2025-12-04T08:57:44.2742779Z * [new tag] viable/strict/1762867653 -> viable/strict/1762867653 2025-12-04T08:57:44.2743788Z * [new tag] viable/strict/1762872669 -> viable/strict/1762872669 2025-12-04T08:57:44.2744587Z * [new tag] viable/strict/1762878380 -> viable/strict/1762878380 2025-12-04T08:57:44.2745712Z * [new tag] viable/strict/1762889003 -> viable/strict/1762889003 2025-12-04T08:57:44.2746766Z * [new tag] viable/strict/1762890589 -> 
viable/strict/1762890589 2025-12-04T08:57:44.2747779Z * [new tag] viable/strict/1762892743 -> viable/strict/1762892743 2025-12-04T08:57:44.2748899Z * [new tag] viable/strict/1762894271 -> viable/strict/1762894271 2025-12-04T08:57:44.2749674Z * [new tag] viable/strict/1762896287 -> viable/strict/1762896287 2025-12-04T08:57:44.2750696Z * [new tag] viable/strict/1762915871 -> viable/strict/1762915871 2025-12-04T08:57:44.2751760Z * [new tag] viable/strict/1762918569 -> viable/strict/1762918569 2025-12-04T08:57:44.2752524Z * [new tag] viable/strict/1762919776 -> viable/strict/1762919776 2025-12-04T08:57:44.2753574Z * [new tag] viable/strict/1762923072 -> viable/strict/1762923072 2025-12-04T08:57:44.2754543Z * [new tag] viable/strict/1762928826 -> viable/strict/1762928826 2025-12-04T08:57:44.2755642Z * [new tag] viable/strict/1762930451 -> viable/strict/1762930451 2025-12-04T08:57:44.2756748Z * [new tag] viable/strict/1762933780 -> viable/strict/1762933780 2025-12-04T08:57:44.2757588Z * [new tag] viable/strict/1762937638 -> viable/strict/1762937638 2025-12-04T08:57:44.2758790Z * [new tag] viable/strict/1762939545 -> viable/strict/1762939545 2025-12-04T08:57:44.2759796Z * [new tag] viable/strict/1762962692 -> viable/strict/1762962692 2025-12-04T08:57:44.2760764Z * [new tag] viable/strict/1762979143 -> viable/strict/1762979143 2025-12-04T08:57:44.2761754Z * [new tag] viable/strict/1762984188 -> viable/strict/1762984188 2025-12-04T08:57:44.2762508Z * [new tag] viable/strict/1762986306 -> viable/strict/1762986306 2025-12-04T08:57:44.2763575Z * [new tag] viable/strict/1762989903 -> viable/strict/1762989903 2025-12-04T08:57:44.2764559Z * [new tag] viable/strict/1762991377 -> viable/strict/1762991377 2025-12-04T08:57:44.2765608Z * [new tag] viable/strict/1762998921 -> viable/strict/1762998921 2025-12-04T08:57:44.2766675Z * [new tag] viable/strict/1763002287 -> viable/strict/1763002287 2025-12-04T08:57:44.2767680Z * [new tag] viable/strict/1763016840 -> viable/strict/1763016840 2025-12-04T08:57:44.2768649Z * [new tag] viable/strict/1763020180 -> viable/strict/1763020180 2025-12-04T08:57:44.2769781Z * [new tag] viable/strict/1763027421 -> viable/strict/1763027421 2025-12-04T08:57:44.2770709Z * [new tag] viable/strict/1763031120 -> viable/strict/1763031120 2025-12-04T08:57:44.2771710Z * [new tag] viable/strict/1763036861 -> viable/strict/1763036861 2025-12-04T08:57:44.2772797Z * [new tag] viable/strict/1763038993 -> viable/strict/1763038993 2025-12-04T08:57:44.2773905Z * [new tag] viable/strict/1763054703 -> viable/strict/1763054703 2025-12-04T08:57:44.2774698Z * [new tag] viable/strict/1763067061 -> viable/strict/1763067061 2025-12-04T08:57:44.2775707Z * [new tag] viable/strict/1763070847 -> viable/strict/1763070847 2025-12-04T08:57:44.2776992Z * [new tag] viable/strict/1763072706 -> viable/strict/1763072706 2025-12-04T08:57:44.2778155Z * [new tag] viable/strict/1763076302 -> viable/strict/1763076302 2025-12-04T08:57:44.2779122Z * [new tag] viable/strict/1763080816 -> viable/strict/1763080816 2025-12-04T08:57:44.2780138Z * [new tag] viable/strict/1763082732 -> viable/strict/1763082732 2025-12-04T08:57:44.2781166Z * [new tag] viable/strict/1763085329 -> viable/strict/1763085329 2025-12-04T08:57:44.2782196Z * [new tag] viable/strict/1763088623 -> viable/strict/1763088623 2025-12-04T08:57:44.2783297Z * [new tag] viable/strict/1763091402 -> viable/strict/1763091402 2025-12-04T08:57:44.2784318Z * [new tag] viable/strict/1763092602 -> viable/strict/1763092602 2025-12-04T08:57:44.2785323Z * [new tag] 
viable/strict/1763094355 -> viable/strict/1763094355 2025-12-04T08:57:44.2786825Z * [new tag] viable/strict/1763099390 -> viable/strict/1763099390 2025-12-04T08:57:44.2787870Z * [new tag] viable/strict/1763101608 -> viable/strict/1763101608 2025-12-04T08:57:44.2789043Z * [new tag] viable/strict/1763105102 -> viable/strict/1763105102 2025-12-04T08:57:44.2790101Z * [new tag] viable/strict/1763112347 -> viable/strict/1763112347 2025-12-04T08:57:44.2791106Z * [new tag] viable/strict/1763119471 -> viable/strict/1763119471 2025-12-04T08:57:44.2791895Z * [new tag] viable/strict/1763126835 -> viable/strict/1763126835 2025-12-04T08:57:44.2792809Z * [new tag] viable/strict/1763149779 -> viable/strict/1763149779 2025-12-04T08:57:44.2793841Z * [new tag] viable/strict/1763164178 -> viable/strict/1763164178 2025-12-04T08:57:44.2794692Z * [new tag] viable/strict/1763167104 -> viable/strict/1763167104 2025-12-04T08:57:44.2795679Z * [new tag] viable/strict/1763169132 -> viable/strict/1763169132 2025-12-04T08:57:44.2796659Z * [new tag] viable/strict/1763171708 -> viable/strict/1763171708 2025-12-04T08:57:44.2797619Z * [new tag] viable/strict/1763174759 -> viable/strict/1763174759 2025-12-04T08:57:44.2798696Z * [new tag] viable/strict/1763180744 -> viable/strict/1763180744 2025-12-04T08:57:44.2799667Z * [new tag] viable/strict/1763182227 -> viable/strict/1763182227 2025-12-04T08:57:44.2800624Z * [new tag] viable/strict/1763184309 -> viable/strict/1763184309 2025-12-04T08:57:44.2802126Z * [new tag] viable/strict/1763187991 -> viable/strict/1763187991 2025-12-04T08:57:44.2803097Z * [new tag] viable/strict/1763191445 -> viable/strict/1763191445 2025-12-04T08:57:44.2804318Z * [new tag] viable/strict/1763195152 -> viable/strict/1763195152 2025-12-04T08:57:44.2805079Z * [new tag] viable/strict/1763205769 -> viable/strict/1763205769 2025-12-04T08:57:44.2806219Z * [new tag] viable/strict/1763246990 -> viable/strict/1763246990 2025-12-04T08:57:44.2807317Z * [new tag] viable/strict/1763261578 -> viable/strict/1763261578 2025-12-04T08:57:44.2808138Z * [new tag] viable/strict/1763286573 -> viable/strict/1763286573 2025-12-04T08:57:44.2809036Z * [new tag] viable/strict/1763292167 -> viable/strict/1763292167 2025-12-04T08:57:44.2810035Z * [new tag] viable/strict/1763333386 -> viable/strict/1763333386 2025-12-04T08:57:44.2811018Z * [new tag] viable/strict/1763340082 -> viable/strict/1763340082 2025-12-04T08:57:44.2812772Z * [new tag] viable/strict/1763364324 -> viable/strict/1763364324 2025-12-04T08:57:44.2813805Z * [new tag] viable/strict/1763371569 -> viable/strict/1763371569 2025-12-04T08:57:44.2814835Z * [new tag] viable/strict/1763373067 -> viable/strict/1763373067 2025-12-04T08:57:44.2815764Z * [new tag] viable/strict/1763375157 -> viable/strict/1763375157 2025-12-04T08:57:44.2817061Z * [new tag] viable/strict/1763382462 -> viable/strict/1763382462 2025-12-04T08:57:44.2818117Z * [new tag] viable/strict/1763394661 -> viable/strict/1763394661 2025-12-04T08:57:44.2819384Z * [new tag] viable/strict/1763396797 -> viable/strict/1763396797 2025-12-04T08:57:44.2820438Z * [new tag] viable/strict/1763398542 -> viable/strict/1763398542 2025-12-04T08:57:44.2821711Z * [new tag] viable/strict/1763401807 -> viable/strict/1763401807 2025-12-04T08:57:44.2822571Z * [new tag] viable/strict/1763414698 -> viable/strict/1763414698 2025-12-04T08:57:44.2823688Z * [new tag] viable/strict/1763419807 -> viable/strict/1763419807 2025-12-04T08:57:44.2824723Z * [new tag] viable/strict/1763426369 -> viable/strict/1763426369 
2025-12-04T08:57:44.2825757Z * [new tag] viable/strict/1763428331 -> viable/strict/1763428331 2025-12-04T08:57:44.2826826Z * [new tag] viable/strict/1763430922 -> viable/strict/1763430922 2025-12-04T08:57:44.2827648Z * [new tag] viable/strict/1763434184 -> viable/strict/1763434184 2025-12-04T08:57:44.2828692Z * [new tag] viable/strict/1763439973 -> viable/strict/1763439973 2025-12-04T08:57:44.2829775Z * [new tag] viable/strict/1763444995 -> viable/strict/1763444995 2025-12-04T08:57:44.2830876Z * [new tag] viable/strict/1763447206 -> viable/strict/1763447206 2025-12-04T08:57:44.2831913Z * [new tag] viable/strict/1763448826 -> viable/strict/1763448826 2025-12-04T08:57:44.2832944Z * [new tag] viable/strict/1763450717 -> viable/strict/1763450717 2025-12-04T08:57:44.2834005Z * [new tag] viable/strict/1763452183 -> viable/strict/1763452183 2025-12-04T08:57:44.2835116Z * [new tag] viable/strict/1763457945 -> viable/strict/1763457945 2025-12-04T08:57:44.2836104Z * [new tag] viable/strict/1763459439 -> viable/strict/1763459439 2025-12-04T08:57:44.2837129Z * [new tag] viable/strict/1763461556 -> viable/strict/1763461556 2025-12-04T08:57:44.2838086Z * [new tag] viable/strict/1763463103 -> viable/strict/1763463103 2025-12-04T08:57:44.2839109Z * [new tag] viable/strict/1763465100 -> viable/strict/1763465100 2025-12-04T08:57:44.2840076Z * [new tag] viable/strict/1763468866 -> viable/strict/1763468866 2025-12-04T08:57:44.2840822Z * [new tag] viable/strict/1763493823 -> viable/strict/1763493823 2025-12-04T08:57:44.2841651Z * [new tag] viable/strict/1763496249 -> viable/strict/1763496249 2025-12-04T08:57:44.2842690Z * [new tag] viable/strict/1763502620 -> viable/strict/1763502620 2025-12-04T08:57:44.2843812Z * [new tag] viable/strict/1763504715 -> viable/strict/1763504715 2025-12-04T08:57:44.2844774Z * [new tag] viable/strict/1763506208 -> viable/strict/1763506208 2025-12-04T08:57:44.2845773Z * [new tag] viable/strict/1763520590 -> viable/strict/1763520590 2025-12-04T08:57:44.2846778Z * [new tag] viable/strict/1763523357 -> viable/strict/1763523357 2025-12-04T08:57:44.2847866Z * [new tag] viable/strict/1763529922 -> viable/strict/1763529922 2025-12-04T08:57:44.2848906Z * [new tag] viable/strict/1763531408 -> viable/strict/1763531408 2025-12-04T08:57:44.2849861Z * [new tag] viable/strict/1763533622 -> viable/strict/1763533622 2025-12-04T08:57:44.2850844Z * [new tag] viable/strict/1763538576 -> viable/strict/1763538576 2025-12-04T08:57:44.2851994Z * [new tag] viable/strict/1763545823 -> viable/strict/1763545823 2025-12-04T08:57:44.2853148Z * [new tag] viable/strict/1763547951 -> viable/strict/1763547951 2025-12-04T08:57:44.2854191Z * [new tag] viable/strict/1763551477 -> viable/strict/1763551477 2025-12-04T08:57:44.2855176Z * [new tag] viable/strict/1763552982 -> viable/strict/1763552982 2025-12-04T08:57:44.2856142Z * [new tag] viable/strict/1763594698 -> viable/strict/1763594698 2025-12-04T08:57:44.2857526Z * [new tag] viable/strict/1763596178 -> viable/strict/1763596178 2025-12-04T08:57:44.2858561Z * [new tag] viable/strict/1763599155 -> viable/strict/1763599155 2025-12-04T08:57:44.2859565Z * [new tag] viable/strict/1763603717 -> viable/strict/1763603717 2025-12-04T08:57:44.2860598Z * [new tag] viable/strict/1763606923 -> viable/strict/1763606923 2025-12-04T08:57:44.2861643Z * [new tag] viable/strict/1763609715 -> viable/strict/1763609715 2025-12-04T08:57:44.2862628Z * [new tag] viable/strict/1763612757 -> viable/strict/1763612757 2025-12-04T08:57:44.2863622Z * [new tag] viable/strict/1763616325 -> 
viable/strict/1763616325 2025-12-04T08:57:44.2864616Z * [new tag] viable/strict/1763623509 -> viable/strict/1763623509 2025-12-04T08:57:44.2865866Z * [new tag] viable/strict/1763624984 -> viable/strict/1763624984 2025-12-04T08:57:44.2866854Z * [new tag] viable/strict/1763628796 -> viable/strict/1763628796 2025-12-04T08:57:44.2868050Z * [new tag] viable/strict/1763634343 -> viable/strict/1763634343 2025-12-04T08:57:44.2868860Z * [new tag] viable/strict/1763635867 -> viable/strict/1763635867 2025-12-04T08:57:44.2870056Z * [new tag] viable/strict/1763639382 -> viable/strict/1763639382 2025-12-04T08:57:44.2871054Z * [new tag] viable/strict/1763646626 -> viable/strict/1763646626 2025-12-04T08:57:44.2872162Z * [new tag] viable/strict/1763655997 -> viable/strict/1763655997 2025-12-04T08:57:44.2873294Z * [new tag] viable/strict/1763659444 -> viable/strict/1763659444 2025-12-04T08:57:44.2874247Z * [new tag] viable/strict/1763660992 -> viable/strict/1763660992 2025-12-04T08:57:44.2875180Z * [new tag] viable/strict/1763663201 -> viable/strict/1763663201 2025-12-04T08:57:44.2876203Z * [new tag] viable/strict/1763670362 -> viable/strict/1763670362 2025-12-04T08:57:44.2876986Z * [new tag] viable/strict/1763675378 -> viable/strict/1763675378 2025-12-04T08:57:44.2878012Z * [new tag] viable/strict/1763693343 -> viable/strict/1763693343 2025-12-04T08:57:44.2878947Z * [new tag] viable/strict/1763696088 -> viable/strict/1763696088 2025-12-04T08:57:44.2880099Z * [new tag] viable/strict/1763697343 -> viable/strict/1763697343 2025-12-04T08:57:44.2881060Z * [new tag] viable/strict/1763699165 -> viable/strict/1763699165 2025-12-04T08:57:44.2882019Z * [new tag] viable/strict/1763700660 -> viable/strict/1763700660 2025-12-04T08:57:44.2882969Z * [new tag] viable/strict/1763704209 -> viable/strict/1763704209 2025-12-04T08:57:44.2883985Z * [new tag] viable/strict/1763706411 -> viable/strict/1763706411 2025-12-04T08:57:44.2884927Z * [new tag] viable/strict/1763708082 -> viable/strict/1763708082 2025-12-04T08:57:44.2885738Z * [new tag] viable/strict/1763711381 -> viable/strict/1763711381 2025-12-04T08:57:44.2886660Z * [new tag] viable/strict/1763713593 -> viable/strict/1763713593 2025-12-04T08:57:44.2887690Z * [new tag] viable/strict/1763715201 -> viable/strict/1763715201 2025-12-04T08:57:44.2888632Z * [new tag] viable/strict/1763733017 -> viable/strict/1763733017 2025-12-04T08:57:44.2889646Z * [new tag] viable/strict/1763735108 -> viable/strict/1763735108 2025-12-04T08:57:44.2890604Z * [new tag] viable/strict/1763749579 -> viable/strict/1763749579 2025-12-04T08:57:44.2891577Z * [new tag] viable/strict/1763751113 -> viable/strict/1763751113 2025-12-04T08:57:44.2892558Z * [new tag] viable/strict/1763753035 -> viable/strict/1763753035 2025-12-04T08:57:44.2893549Z * [new tag] viable/strict/1763754578 -> viable/strict/1763754578 2025-12-04T08:57:44.2894567Z * [new tag] viable/strict/1763756748 -> viable/strict/1763756748 2025-12-04T08:57:44.2895525Z * [new tag] viable/strict/1763758205 -> viable/strict/1763758205 2025-12-04T08:57:44.2896372Z * [new tag] viable/strict/1763764050 -> viable/strict/1763764050 2025-12-04T08:57:44.2897644Z * [new tag] viable/strict/1763771887 -> viable/strict/1763771887 2025-12-04T08:57:44.2898811Z * [new tag] viable/strict/1763773920 -> viable/strict/1763773920 2025-12-04T08:57:44.2899809Z * [new tag] viable/strict/1763776501 -> viable/strict/1763776501 2025-12-04T08:57:44.2900768Z * [new tag] viable/strict/1763779437 -> viable/strict/1763779437 2025-12-04T08:57:44.2902008Z * [new tag] 
viable/strict/1763781038 -> viable/strict/1763781038 2025-12-04T08:57:44.2902771Z * [new tag] viable/strict/1763782245 -> viable/strict/1763782245 2025-12-04T08:57:44.2903968Z * [new tag] viable/strict/1763785568 -> viable/strict/1763785568 2025-12-04T08:57:44.2905021Z * [new tag] viable/strict/1763787006 -> viable/strict/1763787006 2025-12-04T08:57:44.2906110Z * [new tag] viable/strict/1763789103 -> viable/strict/1763789103 2025-12-04T08:57:44.2907070Z * [new tag] viable/strict/1763790578 -> viable/strict/1763790578 2025-12-04T08:57:44.2908075Z * [new tag] viable/strict/1763796275 -> viable/strict/1763796275 2025-12-04T08:57:44.2909456Z * [new tag] viable/strict/1763801465 -> viable/strict/1763801465 2025-12-04T08:57:44.2910405Z * [new tag] viable/strict/1763803522 -> viable/strict/1763803522 2025-12-04T08:57:44.2911351Z * [new tag] viable/strict/1763808581 -> viable/strict/1763808581 2025-12-04T08:57:44.2912341Z * [new tag] viable/strict/1763840977 -> viable/strict/1763840977 2025-12-04T08:57:44.2913284Z * [new tag] viable/strict/1763846659 -> viable/strict/1763846659 2025-12-04T08:57:44.2914238Z * [new tag] viable/strict/1763872065 -> viable/strict/1763872065 2025-12-04T08:57:44.2915306Z * [new tag] viable/strict/1763873648 -> viable/strict/1763873648 2025-12-04T08:57:44.2916307Z * [new tag] viable/strict/1763875506 -> viable/strict/1763875506 2025-12-04T08:57:44.2917039Z * [new tag] viable/strict/1763889904 -> viable/strict/1763889904 2025-12-04T08:57:44.2918549Z * [new tag] viable/strict/1763930999 -> viable/strict/1763930999 2025-12-04T08:57:44.2919537Z * [new tag] viable/strict/1763944964 -> viable/strict/1763944964 2025-12-04T08:57:44.2920317Z * [new tag] viable/strict/1763958474 -> viable/strict/1763958474 2025-12-04T08:57:44.2921743Z * [new tag] viable/strict/1763967263 -> viable/strict/1763967263 2025-12-04T08:57:44.2922780Z * [new tag] viable/strict/1763972803 -> viable/strict/1763972803 2025-12-04T08:57:44.2923762Z * [new tag] viable/strict/1763976376 -> viable/strict/1763976376 2025-12-04T08:57:44.2924787Z * [new tag] viable/strict/1763989404 -> viable/strict/1763989404 2025-12-04T08:57:44.2925754Z * [new tag] viable/strict/1763990887 -> viable/strict/1763990887 2025-12-04T08:57:44.2926742Z * [new tag] viable/strict/1764019919 -> viable/strict/1764019919 2025-12-04T08:57:44.2927776Z * [new tag] viable/strict/1764023134 -> viable/strict/1764023134 2025-12-04T08:57:44.2928579Z * [new tag] viable/strict/1764024593 -> viable/strict/1764024593 2025-12-04T08:57:44.2929588Z * [new tag] viable/strict/1764026706 -> viable/strict/1764026706 2025-12-04T08:57:44.2930893Z * [new tag] viable/strict/1764031139 -> viable/strict/1764031139 2025-12-04T08:57:44.2931890Z * [new tag] viable/strict/1764033131 -> viable/strict/1764033131 2025-12-04T08:57:44.2932738Z * [new tag] viable/strict/1764035725 -> viable/strict/1764035725 2025-12-04T08:57:44.2933657Z * [new tag] viable/strict/1764624265 -> viable/strict/1764624265 2025-12-04T08:57:44.2934475Z * [new tag] viable/strict/1764631514 -> viable/strict/1764631514 2025-12-04T08:57:44.2935292Z * [new tag] viable/strict/1764632987 -> viable/strict/1764632987 2025-12-04T08:57:44.2936077Z * [new tag] viable/strict/1764636063 -> viable/strict/1764636063 2025-12-04T08:57:44.2937357Z * [new tag] viable/strict/1764643975 -> viable/strict/1764643975 2025-12-04T08:57:44.2938355Z * [new tag] viable/strict/1764646859 -> viable/strict/1764646859 2025-12-04T08:57:44.2939133Z * [new tag] viable/strict/1764653120 -> viable/strict/1764653120 
2025-12-04T08:57:44.2940060Z * [new tag] viable/strict/1764654632 -> viable/strict/1764654632 2025-12-04T08:57:44.2940782Z * [new tag] viable/strict/1764656821 -> viable/strict/1764656821 2025-12-04T08:57:44.2941618Z * [new tag] viable/strict/1764658557 -> viable/strict/1764658557 2025-12-04T08:57:44.2942425Z * [new tag] viable/strict/1764660333 -> viable/strict/1764660333 2025-12-04T08:57:44.2943246Z * [new tag] viable/strict/1764661812 -> viable/strict/1764661812 2025-12-04T08:57:44.2944047Z * [new tag] viable/strict/1764664023 -> viable/strict/1764664023 2025-12-04T08:57:44.2944876Z * [new tag] viable/strict/1764669150 -> viable/strict/1764669150 2025-12-04T08:57:44.2945692Z * [new tag] viable/strict/1764680709 -> viable/strict/1764680709 2025-12-04T08:57:44.2946499Z * [new tag] viable/strict/1764687619 -> viable/strict/1764687619 2025-12-04T08:57:44.2947337Z * [new tag] viable/strict/1764696355 -> viable/strict/1764696355 2025-12-04T08:57:44.2948156Z * [new tag] viable/strict/1764701767 -> viable/strict/1764701767 2025-12-04T08:57:44.2949113Z * [new tag] viable/strict/1764710768 -> viable/strict/1764710768 2025-12-04T08:57:44.2949905Z * [new tag] viable/strict/1764716202 -> viable/strict/1764716202 2025-12-04T08:57:44.2950714Z * [new tag] viable/strict/1764793566 -> viable/strict/1764793566 2025-12-04T08:57:44.2951515Z * [new tag] viable/strict/1764797093 -> viable/strict/1764797093 2025-12-04T08:57:44.2952416Z * [new tag] viable/strict/1764800729 -> viable/strict/1764800729 2025-12-04T08:57:44.2953336Z * [new tag] whc_flight_1 -> whc_flight_1 2025-12-04T08:57:44.2954698Z * [new tag] whc_flight_2 -> whc_flight_2 2025-12-04T08:57:44.2955855Z * [new tag] whc_flight_4 -> whc_flight_4 2025-12-04T08:57:44.3634557Z [command]/usr/bin/git rev-parse --verify --quiet ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32^{object} 2025-12-04T08:57:44.3658485Z ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T08:57:44.3661878Z ##[endgroup] 2025-12-04T08:57:44.3662228Z ##[group]Determining the checkout info 2025-12-04T08:57:44.3662955Z ##[endgroup] 2025-12-04T08:57:44.3666846Z [command]/usr/bin/git sparse-checkout disable 2025-12-04T08:57:44.3697690Z [command]/usr/bin/git config --local --unset-all extensions.worktreeConfig 2025-12-04T08:57:44.3723481Z ##[group]Checking out the ref 2025-12-04T08:57:44.3727002Z [command]/usr/bin/git checkout --progress --force ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T08:57:45.4258713Z Updating files: 80% (16107/20121) 2025-12-04T08:57:45.4553593Z Updating files: 81% (16299/20121) 2025-12-04T08:57:45.4771401Z Updating files: 82% (16500/20121) 2025-12-04T08:57:45.4921920Z Updating files: 83% (16701/20121) 2025-12-04T08:57:45.5058242Z Updating files: 84% (16902/20121) 2025-12-04T08:57:45.5218494Z Updating files: 85% (17103/20121) 2025-12-04T08:57:45.5377368Z Updating files: 86% (17305/20121) 2025-12-04T08:57:45.5514750Z Updating files: 87% (17506/20121) 2025-12-04T08:57:45.5624777Z Updating files: 88% (17707/20121) 2025-12-04T08:57:45.5765262Z Updating files: 89% (17908/20121) 2025-12-04T08:57:45.5938677Z Updating files: 90% (18109/20121) 2025-12-04T08:57:45.6053441Z Updating files: 91% (18311/20121) 2025-12-04T08:57:45.6207781Z Updating files: 92% (18512/20121) 2025-12-04T08:57:45.6392744Z Updating files: 93% (18713/20121) 2025-12-04T08:57:45.6600713Z Updating files: 94% (18914/20121) 2025-12-04T08:57:45.6777519Z Updating files: 95% (19115/20121) 2025-12-04T08:57:45.6932061Z Updating files: 96% (19317/20121) 2025-12-04T08:57:45.7098211Z Updating files: 97% 
(19518/20121)
2025-12-04T08:57:45.7392404Z Updating files: 98% (19719/20121)
2025-12-04T08:57:45.7566870Z Updating files: 99% (19920/20121)
2025-12-04T08:57:45.7567390Z Updating files: 100% (20121/20121)
2025-12-04T08:57:45.7568044Z Updating files: 100% (20121/20121), done.
2025-12-04T08:57:45.7853750Z Note: switching to 'ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32'.
2025-12-04T08:57:45.7854301Z
2025-12-04T08:57:45.7854622Z You are in 'detached HEAD' state. You can look around, make experimental
2025-12-04T08:57:45.7855272Z changes and commit them, and you can discard any commits you make in this
2025-12-04T08:57:45.7855889Z state without impacting any branches by switching back to a branch.
2025-12-04T08:57:45.7856267Z
2025-12-04T08:57:45.7856622Z If you want to create a new branch to retain commits you create, you may
2025-12-04T08:57:45.7857385Z do so (now or later) by using -c with the switch command. Example:
2025-12-04T08:57:45.7857724Z
2025-12-04T08:57:45.7857867Z git switch -c <new-branch-name>
2025-12-04T08:57:45.7858096Z
2025-12-04T08:57:45.7858222Z Or undo this operation with:
2025-12-04T08:57:45.7858440Z
2025-12-04T08:57:45.7858543Z git switch -
2025-12-04T08:57:45.7858705Z
2025-12-04T08:57:45.7858997Z Turn off this advice by setting config variable advice.detachedHead to false
2025-12-04T08:57:45.7859414Z
2025-12-04T08:57:45.7859738Z HEAD is now at ffd9b0fb435 Resolve collective autotuning test failure on arm (#168919)
2025-12-04T08:57:45.7939667Z ##[endgroup]
2025-12-04T08:57:45.7940193Z ##[group]Setting up auth for fetching submodules
2025-12-04T08:57:45.7946335Z [command]/usr/bin/git config --global http.https://github.com/.extraheader AUTHORIZATION: basic ***
2025-12-04T08:57:45.7998379Z [command]/usr/bin/git config --global --unset-all url.https://github.com/.insteadOf
2025-12-04T08:57:45.8032058Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf git@github.com:
2025-12-04T08:57:45.8058436Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf org-21003710@github.com:
2025-12-04T08:57:45.8083017Z ##[endgroup]
2025-12-04T08:57:45.8083498Z ##[group]Fetching submodules
2025-12-04T08:57:45.8086037Z [command]/usr/bin/git submodule sync --recursive
2025-12-04T08:57:45.8412687Z [command]/usr/bin/git -c protocol.version=2 submodule update --init --force --recursive
2025-12-04T08:57:45.8732352Z Submodule 'android/libs/fbjni' (https://github.com/facebookincubator/fbjni.git) registered for path 'android/libs/fbjni'
2025-12-04T08:57:45.8734009Z Submodule 'third_party/NNPACK_deps/FP16' (https://github.com/Maratyszcza/FP16.git) registered for path 'third_party/FP16'
2025-12-04T08:57:45.8736875Z Submodule 'third_party/NNPACK_deps/FXdiv' (https://github.com/Maratyszcza/FXdiv.git) registered for path 'third_party/FXdiv'
2025-12-04T08:57:45.8739617Z Submodule 'third_party/NNPACK' (https://github.com/Maratyszcza/NNPACK.git) registered for path 'third_party/NNPACK'
2025-12-04T08:57:45.8742247Z Submodule 'third_party/NVTX' (https://github.com/NVIDIA/NVTX.git) registered for path 'third_party/NVTX'
2025-12-04T08:57:45.8745567Z Submodule 'third_party/VulkanMemoryAllocator' (https://github.com/GPUOpen-LibrariesAndSDKs/VulkanMemoryAllocator.git) registered for path 'third_party/VulkanMemoryAllocator'
2025-12-04T08:57:45.8748172Z Submodule 'third_party/XNNPACK' (https://github.com/google/XNNPACK.git) registered for path 'third_party/XNNPACK'
2025-12-04T08:57:45.8751322Z Submodule 'third_party/aiter' (https://github.com/ROCm/aiter.git) registered for path
'third_party/aiter' 2025-12-04T08:57:45.8754511Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/benchmark' 2025-12-04T08:57:45.8757923Z Submodule 'third_party/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/composable_kernel' 2025-12-04T08:57:45.8761140Z Submodule 'third_party/cpp-httplib' (https://github.com/yhirose/cpp-httplib.git) registered for path 'third_party/cpp-httplib' 2025-12-04T08:57:45.8764479Z Submodule 'third_party/cpuinfo' (https://github.com/pytorch/cpuinfo.git) registered for path 'third_party/cpuinfo' 2025-12-04T08:57:45.8768784Z Submodule 'third_party/cudnn_frontend' (https://github.com/NVIDIA/cudnn-frontend.git) registered for path 'third_party/cudnn_frontend' 2025-12-04T08:57:45.8772089Z Submodule 'third_party/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'third_party/cutlass' 2025-12-04T08:57:45.8775777Z Submodule 'third_party/fbgemm' (https://github.com/pytorch/fbgemm) registered for path 'third_party/fbgemm' 2025-12-04T08:57:45.8780283Z Submodule 'third_party/flash-attention' (https://github.com/Dao-AILab/flash-attention.git) registered for path 'third_party/flash-attention' 2025-12-04T08:57:45.8785933Z Submodule 'third_party/flatbuffers' (https://github.com/google/flatbuffers.git) registered for path 'third_party/flatbuffers' 2025-12-04T08:57:45.8790159Z Submodule 'third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/fmt' 2025-12-04T08:57:45.8794457Z Submodule 'third_party/gemmlowp/gemmlowp' (https://github.com/google/gemmlowp.git) registered for path 'third_party/gemmlowp/gemmlowp' 2025-12-04T08:57:45.8798522Z Submodule 'third_party/gloo' (https://github.com/pytorch/gloo) registered for path 'third_party/gloo' 2025-12-04T08:57:45.8802998Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/googletest' 2025-12-04T08:57:45.8807317Z Submodule 'third_party/ideep' (https://github.com/intel/ideep) registered for path 'third_party/ideep' 2025-12-04T08:57:45.8811848Z Submodule 'third_party/ittapi' (https://github.com/intel/ittapi.git) registered for path 'third_party/ittapi' 2025-12-04T08:57:45.8816468Z Submodule 'third_party/kineto' (https://github.com/pytorch/kineto) registered for path 'third_party/kineto' 2025-12-04T08:57:45.8822065Z Submodule 'third_party/kleidiai' (https://github.com/ARM-software/kleidiai.git) registered for path 'third_party/kleidiai' 2025-12-04T08:57:45.8827090Z Submodule 'third_party/mimalloc' (https://github.com/microsoft/mimalloc.git) registered for path 'third_party/mimalloc' 2025-12-04T08:57:45.8832149Z Submodule 'third_party/nlohmann' (https://github.com/nlohmann/json.git) registered for path 'third_party/nlohmann' 2025-12-04T08:57:45.8837181Z Submodule 'third_party/onnx' (https://github.com/onnx/onnx.git) registered for path 'third_party/onnx' 2025-12-04T08:57:45.8842618Z Submodule 'third_party/opentelemetry-cpp' (https://github.com/open-telemetry/opentelemetry-cpp.git) registered for path 'third_party/opentelemetry-cpp' 2025-12-04T08:57:45.8847590Z Submodule 'third_party/pocketfft' (https://github.com/mreineck/pocketfft) registered for path 'third_party/pocketfft' 2025-12-04T08:57:45.8852958Z Submodule 'third_party/protobuf' (https://github.com/protocolbuffers/protobuf.git) registered for path 'third_party/protobuf' 2025-12-04T08:57:45.8858771Z Submodule 'third_party/NNPACK_deps/psimd' 
(https://github.com/Maratyszcza/psimd.git) registered for path 'third_party/psimd' 2025-12-04T08:57:45.8864682Z Submodule 'third_party/NNPACK_deps/pthreadpool' (https://github.com/Maratyszcza/pthreadpool.git) registered for path 'third_party/pthreadpool' 2025-12-04T08:57:45.8871913Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/pybind11' 2025-12-04T08:57:45.8877790Z Submodule 'third_party/python-peachpy' (https://github.com/malfet/PeachPy.git) registered for path 'third_party/python-peachpy' 2025-12-04T08:57:45.8883489Z Submodule 'third_party/sleef' (https://github.com/shibatch/sleef) registered for path 'third_party/sleef' 2025-12-04T08:57:45.8889498Z Submodule 'third_party/tensorpipe' (https://github.com/pytorch/tensorpipe.git) registered for path 'third_party/tensorpipe' 2025-12-04T08:57:45.8922727Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/android/libs/fbjni'... 2025-12-04T08:57:46.1019890Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/FP16'... 2025-12-04T08:57:46.1020943Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/FXdiv'... 2025-12-04T08:57:46.1021983Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/psimd'... 2025-12-04T08:57:46.1047931Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/NNPACK'... 2025-12-04T08:57:46.1051699Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pthreadpool'... 2025-12-04T08:57:46.1159583Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/NVTX'... 2025-12-04T08:57:46.4637301Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pocketfft'... 2025-12-04T08:57:46.4639059Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/python-peachpy'... 2025-12-04T08:57:46.4640774Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ideep'... 2025-12-04T08:57:46.4642214Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/gloo'... 2025-12-04T08:57:46.4643877Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/gemmlowp/gemmlowp'... 2025-12-04T08:57:46.4645700Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/benchmark'... 2025-12-04T08:57:46.4647423Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe'... 2025-12-04T08:57:46.4649078Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ittapi'... 2025-12-04T08:57:46.4650713Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kleidiai'... 2025-12-04T08:57:46.5578899Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/VulkanMemoryAllocator'... 2025-12-04T08:57:47.7412032Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cpp-httplib'... 2025-12-04T08:57:47.7413089Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flash-attention'... 2025-12-04T08:57:47.7414140Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cpuinfo'... 2025-12-04T08:57:47.7415074Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/sleef'... 2025-12-04T08:57:47.7416134Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pybind11'... 
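Following up on the detached-HEAD notice printed after the checkout further up: because the runner checks out a raw commit id rather than a branch, any commits made in this work tree would be kept only by creating a branch at HEAD. A minimal sketch of the two options the notice describes (the branch name is a hypothetical placeholder, not something this workflow creates):

    # Keep experimental commits made in the detached-HEAD state by branching at HEAD
    # ("my-experiment" is an illustrative name only).
    git switch -c my-experiment
    # ...or return to whatever was checked out before, discarding the detached state.
    git switch -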
2025-12-04T08:57:47.7417236Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/mimalloc'... 2025-12-04T08:57:47.7418141Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/googletest'... 2025-12-04T08:57:47.7419026Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fmt'... 2025-12-04T08:57:47.7420091Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cudnn_frontend'... 2025-12-04T08:57:47.7421392Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flatbuffers'... 2025-12-04T08:57:47.7786745Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/XNNPACK'... 2025-12-04T08:57:59.4369956Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto'... 2025-12-04T08:57:59.4370947Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm'... 2025-12-04T08:57:59.4371765Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx'... 2025-12-04T08:57:59.4372548Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cutlass'... 2025-12-04T08:57:59.4373442Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/composable_kernel'... 2025-12-04T08:57:59.4374421Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/nlohmann'... 2025-12-04T08:57:59.4375271Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp'... 2025-12-04T08:57:59.4376103Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf'... 2025-12-04T08:57:59.5372245Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/aiter'... 2025-12-04T08:58:02.1686492Z Submodule path 'android/libs/fbjni': checked out '7e1e1fe3858c63c251c637ae41a20de425dde96f' 2025-12-04T08:58:02.1814277Z Submodule path 'third_party/FP16': checked out '4dfe081cf6bcd15db339cf2680b9281b8451eeb3' 2025-12-04T08:58:02.1915408Z Submodule path 'third_party/FXdiv': checked out 'b408327ac2a15ec3e43352421954f5b1967701d1' 2025-12-04T08:58:02.2174470Z Submodule path 'third_party/NNPACK': checked out 'c07e3a0400713d546e0dea2d5466dd22ea389c73' 2025-12-04T08:58:02.3052915Z Submodule path 'third_party/NVTX': checked out '3ebbc93ded7285963bff932c678fa367eb393ba6' 2025-12-04T08:58:02.3649451Z Submodule path 'third_party/VulkanMemoryAllocator': checked out '1d8f600fd424278486eade7ed3e877c99f0846b1' 2025-12-04T08:58:03.1339229Z Submodule path 'third_party/XNNPACK': checked out '51a0103656eff6fc9bfd39a4597923c4b542c883' 2025-12-04T08:58:03.3314113Z Submodule path 'third_party/aiter': checked out '01aae101b9e5e94d6c16a9514c9fb8df99c93150' 2025-12-04T08:58:03.3337121Z Submodule '3rdparty/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T08:58:03.3365066Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/aiter/3rdparty/composable_kernel'... 
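The aiter entry just above is the first place this step recurses a level deeper: the freshly checked-out submodule declares its own 3rdparty/composable_kernel submodule, which the single git submodule update --init --force --recursive invocation from the start of this group initializes in turn. A small sketch, under the assumption of the same work tree, of how that nesting could be inspected (paths taken from the log above):

    # Each submodule's own .gitmodules file declares the next level of nesting;
    # for aiter this lists the 3rdparty/composable_kernel entry seen above.
    git -C third_party/aiter config --file .gitmodules --get-regexp submodule
    # The exact commit the nested submodule is pinned to is recorded as a gitlink
    # entry in the parent submodule's tree:
    git -C third_party/aiter ls-tree HEAD 3rdparty/composable_kernel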
2025-12-04T08:58:08.1229875Z Submodule path 'third_party/aiter/3rdparty/composable_kernel': checked out 'cffe8fa2a442ac8e80dd236a1a5d24fe3d7e0cbf' 2025-12-04T08:58:08.1479872Z Submodule path 'third_party/benchmark': checked out '299e5928955cc62af9968370293b916f5130916f' 2025-12-04T08:58:08.5192847Z Submodule path 'third_party/composable_kernel': checked out '7fe50dc3da2069d6645d9deb8c017a876472a977' 2025-12-04T08:58:08.5731332Z Submodule path 'third_party/cpp-httplib': checked out '89c932f313c6437c38f2982869beacc89c2f2246' 2025-12-04T08:58:08.6754533Z Submodule path 'third_party/cpuinfo': checked out 'f858c30bcb16f8effd5ff46996f0514539e17abc' 2025-12-04T08:58:08.7259379Z Submodule path 'third_party/cudnn_frontend': checked out '0b1577c8c83401237d601d0d0db5210506705396' 2025-12-04T08:58:09.4137788Z Submodule path 'third_party/cutlass': checked out 'f88806b1e31dfa579842638740216dd41fc6c588' 2025-12-04T08:58:09.5760808Z Submodule path 'third_party/fbgemm': checked out 'c0b988d39a9e47c794d699f29930ed4d7c7e13a4' 2025-12-04T08:58:09.5783186Z Submodule 'external/asmjit' (https://github.com/asmjit/asmjit.git) registered for path 'third_party/fbgemm/external/asmjit' 2025-12-04T08:58:09.5785385Z Submodule 'external/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/fbgemm/external/composable_kernel' 2025-12-04T08:58:09.5787878Z Submodule 'external/cpuinfo' (https://github.com/pytorch/cpuinfo) registered for path 'third_party/fbgemm/external/cpuinfo' 2025-12-04T08:58:09.5790902Z Submodule 'external/cutlass' (https://github.com/jwfromm/cutlass) registered for path 'third_party/fbgemm/external/cutlass' 2025-12-04T08:58:09.5793650Z Submodule 'external/googletest' (https://github.com/google/googletest) registered for path 'third_party/fbgemm/external/googletest' 2025-12-04T08:58:09.5796605Z Submodule 'external/hipify_torch' (https://github.com/ROCmSoftwarePlatform/hipify_torch.git) registered for path 'third_party/fbgemm/external/hipify_torch' 2025-12-04T08:58:09.5799311Z Submodule 'external/json' (https://github.com/nlohmann/json.git) registered for path 'third_party/fbgemm/external/json' 2025-12-04T08:58:09.5829125Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/asmjit'... 2025-12-04T08:58:10.7865441Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/hipify_torch'... 2025-12-04T08:58:10.7866518Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/cpuinfo'... 2025-12-04T08:58:10.7867543Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/googletest'... 2025-12-04T08:58:10.8866285Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/composable_kernel'... 2025-12-04T08:58:13.9714636Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/cutlass'... 2025-12-04T08:58:14.0715858Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/json'... 
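All of these clones, including fbgemm's external dependencies just above, are plain https://github.com/ fetches, and they authenticate through the header configured in the 'Setting up auth for fetching submodules' group earlier; the insteadOf rewrites added there additionally ensure that a submodule declared with an SSH-style git@github.com: URL would be routed through the same authenticated HTTPS remote. A rough equivalent of that configuration, with BASE64_CREDENTIAL standing in for the value masked as *** in the log:

    # Send a basic-auth header on every https://github.com/ fetch
    # (BASE64_CREDENTIAL is a placeholder; the real value is masked in the log).
    git config --global http.https://github.com/.extraheader "AUTHORIZATION: basic ${BASE64_CREDENTIAL}"
    # Rewrite SSH-style GitHub URLs to HTTPS so the header above also covers them.
    git config --global --add url.https://github.com/.insteadOf git@github.com:
    # Show the rewrites currently in effect:
    git config --global --get-regexp 'url\.'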
2025-12-04T08:58:16.5223546Z Submodule path 'third_party/fbgemm/external/asmjit': checked out 'a3199e8857792cd10b7589ff5d58343d2c9008ea' 2025-12-04T08:58:16.9074065Z Submodule path 'third_party/fbgemm/external/composable_kernel': checked out '7fe50dc3da2069d6645d9deb8c017a876472a977' 2025-12-04T08:58:17.0181595Z Submodule path 'third_party/fbgemm/external/cpuinfo': checked out '6543fec09b2f04ac4a666882998b534afc9c1349' 2025-12-04T08:58:17.7150776Z Submodule path 'third_party/fbgemm/external/cutlass': checked out '98125ce499b0fdf7ffbe0e3052f5b8709f4840f8' 2025-12-04T08:58:17.7659993Z Submodule path 'third_party/fbgemm/external/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T08:58:17.7786419Z Submodule path 'third_party/fbgemm/external/hipify_torch': checked out '63b6a7b541fa7f08f8475ca7d74054db36ff2691' 2025-12-04T08:58:17.8885218Z Submodule path 'third_party/fbgemm/external/json': checked out '9cca280a4d0ccf0c08f47a99aa71d1b0e52f8d03' 2025-12-04T08:58:17.9625810Z Submodule path 'third_party/flash-attention': checked out '979702c87a8713a8e0a5e9fee122b90d2ef13be5' 2025-12-04T08:58:17.9645217Z Submodule 'csrc/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T08:58:17.9646752Z Submodule 'csrc/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'third_party/flash-attention/csrc/cutlass' 2025-12-04T08:58:17.9675557Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flash-attention/csrc/composable_kernel'... 2025-12-04T08:58:22.3417969Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flash-attention/csrc/cutlass'... 2025-12-04T08:58:22.5932732Z Submodule path 'third_party/flash-attention/csrc/composable_kernel': checked out '888317e698e9803c62bd38568abc9e05d7709f33' 2025-12-04T08:58:23.1875801Z Submodule path 'third_party/flash-attention/csrc/cutlass': checked out 'c506e16788cb08416a4a57e11a9067beeee29420' 2025-12-04T08:58:23.3319298Z Submodule path 'third_party/flatbuffers': checked out 'a2cd1ea3b6d3fee220106b5fed3f7ce8da9eb757' 2025-12-04T08:58:23.3638608Z Submodule path 'third_party/fmt': checked out '407c905e45ad75fc29bf0f9bb7c5c2fd3475976f' 2025-12-04T08:58:23.4065158Z Submodule path 'third_party/gemmlowp/gemmlowp': checked out '3fb5c176c17c765a3492cd2f0321b0dab712f350' 2025-12-04T08:58:23.4323781Z Submodule path 'third_party/gloo': checked out '54cbae0d3a67fa890b4c3d9ee162b7860315e341' 2025-12-04T08:58:23.4793244Z Submodule path 'third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T08:58:23.4932683Z Submodule path 'third_party/ideep': checked out '719d8e6cd7f7a0e01b155657526d693acf97c2b3' 2025-12-04T08:58:23.4950631Z Submodule 'mkl-dnn' (https://github.com/intel/mkl-dnn.git) registered for path 'third_party/ideep/mkl-dnn' 2025-12-04T08:58:23.4978026Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ideep/mkl-dnn'... 
2025-12-04T08:58:38.6240872Z Submodule path 'third_party/ideep/mkl-dnn': checked out '8d263e693366ef8db40acc569cc7d8edf644556d' 2025-12-04T08:58:38.6452527Z Submodule path 'third_party/ittapi': checked out 'dec1d23ca65ab069d225dfe40dea14f455170959' 2025-12-04T08:58:38.7473995Z Submodule path 'third_party/kineto': checked out '31f85df8fbd89c188f14ef10f1ec65379786b943' 2025-12-04T08:58:38.7493165Z Submodule 'libkineto/third_party/dynolog' (https://github.com/facebookincubator/dynolog.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T08:58:38.7494661Z Submodule 'libkineto/third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T08:58:38.7497197Z Submodule 'libkineto/third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T08:58:38.7526411Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog'... 2025-12-04T08:58:39.4004128Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/fmt'... 2025-12-04T08:58:39.8075421Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/googletest'... 2025-12-04T08:58:39.9028066Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog': checked out 'd2ffe0a4e3acace628db49974246b66fc3e85fb1' 2025-12-04T08:58:39.9047902Z Submodule 'third_party/DCGM' (https://github.com/NVIDIA/DCGM.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T08:58:39.9049376Z Submodule 'third_party/cpr' (https://github.com/libcpr/cpr.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T08:58:39.9050815Z Submodule 'third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T08:58:39.9053871Z Submodule 'third_party/gflags' (https://github.com/gflags/gflags.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T08:58:39.9057032Z Submodule 'third_party/glog' (https://github.com/google/glog.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T08:58:39.9060361Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T08:58:39.9063518Z Submodule 'third_party/json' (https://github.com/nlohmann/json.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T08:58:39.9066809Z Submodule 'third_party/pfs' (https://github.com/dtrugman/pfs.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T08:58:39.9070467Z Submodule 'third_party/prometheus-cpp' (https://github.com/jupp0r/prometheus-cpp.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T08:58:39.9100202Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'... 2025-12-04T08:58:42.1730684Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'... 
2025-12-04T08:58:42.1732097Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'... 2025-12-04T08:58:42.1733617Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'... 2025-12-04T08:58:42.1734999Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'... 2025-12-04T08:58:42.1736366Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/glog'... 2025-12-04T08:58:42.1737902Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'... 2025-12-04T08:58:42.1739275Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'... 2025-12-04T08:58:42.2731549Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/json'... 2025-12-04T08:58:46.9478287Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM': checked out 'ffde4e54bc7249a6039a5e6b45b395141e1217f9' 2025-12-04T08:58:46.9679569Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr': checked out '871ed52d350214a034f6ef8a3b8f51c5ce1bd400' 2025-12-04T08:58:47.0064340Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt': checked out 'cd4af11efc9c622896a3e4cb599fa28668ca3d05' 2025-12-04T08:58:47.0207985Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags': checked out 'e171aa2d15ed9eb17054558e0b3a6a413bb01067' 2025-12-04T08:58:47.0224803Z Submodule 'doc' (https://github.com/gflags/gflags.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T08:58:47.0253901Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'... 
2025-12-04T08:58:47.3116064Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc': checked out '8411df715cf522606e3b1aca386ddfc0b63d34b4' 2025-12-04T08:58:47.3310612Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog': checked out 'b33e3bad4c46c8a6345525fd822af355e5ef9446' 2025-12-04T08:58:47.3778985Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T08:58:47.4850165Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/json': checked out '4f8fba14066156b73f1189a2b8bd568bde5284c5' 2025-12-04T08:58:47.5026981Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs': checked out 'f68a2fa8ea36c783bdd760371411fcb495aa3150' 2025-12-04T08:58:47.5207665Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp': checked out 'b1234816facfdda29845c46696a02998a4af115a' 2025-12-04T08:58:47.5224450Z Submodule 'civetweb' (https://github.com/civetweb/civetweb.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:58:47.5227114Z Submodule 'googletest' (https://github.com/google/googletest.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:58:47.5256148Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'... 2025-12-04T08:58:49.5386822Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'... 2025-12-04T08:58:49.8052398Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb': checked out 'd7ba35bbb649209c66e582d5a0244ba988a15159' 2025-12-04T08:58:49.8544853Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929' 2025-12-04T08:58:49.8876291Z Submodule path 'third_party/kineto/libkineto/third_party/fmt': checked out '40626af88bd7df9a5fb80be7b25ac85b122d6c21' 2025-12-04T08:58:49.9346662Z Submodule path 'third_party/kineto/libkineto/third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T08:58:49.9893914Z Submodule path 'third_party/kleidiai': checked out 'd7770c89632329a9914ef1a90289917597639cbe' 2025-12-04T08:58:50.0295897Z Submodule path 'third_party/mimalloc': checked out 'fbd8b99c2b828428947d70fdc046bb55609be93e' 2025-12-04T08:58:50.1340528Z Submodule path 'third_party/nlohmann': checked out '55f93686c01528224f448c19128836e7df245f72' 2025-12-04T08:58:50.5523918Z Submodule path 'third_party/onnx': checked out 'e709452ef2bbc1d113faf678c24e6d3467696e83' 2025-12-04T08:58:50.5564010Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/onnx/third_party/pybind11' 2025-12-04T08:58:50.5592468Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx/third_party/pybind11'... 
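Each "checked out <sha>" entry above reflects the commit pinned by the immediate parent project. Once the recursive update completes, the same pins can be listed in one pass from the repository root; this is an illustrative check rather than a step of the job:

# List every submodule (including nested ones) with the commit currently checked out
git submodule status --recursive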
2025-12-04T08:58:51.3267232Z Submodule path 'third_party/onnx/third_party/pybind11': checked out 'a2e59f0e7065404b44dfe92a28aca47ba1378dc4' 2025-12-04T08:58:51.3983789Z Submodule path 'third_party/opentelemetry-cpp': checked out 'a799f4aed9c94b765dcdaabaeab7d5e7e2310878' 2025-12-04T08:58:51.4002873Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark) registered for path 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T08:58:51.4005095Z Submodule 'third_party/googletest' (https://github.com/google/googletest) registered for path 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T08:58:51.4007646Z Submodule 'third_party/ms-gsl' (https://github.com/microsoft/GSL) registered for path 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T08:58:51.4010446Z Submodule 'third_party/nlohmann-json' (https://github.com/nlohmann/json) registered for path 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T08:58:51.4013839Z Submodule 'third_party/opentelemetry-proto' (https://github.com/open-telemetry/opentelemetry-proto) registered for path 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T08:58:51.4016584Z Submodule 'third_party/opentracing-cpp' (https://github.com/opentracing/opentracing-cpp.git) registered for path 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T08:58:51.4020014Z Submodule 'third_party/prometheus-cpp' (https://github.com/jupp0r/prometheus-cpp) registered for path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T08:58:51.4023369Z Submodule 'tools/vcpkg' (https://github.com/Microsoft/vcpkg) registered for path 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T08:58:51.4050730Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/benchmark'... 2025-12-04T08:58:51.7810787Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/opentracing-cpp'... 2025-12-04T08:58:51.7812149Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/opentelemetry-proto'... 2025-12-04T08:58:51.7813413Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp'... 2025-12-04T08:58:51.7814592Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/ms-gsl'... 2025-12-04T08:58:51.8812424Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/googletest'... 2025-12-04T08:58:52.3620685Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/nlohmann-json'... 2025-12-04T08:58:58.5099627Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/tools/vcpkg'... 
2025-12-04T08:58:59.2463264Z Submodule path 'third_party/opentelemetry-cpp/third_party/benchmark': checked out 'd572f4777349d43653b21d6c2fc63020ab326db2' 2025-12-04T08:58:59.2891946Z Submodule path 'third_party/opentelemetry-cpp/third_party/googletest': checked out 'b796f7d44681514f58a683a3a71ff17c94edb0c1' 2025-12-04T08:58:59.3067015Z Submodule path 'third_party/opentelemetry-cpp/third_party/ms-gsl': checked out '6f4529395c5b7c2d661812257cd6780c67e54afa' 2025-12-04T08:58:59.4139732Z Submodule path 'third_party/opentelemetry-cpp/third_party/nlohmann-json': checked out 'bc889afb4c5bf1c0d8ee29ef35eaaf4c8bef8a5d' 2025-12-04T08:58:59.4285437Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto': checked out '4ca4f0335c63cda7ab31ea7ed70d6553aee14dce' 2025-12-04T08:58:59.4437703Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentracing-cpp': checked out '06b57f48ded1fa3bdd3d4346f6ef29e40e08eaf5' 2025-12-04T08:58:59.4617620Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp': checked out 'c9ffcdda9086ffd9e1283ea7a0276d831f3c8a8d' 2025-12-04T08:58:59.4636444Z Submodule 'civetweb' (https://github.com/civetweb/civetweb.git) registered for path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:58:59.4638600Z Submodule 'googletest' (https://github.com/google/googletest.git) registered for path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:58:59.4664769Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'... 2025-12-04T08:59:01.3029310Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'... 2025-12-04T08:59:01.5668715Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb': checked out 'eefb26f82b233268fc98577d265352720d477ba4' 2025-12-04T08:59:01.6166442Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929' 2025-12-04T08:59:02.0979502Z Submodule path 'third_party/opentelemetry-cpp/tools/vcpkg': checked out '8eb57355a4ffb410a2e94c07b4dca2dffbee8e50' 2025-12-04T08:59:02.1102239Z Submodule path 'third_party/pocketfft': checked out '0fa0ef591e38c2758e3184c6c23e497b9f732ffa' 2025-12-04T08:59:02.3932146Z Submodule path 'third_party/protobuf': checked out 'd1eca4e4b421cd2997495c4b4e65cea6be4e9b8a' 2025-12-04T08:59:02.3957529Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/protobuf/third_party/benchmark' 2025-12-04T08:59:02.3959081Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/protobuf/third_party/googletest' 2025-12-04T08:59:02.3987523Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf/third_party/benchmark'... 2025-12-04T08:59:02.9039490Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf/third_party/googletest'... 
2025-12-04T08:59:03.2354561Z Submodule path 'third_party/protobuf/third_party/benchmark': checked out '5b7683f49e1e9223cf9927b24f6fd3d6bd82e3f8' 2025-12-04T08:59:03.3104924Z Submodule path 'third_party/protobuf/third_party/googletest': checked out '5ec7f0c4a113e2f18ac2c6cc7df51ad6afc24081' 2025-12-04T08:59:03.3203431Z Submodule path 'third_party/psimd': checked out '072586a71b55b7f8c584153d223e95687148a900' 2025-12-04T08:59:03.3326597Z Submodule path 'third_party/pthreadpool': checked out '4fe0e1e183925bf8cfa6aae24237e724a96479b8' 2025-12-04T08:59:03.3767782Z Submodule path 'third_party/pybind11': checked out 'f5fbe867d2d26e4a0a9177a51f6e568868ad3dc8' 2025-12-04T08:59:03.4073946Z Submodule path 'third_party/python-peachpy': checked out 'f45429b087dd7d5bc78bb40dc7cf06425c252d67' 2025-12-04T08:59:03.4518902Z Submodule path 'third_party/sleef': checked out '5a1d179df9cf652951b59010a2d2075372d67f68' 2025-12-04T08:59:03.4796065Z Submodule path 'third_party/tensorpipe': checked out '2b4cd91092d335a697416b2a3cb398283246849d' 2025-12-04T08:59:03.4814255Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/tensorpipe/third_party/googletest' 2025-12-04T08:59:03.4815570Z Submodule 'third_party/libnop' (https://github.com/google/libnop.git) registered for path 'third_party/tensorpipe/third_party/libnop' 2025-12-04T08:59:03.4818336Z Submodule 'third_party/libuv' (https://github.com/libuv/libuv.git) registered for path 'third_party/tensorpipe/third_party/libuv' 2025-12-04T08:59:03.4821214Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T08:59:03.4848737Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/googletest'... 2025-12-04T08:59:04.3534754Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/libnop'... 2025-12-04T08:59:04.3536013Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/pybind11'... 2025-12-04T08:59:04.4296538Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/libuv'... 2025-12-04T08:59:04.4898495Z Submodule path 'third_party/tensorpipe/third_party/googletest': checked out 'aee0f9d9b5b87796ee8a0ab26b7587ec30e8858e' 2025-12-04T08:59:04.5060718Z Submodule path 'third_party/tensorpipe/third_party/libnop': checked out '910b55815be16109f04f4180e9adee14fb4ce281' 2025-12-04T08:59:04.5836674Z Submodule path 'third_party/tensorpipe/third_party/libuv': checked out '5152db2cbfeb5582e9c27c5ea1dba2cd9e10759b' 2025-12-04T08:59:04.6138455Z Submodule path 'third_party/tensorpipe/third_party/pybind11': checked out 'a23996fce38ff6ccfbcdc09f1e63f2c4be5ea2ef' 2025-12-04T08:59:04.6156033Z Submodule 'tools/clang' (https://github.com/wjakob/clang-cindex-python3) registered for path 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T08:59:04.6190724Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/pybind11/tools/clang'... 
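The recursive checkout traced above can be approximated locally with stock git. The checkout action's exact flags are not visible in this excerpt, so the following is only a minimal sketch:

# Minimal local equivalent of the recursive submodule checkout logged above
git clone https://github.com/pytorch/pytorch.git
cd pytorch
git checkout ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32   # commit reported below by `git log -1 --format=%H`
git submodule sync --recursive
git submodule update --init --recursive --jobs 4        # --jobs parallelizes the clones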
2025-12-04T08:59:04.7913579Z Submodule path 'third_party/tensorpipe/third_party/pybind11/tools/clang': checked out '6a00cbc4a9b8e68b71caf7f774b3f9c753ae84d5' 2025-12-04T08:59:04.7951098Z [command]/usr/bin/git submodule foreach --recursive git config --local gc.auto 0 2025-12-04T08:59:04.8276873Z Entering 'android/libs/fbjni' 2025-12-04T08:59:04.8320163Z Entering 'third_party/FP16' 2025-12-04T08:59:04.8364147Z Entering 'third_party/FXdiv' 2025-12-04T08:59:04.8408962Z Entering 'third_party/NNPACK' 2025-12-04T08:59:04.8453961Z Entering 'third_party/NVTX' 2025-12-04T08:59:04.8498687Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T08:59:04.8542053Z Entering 'third_party/XNNPACK' 2025-12-04T08:59:04.8601505Z Entering 'third_party/aiter' 2025-12-04T08:59:04.8646534Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T08:59:04.8698851Z Entering 'third_party/benchmark' 2025-12-04T08:59:04.8741622Z Entering 'third_party/composable_kernel' 2025-12-04T08:59:04.8795761Z Entering 'third_party/cpp-httplib' 2025-12-04T08:59:04.8840364Z Entering 'third_party/cpuinfo' 2025-12-04T08:59:04.8885171Z Entering 'third_party/cudnn_frontend' 2025-12-04T08:59:04.8929736Z Entering 'third_party/cutlass' 2025-12-04T08:59:04.8984051Z Entering 'third_party/fbgemm' 2025-12-04T08:59:04.9030998Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T08:59:04.9075547Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T08:59:04.9125397Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T08:59:04.9173972Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T08:59:04.9235762Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T08:59:04.9278045Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T08:59:04.9319692Z Entering 'third_party/fbgemm/external/json' 2025-12-04T08:59:04.9367465Z Entering 'third_party/flash-attention' 2025-12-04T08:59:04.9411410Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T08:59:04.9460253Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T08:59:04.9515336Z Entering 'third_party/flatbuffers' 2025-12-04T08:59:04.9563219Z Entering 'third_party/fmt' 2025-12-04T08:59:04.9607967Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T08:59:04.9652547Z Entering 'third_party/gloo' 2025-12-04T08:59:04.9696526Z Entering 'third_party/googletest' 2025-12-04T08:59:04.9739982Z Entering 'third_party/ideep' 2025-12-04T08:59:04.9781299Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T08:59:04.9835382Z Entering 'third_party/ittapi' 2025-12-04T08:59:04.9877443Z Entering 'third_party/kineto' 2025-12-04T08:59:04.9924129Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T08:59:04.9975239Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T08:59:05.0018621Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T08:59:05.0059792Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T08:59:05.0103298Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T08:59:05.0145250Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T08:59:05.0194455Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T08:59:05.0236737Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T08:59:05.0279390Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T08:59:05.0322598Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T08:59:05.0367184Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T08:59:05.0417044Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:59:05.0461205Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:59:05.0507120Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T08:59:05.0556088Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T08:59:05.0598601Z Entering 'third_party/kleidiai' 2025-12-04T08:59:05.0642589Z Entering 'third_party/mimalloc' 2025-12-04T08:59:05.0687464Z Entering 'third_party/nlohmann' 2025-12-04T08:59:05.0733307Z Entering 'third_party/onnx' 2025-12-04T08:59:05.0797821Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T08:59:05.0842816Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T08:59:05.0889812Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T08:59:05.0933478Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T08:59:05.0975556Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T08:59:05.1017383Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T08:59:05.1058713Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T08:59:05.1099188Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T08:59:05.1140528Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T08:59:05.1184589Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:59:05.1230968Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:59:05.1281687Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T08:59:05.1344526Z Entering 'third_party/pocketfft' 2025-12-04T08:59:05.1389126Z Entering 'third_party/protobuf' 2025-12-04T08:59:05.1436842Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T08:59:05.1477614Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T08:59:05.1522598Z Entering 'third_party/psimd' 2025-12-04T08:59:05.1571080Z Entering 'third_party/pthreadpool' 2025-12-04T08:59:05.1615111Z Entering 'third_party/pybind11' 2025-12-04T08:59:05.1658733Z Entering 'third_party/python-peachpy' 2025-12-04T08:59:05.1701072Z Entering 'third_party/sleef' 2025-12-04T08:59:05.1745054Z Entering 'third_party/tensorpipe' 2025-12-04T08:59:05.1789332Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T08:59:05.1830584Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T08:59:05.1873863Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T08:59:05.1915019Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T08:59:05.1957078Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T08:59:05.2013384Z ##[endgroup] 2025-12-04T08:59:05.2015169Z ##[group]Persisting credentials for submodules 2025-12-04T08:59:05.2020569Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'url\.https\:\/\/github\.com\/\.insteadOf' && git config --local --unset-all 
'url.https://github.com/.insteadOf' || :" 2025-12-04T08:59:05.2329800Z Entering 'android/libs/fbjni' 2025-12-04T08:59:05.2390471Z Entering 'third_party/FP16' 2025-12-04T08:59:05.2458203Z Entering 'third_party/FXdiv' 2025-12-04T08:59:05.2518901Z Entering 'third_party/NNPACK' 2025-12-04T08:59:05.2578080Z Entering 'third_party/NVTX' 2025-12-04T08:59:05.2637753Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T08:59:05.2696685Z Entering 'third_party/XNNPACK' 2025-12-04T08:59:05.2772696Z Entering 'third_party/aiter' 2025-12-04T08:59:05.2830911Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T08:59:05.2898958Z Entering 'third_party/benchmark' 2025-12-04T08:59:05.2957874Z Entering 'third_party/composable_kernel' 2025-12-04T08:59:05.3028667Z Entering 'third_party/cpp-httplib' 2025-12-04T08:59:05.3087018Z Entering 'third_party/cpuinfo' 2025-12-04T08:59:05.3144921Z Entering 'third_party/cudnn_frontend' 2025-12-04T08:59:05.3202684Z Entering 'third_party/cutlass' 2025-12-04T08:59:05.3268326Z Entering 'third_party/fbgemm' 2025-12-04T08:59:05.3329642Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T08:59:05.3395739Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T08:59:05.3464898Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T08:59:05.3523536Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T08:59:05.3592847Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T08:59:05.3649652Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T08:59:05.3705698Z Entering 'third_party/fbgemm/external/json' 2025-12-04T08:59:05.3768872Z Entering 'third_party/flash-attention' 2025-12-04T08:59:05.3830014Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T08:59:05.3895596Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T08:59:05.3970270Z Entering 'third_party/flatbuffers' 2025-12-04T08:59:05.4030399Z Entering 'third_party/fmt' 2025-12-04T08:59:05.4088969Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T08:59:05.4146928Z Entering 'third_party/gloo' 2025-12-04T08:59:05.4204341Z Entering 'third_party/googletest' 2025-12-04T08:59:05.4264667Z Entering 'third_party/ideep' 2025-12-04T08:59:05.4322081Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T08:59:05.4392355Z Entering 'third_party/ittapi' 2025-12-04T08:59:05.4454184Z Entering 'third_party/kineto' 2025-12-04T08:59:05.4513079Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T08:59:05.4572054Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T08:59:05.4628786Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T08:59:05.4686420Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T08:59:05.4744076Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T08:59:05.4800540Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T08:59:05.4859885Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T08:59:05.4917999Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T08:59:05.4974921Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T08:59:05.5038386Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T08:59:05.5095679Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T08:59:05.5157786Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:59:05.5218925Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:59:05.5281758Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T08:59:05.5338241Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T08:59:05.5397609Z Entering 'third_party/kleidiai' 2025-12-04T08:59:05.5458100Z Entering 'third_party/mimalloc' 2025-12-04T08:59:05.5515332Z Entering 'third_party/nlohmann' 2025-12-04T08:59:05.5575628Z Entering 'third_party/onnx' 2025-12-04T08:59:05.5654731Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T08:59:05.5717947Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T08:59:05.5778409Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T08:59:05.5835787Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T08:59:05.5897812Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T08:59:05.5955701Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T08:59:05.6017393Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T08:59:05.6074002Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T08:59:05.6138178Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T08:59:05.6194298Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:59:05.6252930Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:59:05.6314305Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T08:59:05.6398132Z Entering 'third_party/pocketfft' 2025-12-04T08:59:05.6455277Z Entering 'third_party/protobuf' 2025-12-04T08:59:05.6517094Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T08:59:05.6576280Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T08:59:05.6640844Z Entering 'third_party/psimd' 2025-12-04T08:59:05.6697633Z Entering 'third_party/pthreadpool' 2025-12-04T08:59:05.6756190Z Entering 'third_party/pybind11' 2025-12-04T08:59:05.6817909Z Entering 'third_party/python-peachpy' 2025-12-04T08:59:05.6877705Z Entering 'third_party/sleef' 2025-12-04T08:59:05.6937980Z Entering 'third_party/tensorpipe' 2025-12-04T08:59:05.6994315Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T08:59:05.7055751Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T08:59:05.7112554Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T08:59:05.7178165Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T08:59:05.7232607Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T08:59:05.7315439Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local 'http.https://github.com/.extraheader' 'AUTHORIZATION: basic ***' && git config --local --show-origin --name-only --get-regexp remote.origin.url" 2025-12-04T08:59:05.7630618Z Entering 'android/libs/fbjni' 2025-12-04T08:59:05.7684232Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config remote.origin.url 2025-12-04T08:59:05.7700130Z Entering 'third_party/FP16' 2025-12-04T08:59:05.7755136Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config remote.origin.url 2025-12-04T08:59:05.7772825Z Entering 'third_party/FXdiv' 2025-12-04T08:59:05.7826693Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config remote.origin.url 2025-12-04T08:59:05.7844966Z Entering 'third_party/NNPACK' 2025-12-04T08:59:05.7898449Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config remote.origin.url 2025-12-04T08:59:05.7916551Z Entering 'third_party/NVTX' 2025-12-04T08:59:05.7970578Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config remote.origin.url 2025-12-04T08:59:05.7987169Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T08:59:05.8042659Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config remote.origin.url 2025-12-04T08:59:05.8058641Z Entering 'third_party/XNNPACK' 2025-12-04T08:59:05.8112765Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config remote.origin.url 2025-12-04T08:59:05.8146655Z Entering 'third_party/aiter' 2025-12-04T08:59:05.8200257Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config remote.origin.url 2025-12-04T08:59:05.8217973Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T08:59:05.8268345Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config remote.origin.url 2025-12-04T08:59:05.8297322Z Entering 'third_party/benchmark' 2025-12-04T08:59:05.8353149Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config remote.origin.url 2025-12-04T08:59:05.8371100Z Entering 'third_party/composable_kernel' 2025-12-04T08:59:05.8422050Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config remote.origin.url 2025-12-04T08:59:05.8449215Z Entering 'third_party/cpp-httplib' 2025-12-04T08:59:05.8500108Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config remote.origin.url 2025-12-04T08:59:05.8518101Z Entering 'third_party/cpuinfo' 2025-12-04T08:59:05.8577661Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config remote.origin.url 2025-12-04T08:59:05.8595793Z Entering 'third_party/cudnn_frontend' 2025-12-04T08:59:05.8647934Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config remote.origin.url 2025-12-04T08:59:05.8663914Z Entering 'third_party/cutlass' 2025-12-04T08:59:05.8717869Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config remote.origin.url 2025-12-04T08:59:05.8745829Z Entering 'third_party/fbgemm' 2025-12-04T08:59:05.8799995Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config remote.origin.url 2025-12-04T08:59:05.8818405Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T08:59:05.8874128Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config remote.origin.url 2025-12-04T08:59:05.8891518Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T08:59:05.8944170Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config remote.origin.url 2025-12-04T08:59:05.8971679Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T08:59:05.9022703Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config remote.origin.url 2025-12-04T08:59:05.9040950Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T08:59:05.9092982Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config remote.origin.url 2025-12-04T08:59:05.9121401Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T08:59:05.9176728Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config remote.origin.url 2025-12-04T08:59:05.9194453Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T08:59:05.9247038Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config remote.origin.url 2025-12-04T08:59:05.9261729Z Entering 'third_party/fbgemm/external/json' 2025-12-04T08:59:05.9315324Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config remote.origin.url 2025-12-04T08:59:05.9336531Z Entering 'third_party/flash-attention' 2025-12-04T08:59:05.9388138Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config remote.origin.url 2025-12-04T08:59:05.9407292Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T08:59:05.9458908Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config remote.origin.url 2025-12-04T08:59:05.9482447Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T08:59:05.9535627Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config remote.origin.url 2025-12-04T08:59:05.9565070Z Entering 'third_party/flatbuffers' 2025-12-04T08:59:05.9615547Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config remote.origin.url 2025-12-04T08:59:05.9638275Z Entering 'third_party/fmt' 2025-12-04T08:59:05.9690846Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config remote.origin.url 2025-12-04T08:59:05.9706759Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T08:59:05.9762460Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config remote.origin.url 2025-12-04T08:59:05.9778247Z Entering 'third_party/gloo' 2025-12-04T08:59:05.9830039Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config remote.origin.url 2025-12-04T08:59:05.9848671Z Entering 'third_party/googletest' 2025-12-04T08:59:05.9899824Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config remote.origin.url 2025-12-04T08:59:05.9918728Z Entering 'third_party/ideep' 2025-12-04T08:59:05.9971541Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config remote.origin.url 2025-12-04T08:59:05.9985877Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T08:59:06.0039834Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config remote.origin.url 2025-12-04T08:59:06.0063295Z Entering 'third_party/ittapi' 2025-12-04T08:59:06.0117099Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config remote.origin.url 2025-12-04T08:59:06.0136473Z Entering 'third_party/kineto' 2025-12-04T08:59:06.0186634Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config remote.origin.url 2025-12-04T08:59:06.0205114Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T08:59:06.0258571Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config remote.origin.url 2025-12-04T08:59:06.0275945Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T08:59:06.0326738Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config remote.origin.url 2025-12-04T08:59:06.0344359Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T08:59:06.0397905Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config remote.origin.url 2025-12-04T08:59:06.0415577Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T08:59:06.0467625Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config remote.origin.url 2025-12-04T08:59:06.0486771Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T08:59:06.0541393Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config remote.origin.url 2025-12-04T08:59:06.0558790Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T08:59:06.0611298Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config remote.origin.url 2025-12-04T08:59:06.0629366Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T08:59:06.0683403Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config remote.origin.url 2025-12-04T08:59:06.0698556Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T08:59:06.0754835Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config remote.origin.url 2025-12-04T08:59:06.0772387Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T08:59:06.0823873Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config remote.origin.url 2025-12-04T08:59:06.0844214Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T08:59:06.0897730Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config remote.origin.url 2025-12-04T08:59:06.0915417Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T08:59:06.0973881Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T08:59:06.0991917Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:59:06.1045601Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T08:59:06.1063331Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:59:06.1117755Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T08:59:06.1139120Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T08:59:06.1193256Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config remote.origin.url 2025-12-04T08:59:06.1216697Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T08:59:06.1267299Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config remote.origin.url 2025-12-04T08:59:06.1287471Z Entering 'third_party/kleidiai' 2025-12-04T08:59:06.1339542Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config remote.origin.url 2025-12-04T08:59:06.1358809Z Entering 'third_party/mimalloc' 2025-12-04T08:59:06.1413175Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config remote.origin.url 2025-12-04T08:59:06.1429333Z Entering 'third_party/nlohmann' 2025-12-04T08:59:06.1483760Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config remote.origin.url 2025-12-04T08:59:06.1500983Z Entering 'third_party/onnx' 2025-12-04T08:59:06.1554674Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config remote.origin.url 2025-12-04T08:59:06.1589512Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T08:59:06.1641615Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url 2025-12-04T08:59:06.1660260Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T08:59:06.1715122Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config remote.origin.url 2025-12-04T08:59:06.1732481Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T08:59:06.1786720Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config remote.origin.url 2025-12-04T08:59:06.1804110Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T08:59:06.1858719Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config remote.origin.url 2025-12-04T08:59:06.1875999Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T08:59:06.1926578Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config remote.origin.url 2025-12-04T08:59:06.1942206Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T08:59:06.1995083Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config remote.origin.url 2025-12-04T08:59:06.2014018Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T08:59:06.2064687Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config remote.origin.url 2025-12-04T08:59:06.2082514Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T08:59:06.2133851Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config remote.origin.url 2025-12-04T08:59:06.2153850Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T08:59:06.2205872Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T08:59:06.2220412Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:59:06.2274088Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T08:59:06.2294052Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:59:06.2346945Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T08:59:06.2367327Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T08:59:06.2418641Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config remote.origin.url 2025-12-04T08:59:06.2457237Z Entering 'third_party/pocketfft' 2025-12-04T08:59:06.2506832Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config remote.origin.url 2025-12-04T08:59:06.2523628Z Entering 'third_party/protobuf' 2025-12-04T08:59:06.2577837Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config remote.origin.url 2025-12-04T08:59:06.2597796Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T08:59:06.2648477Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config remote.origin.url 2025-12-04T08:59:06.2664063Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T08:59:06.2718037Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config remote.origin.url 2025-12-04T08:59:06.2739399Z Entering 'third_party/psimd' 2025-12-04T08:59:06.2795790Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config remote.origin.url 2025-12-04T08:59:06.2813048Z Entering 'third_party/pthreadpool' 2025-12-04T08:59:06.2866638Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config remote.origin.url 2025-12-04T08:59:06.2884872Z Entering 'third_party/pybind11' 2025-12-04T08:59:06.2938450Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config remote.origin.url 2025-12-04T08:59:06.2956406Z Entering 'third_party/python-peachpy' 2025-12-04T08:59:06.3009064Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config remote.origin.url 2025-12-04T08:59:06.3024891Z Entering 'third_party/sleef' 2025-12-04T08:59:06.3078194Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config remote.origin.url 2025-12-04T08:59:06.3095940Z Entering 'third_party/tensorpipe' 2025-12-04T08:59:06.3148821Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config remote.origin.url 2025-12-04T08:59:06.3168666Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T08:59:06.3219946Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config remote.origin.url 2025-12-04T08:59:06.3238952Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T08:59:06.3290286Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config remote.origin.url 2025-12-04T08:59:06.3305219Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T08:59:06.3360904Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config remote.origin.url 2025-12-04T08:59:06.3378193Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T08:59:06.3430641Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config remote.origin.url 2025-12-04T08:59:06.3447875Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T08:59:06.3498872Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url 2025-12-04T08:59:06.4200435Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'git@github.com:' 2025-12-04T08:59:06.4518171Z Entering 'android/libs/fbjni' 2025-12-04T08:59:06.4562267Z Entering 'third_party/FP16' 2025-12-04T08:59:06.4608011Z Entering 'third_party/FXdiv' 2025-12-04T08:59:06.4653712Z Entering 'third_party/NNPACK' 2025-12-04T08:59:06.4697791Z Entering 'third_party/NVTX' 2025-12-04T08:59:06.4742046Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T08:59:06.4785770Z Entering 'third_party/XNNPACK' 2025-12-04T08:59:06.4848063Z Entering 'third_party/aiter' 2025-12-04T08:59:06.4893305Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T08:59:06.4946889Z Entering 'third_party/benchmark' 2025-12-04T08:59:06.4993281Z Entering 'third_party/composable_kernel' 2025-12-04T08:59:06.5045913Z Entering 'third_party/cpp-httplib' 2025-12-04T08:59:06.5089657Z Entering 'third_party/cpuinfo' 2025-12-04T08:59:06.5136961Z Entering 
'third_party/cudnn_frontend' 2025-12-04T08:59:06.5179101Z Entering 'third_party/cutlass' 2025-12-04T08:59:06.5235516Z Entering 'third_party/fbgemm' 2025-12-04T08:59:06.5280814Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T08:59:06.5322922Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T08:59:06.5377526Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T08:59:06.5419370Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T08:59:06.5468541Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T08:59:06.5514475Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T08:59:06.5558615Z Entering 'third_party/fbgemm/external/json' 2025-12-04T08:59:06.5605667Z Entering 'third_party/flash-attention' 2025-12-04T08:59:06.5650822Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T08:59:06.5702424Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T08:59:06.5758389Z Entering 'third_party/flatbuffers' 2025-12-04T08:59:06.5807060Z Entering 'third_party/fmt' 2025-12-04T08:59:06.5852866Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T08:59:06.5897693Z Entering 'third_party/gloo' 2025-12-04T08:59:06.5941114Z Entering 'third_party/googletest' 2025-12-04T08:59:06.5984519Z Entering 'third_party/ideep' 2025-12-04T08:59:06.6027135Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T08:59:06.6079293Z Entering 'third_party/ittapi' 2025-12-04T08:59:06.6122583Z Entering 'third_party/kineto' 2025-12-04T08:59:06.6167536Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T08:59:06.6210860Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T08:59:06.6263259Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T08:59:06.6306389Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T08:59:06.6351478Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T08:59:06.6394689Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T08:59:06.6439911Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T08:59:06.6483274Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T08:59:06.6527625Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T08:59:06.6573761Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T08:59:06.6617187Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T08:59:06.6658357Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:59:06.6702599Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:59:06.6750696Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T08:59:06.6792906Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T08:59:06.6837102Z Entering 'third_party/kleidiai' 2025-12-04T08:59:06.6882205Z Entering 'third_party/mimalloc' 2025-12-04T08:59:06.6945665Z Entering 'third_party/nlohmann' 2025-12-04T08:59:06.6991878Z Entering 'third_party/onnx' 2025-12-04T08:59:06.7053291Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T08:59:06.7097955Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T08:59:06.7142099Z Entering 
'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T08:59:06.7183387Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T08:59:06.7224923Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T08:59:06.7267045Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T08:59:06.7315208Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T08:59:06.7358436Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T08:59:06.7401708Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T08:59:06.7443114Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:59:06.7492001Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:59:06.7538915Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T08:59:06.7603623Z Entering 'third_party/pocketfft' 2025-12-04T08:59:06.7647826Z Entering 'third_party/protobuf' 2025-12-04T08:59:06.7694328Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T08:59:06.7737343Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T08:59:06.7779248Z Entering 'third_party/psimd' 2025-12-04T08:59:06.7823047Z Entering 'third_party/pthreadpool' 2025-12-04T08:59:06.7867734Z Entering 'third_party/pybind11' 2025-12-04T08:59:06.7913435Z Entering 'third_party/python-peachpy' 2025-12-04T08:59:06.7958528Z Entering 'third_party/sleef' 2025-12-04T08:59:06.8002211Z Entering 'third_party/tensorpipe' 2025-12-04T08:59:06.8046011Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T08:59:06.8096881Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T08:59:06.8138005Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T08:59:06.8178783Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T08:59:06.8219591Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T08:59:06.8281637Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'org-21003710@github.com:' 2025-12-04T08:59:06.8602912Z Entering 'android/libs/fbjni' 2025-12-04T08:59:06.8645673Z Entering 'third_party/FP16' 2025-12-04T08:59:06.8689708Z Entering 'third_party/FXdiv' 2025-12-04T08:59:06.8737995Z Entering 'third_party/NNPACK' 2025-12-04T08:59:06.8780198Z Entering 'third_party/NVTX' 2025-12-04T08:59:06.8827439Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T08:59:06.8873038Z Entering 'third_party/XNNPACK' 2025-12-04T08:59:06.8933649Z Entering 'third_party/aiter' 2025-12-04T08:59:06.8977171Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T08:59:06.9028896Z Entering 'third_party/benchmark' 2025-12-04T08:59:06.9074920Z Entering 'third_party/composable_kernel' 2025-12-04T08:59:06.9125782Z Entering 'third_party/cpp-httplib' 2025-12-04T08:59:06.9172052Z Entering 'third_party/cpuinfo' 2025-12-04T08:59:06.9217960Z Entering 'third_party/cudnn_frontend' 2025-12-04T08:59:06.9261730Z Entering 'third_party/cutlass' 2025-12-04T08:59:06.9317108Z Entering 'third_party/fbgemm' 2025-12-04T08:59:06.9364991Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T08:59:06.9408554Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T08:59:06.9462125Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T08:59:06.9504270Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T08:59:06.9557055Z 
Entering 'third_party/fbgemm/external/googletest' 2025-12-04T08:59:06.9598769Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T08:59:06.9641014Z Entering 'third_party/fbgemm/external/json' 2025-12-04T08:59:06.9687254Z Entering 'third_party/flash-attention' 2025-12-04T08:59:06.9733647Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T08:59:06.9780701Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T08:59:06.9839721Z Entering 'third_party/flatbuffers' 2025-12-04T08:59:06.9887026Z Entering 'third_party/fmt' 2025-12-04T08:59:06.9935529Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T08:59:06.9978044Z Entering 'third_party/gloo' 2025-12-04T08:59:07.0019827Z Entering 'third_party/googletest' 2025-12-04T08:59:07.0063439Z Entering 'third_party/ideep' 2025-12-04T08:59:07.0106739Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T08:59:07.0159887Z Entering 'third_party/ittapi' 2025-12-04T08:59:07.0203647Z Entering 'third_party/kineto' 2025-12-04T08:59:07.0248264Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T08:59:07.0291303Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T08:59:07.0338782Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T08:59:07.0380812Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T08:59:07.0424005Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T08:59:07.0467496Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T08:59:07.0517129Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T08:59:07.0560835Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T08:59:07.0603474Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T08:59:07.0648584Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T08:59:07.0695967Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T08:59:07.0738576Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:59:07.0791836Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:59:07.0846921Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T08:59:07.0895623Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T08:59:07.0938896Z Entering 'third_party/kleidiai' 2025-12-04T08:59:07.0983254Z Entering 'third_party/mimalloc' 2025-12-04T08:59:07.1026215Z Entering 'third_party/nlohmann' 2025-12-04T08:59:07.1073704Z Entering 'third_party/onnx' 2025-12-04T08:59:07.1137217Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T08:59:07.1190169Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T08:59:07.1236654Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T08:59:07.1277391Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T08:59:07.1318863Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T08:59:07.1369359Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T08:59:07.1418609Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T08:59:07.1461379Z Entering 
'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T08:59:07.1506097Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T08:59:07.1553629Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:59:07.1596720Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:59:07.1660615Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T08:59:07.1702470Z Entering 'third_party/pocketfft' 2025-12-04T08:59:07.1747388Z Entering 'third_party/protobuf' 2025-12-04T08:59:07.1795294Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T08:59:07.1837717Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T08:59:07.1882008Z Entering 'third_party/psimd' 2025-12-04T08:59:07.1925285Z Entering 'third_party/pthreadpool' 2025-12-04T08:59:07.1971012Z Entering 'third_party/pybind11' 2025-12-04T08:59:07.2016620Z Entering 'third_party/python-peachpy' 2025-12-04T08:59:07.2058982Z Entering 'third_party/sleef' 2025-12-04T08:59:07.2103133Z Entering 'third_party/tensorpipe' 2025-12-04T08:59:07.2146111Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T08:59:07.2196515Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T08:59:07.2238633Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T08:59:07.2281431Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T08:59:07.2322760Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T08:59:07.2379497Z ##[endgroup] 2025-12-04T08:59:07.2414847Z [command]/usr/bin/git log -1 --format=%H 2025-12-04T08:59:07.2440358Z ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T08:59:07.2548076Z ##[group]Run cd "${GITHUB_WORKSPACE}" 2025-12-04T08:59:07.2548487Z cd "${GITHUB_WORKSPACE}" 2025-12-04T08:59:07.2548854Z # Clean stale submodule dirs 2025-12-04T08:59:07.2549311Z if [ -z "${NO_SUDO}" ]; then 2025-12-04T08:59:07.2549747Z  sudo git submodule foreach --recursive git clean -ffdx 2025-12-04T08:59:07.2550173Z else 2025-12-04T08:59:07.2550509Z  git submodule foreach --recursive git clean -ffdx 2025-12-04T08:59:07.2550901Z fi 2025-12-04T08:59:07.2558768Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:59:07.2559168Z env: 2025-12-04T08:59:07.2559519Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:59:07.2559801Z NO_SUDO: true 2025-12-04T08:59:07.2560134Z ##[endgroup] 2025-12-04T08:59:07.2909011Z Entering 'android/libs/fbjni' 2025-12-04T08:59:07.2942907Z Entering 'third_party/FP16' 2025-12-04T08:59:07.2979500Z Entering 'third_party/FXdiv' 2025-12-04T08:59:07.3011643Z Entering 'third_party/NNPACK' 2025-12-04T08:59:07.3047627Z Entering 'third_party/NVTX' 2025-12-04T08:59:07.3087887Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T08:59:07.3121958Z Entering 'third_party/XNNPACK' 2025-12-04T08:59:07.3246709Z Entering 'third_party/aiter' 2025-12-04T08:59:07.3291676Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T08:59:07.3399132Z Entering 'third_party/benchmark' 2025-12-04T08:59:07.3435274Z Entering 'third_party/composable_kernel' 2025-12-04T08:59:07.3552188Z Entering 'third_party/cpp-httplib' 2025-12-04T08:59:07.3588548Z Entering 'third_party/cpuinfo' 2025-12-04T08:59:07.3624751Z Entering 'third_party/cudnn_frontend' 2025-12-04T08:59:07.3660203Z Entering 'third_party/cutlass' 2025-12-04T08:59:07.3764273Z Entering 'third_party/fbgemm' 2025-12-04T08:59:07.3826025Z Entering 
'third_party/fbgemm/external/asmjit' 2025-12-04T08:59:07.3859113Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T08:59:07.3977403Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T08:59:07.4017862Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T08:59:07.4115229Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T08:59:07.4153909Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T08:59:07.4182694Z Entering 'third_party/fbgemm/external/json' 2025-12-04T08:59:07.4228253Z Entering 'third_party/flash-attention' 2025-12-04T08:59:07.4268405Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T08:59:07.4365027Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T08:59:07.4452886Z Entering 'third_party/flatbuffers' 2025-12-04T08:59:07.4521362Z Entering 'third_party/fmt' 2025-12-04T08:59:07.4555541Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T08:59:07.4590961Z Entering 'third_party/gloo' 2025-12-04T08:59:07.4626820Z Entering 'third_party/googletest' 2025-12-04T08:59:07.4662085Z Entering 'third_party/ideep' 2025-12-04T08:59:07.4697575Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T08:59:07.4781351Z Entering 'third_party/ittapi' 2025-12-04T08:59:07.4822698Z Entering 'third_party/kineto' 2025-12-04T08:59:07.4858722Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T08:59:07.4904416Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T08:59:07.4953795Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T08:59:07.4986499Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T08:59:07.5019590Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T08:59:07.5054436Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T08:59:07.5087540Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T08:59:07.5119180Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T08:59:07.5158161Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T08:59:07.5199305Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T08:59:07.5236306Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T08:59:07.5271880Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:59:07.5318707Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:59:07.5359447Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T08:59:07.5392741Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T08:59:07.5432455Z Entering 'third_party/kleidiai' 2025-12-04T08:59:07.5474138Z Entering 'third_party/mimalloc' 2025-12-04T08:59:07.5514504Z Entering 'third_party/nlohmann' 2025-12-04T08:59:07.5563082Z Entering 'third_party/onnx' 2025-12-04T08:59:07.5860347Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T08:59:07.5897774Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T08:59:07.5954488Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T08:59:07.5986094Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T08:59:07.6019543Z Entering 
'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T08:59:07.6055590Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T08:59:07.6100552Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T08:59:07.6132373Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T08:59:07.6165230Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T08:59:07.6196526Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:59:07.6244700Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:59:07.6280559Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T08:59:07.6518441Z Entering 'third_party/pocketfft' 2025-12-04T08:59:07.6551366Z Entering 'third_party/protobuf' 2025-12-04T08:59:07.6626822Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T08:59:07.6665713Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T08:59:07.6701328Z Entering 'third_party/psimd' 2025-12-04T08:59:07.6734922Z Entering 'third_party/pthreadpool' 2025-12-04T08:59:07.6767409Z Entering 'third_party/pybind11' 2025-12-04T08:59:07.6802520Z Entering 'third_party/python-peachpy' 2025-12-04T08:59:07.6836119Z Entering 'third_party/sleef' 2025-12-04T08:59:07.6872833Z Entering 'third_party/tensorpipe' 2025-12-04T08:59:07.6907577Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T08:59:07.6940617Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T08:59:07.6973774Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T08:59:07.7011085Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T08:59:07.7042402Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T08:59:07.7211282Z Prepare all required actions 2025-12-04T08:59:07.7211820Z Getting action download info 2025-12-04T08:59:07.8709847Z ##[group]Run ./.github/actions/setup-linux 2025-12-04T08:59:07.8710167Z env: 2025-12-04T08:59:07.8710398Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:59:07.8710671Z ##[endgroup] 2025-12-04T08:59:07.8750698Z ##[group]Run set -euo pipefail 2025-12-04T08:59:07.8751053Z set -euo pipefail 2025-12-04T08:59:07.8751366Z function get_ec2_metadata() { 2025-12-04T08:59:07.8751763Z  # Pulled from instance metadata endpoint for EC2 2025-12-04T08:59:07.8752416Z  # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html 2025-12-04T08:59:07.8753026Z  category=$1 2025-12-04T08:59:07.8753405Z  # If it is GCP runner (runner name contains gcp), do not run this 2025-12-04T08:59:07.8753851Z  runner_name_str=i-035b9d8fd6b020edf 2025-12-04T08:59:07.8754250Z  if [[ -f /.inarc ]]; then 2025-12-04T08:59:07.8754602Z  echo "ARC Runner, no info on ec2 metadata" 2025-12-04T08:59:07.8755015Z  elif [[ $runner_name_str == *"gcp"* ]]; then 2025-12-04T08:59:07.8755509Z  echo "Runner is from Google Cloud Platform, No info on ec2 metadata" 2025-12-04T08:59:07.8755964Z  else 2025-12-04T08:59:07.8756858Z  curl -H "X-aws-ec2-metadata-token: $(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 30")" -fsSL "http://169.254.169.254/latest/meta-data/${category}" 2025-12-04T08:59:07.8757822Z  fi 2025-12-04T08:59:07.8758045Z } 2025-12-04T08:59:07.8758470Z echo "ami-id: $(get_ec2_metadata ami-id)" 2025-12-04T08:59:07.8758904Z echo "instance-id: $(get_ec2_metadata instance-id)" 2025-12-04T08:59:07.8759403Z 
echo "instance-type: $(get_ec2_metadata instance-type)" 2025-12-04T08:59:07.8759835Z echo "system info $(uname -a)" 2025-12-04T08:59:07.8765775Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:59:07.8766172Z env: 2025-12-04T08:59:07.8766399Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:59:07.8766659Z ##[endgroup] 2025-12-04T08:59:07.8917615Z ami-id: ami-08982f1c5bf93d976 2025-12-04T08:59:07.9028578Z instance-id: i-035b9d8fd6b020edf 2025-12-04T08:59:07.9139445Z instance-type: g4dn.12xlarge 2025-12-04T08:59:07.9150360Z system info Linux ip-10-1-59-14.ec2.internal 6.1.150-174.273.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Sep 9 12:21:26 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux 2025-12-04T08:59:07.9170496Z ##[group]Run if [ -f /usr/bin/nvidia-smi ]; then nvidia-smi; fi 2025-12-04T08:59:07.9171014Z if [ -f /usr/bin/nvidia-smi ]; then nvidia-smi; fi 2025-12-04T08:59:07.9177380Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:59:07.9177825Z env: 2025-12-04T08:59:07.9178081Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:59:07.9178376Z ##[endgroup] 2025-12-04T08:59:09.9893605Z Thu Dec 4 08:59:09 2025 2025-12-04T08:59:09.9894613Z +-----------------------------------------------------------------------------------------+ 2025-12-04T08:59:09.9895367Z | NVIDIA-SMI 580.82.07 Driver Version: 580.82.07 CUDA Version: 13.0 | 2025-12-04T08:59:09.9895983Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T08:59:09.9896718Z | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | 2025-12-04T08:59:09.9897595Z | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | 2025-12-04T08:59:09.9898143Z | | | MIG M. | 2025-12-04T08:59:09.9898589Z |=========================================+========================+======================| 2025-12-04T08:59:10.0276819Z | 0 Tesla T4 Off | 00000000:00:1B.0 Off | 0 | 2025-12-04T08:59:10.0278325Z | N/A 36C P0 25W / 70W | 0MiB / 15360MiB | 0% Default | 2025-12-04T08:59:10.0278856Z | | | N/A | 2025-12-04T08:59:10.0279353Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T08:59:10.0279895Z | 1 Tesla T4 Off | 00000000:00:1C.0 Off | 0 | 2025-12-04T08:59:10.0280413Z | N/A 35C P0 25W / 70W | 0MiB / 15360MiB | 4% Default | 2025-12-04T08:59:10.0280862Z | | | N/A | 2025-12-04T08:59:10.0281355Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T08:59:10.0281895Z | 2 Tesla T4 Off | 00000000:00:1D.0 Off | 0 | 2025-12-04T08:59:10.0282404Z | N/A 34C P0 25W / 70W | 0MiB / 15360MiB | 4% Default | 2025-12-04T08:59:10.0282868Z | | | N/A | 2025-12-04T08:59:10.0283354Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T08:59:10.0283888Z | 3 Tesla T4 Off | 00000000:00:1E.0 Off | 0 | 2025-12-04T08:59:10.0284390Z | N/A 35C P0 25W / 70W | 0MiB / 15360MiB | 4% Default | 2025-12-04T08:59:10.0284857Z | | | N/A | 2025-12-04T08:59:10.0285340Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T08:59:10.0285833Z 2025-12-04T08:59:10.0286051Z +-----------------------------------------------------------------------------------------+ 2025-12-04T08:59:10.0286568Z | Processes: | 2025-12-04T08:59:10.0287112Z | GPU GI CI PID Type Process name GPU Memory | 2025-12-04T08:59:10.0287616Z | ID ID Usage | 2025-12-04T08:59:10.0288035Z 
|=========================================================================================| 2025-12-04T08:59:10.0300443Z | No running processes found | 2025-12-04T08:59:10.0301061Z +-----------------------------------------------------------------------------------------+ 2025-12-04T08:59:11.6945016Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-12-04T08:59:11.6946148Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-12-04T08:59:11.6953098Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:59:11.6953519Z env: 2025-12-04T08:59:11.6953746Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:59:11.6954034Z ##[endgroup] 2025-12-04T08:59:11.7012550Z ##[group]Run if systemctl is-active --quiet docker; then 2025-12-04T08:59:11.7013061Z if systemctl is-active --quiet docker; then 2025-12-04T08:59:11.7013507Z  echo "Docker daemon is running..."; 2025-12-04T08:59:11.7013867Z else 2025-12-04T08:59:11.7014273Z  echo "Starting docker daemon..." && sudo systemctl start docker; 2025-12-04T08:59:11.7014760Z fi 2025-12-04T08:59:11.7021174Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:59:11.7021640Z env: 2025-12-04T08:59:11.7021897Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:59:11.7022191Z ##[endgroup] 2025-12-04T08:59:11.7105684Z Docker daemon is running... 2025-12-04T08:59:11.7148768Z ##[group]Run nick-fields/retry@v3.0.0 2025-12-04T08:59:11.7149208Z with: 2025-12-04T08:59:11.7149416Z shell: bash 2025-12-04T08:59:11.7149788Z timeout_minutes: 5 2025-12-04T08:59:11.7150047Z max_attempts: 3 2025-12-04T08:59:11.7150277Z retry_wait_seconds: 30 2025-12-04T08:59:11.7152683Z command: AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" # For LF Runners we need to make sure we also login to Meta's ECR docker registry too. META_AWS_ACCOUNT_ID=308535385114 if [ "$AWS_ACCOUNT_ID" != "$META_AWS_ACCOUNT_ID" ] ; then aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \ --password-stdin "$META_AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" fi 2025-12-04T08:59:11.7155124Z polling_interval_seconds: 1 2025-12-04T08:59:11.7155421Z warning_on_retry: true 2025-12-04T08:59:11.7155693Z continue_on_error: false 2025-12-04T08:59:11.7155942Z env: 2025-12-04T08:59:11.7156168Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:59:11.7156444Z AWS_RETRY_MODE: standard 2025-12-04T08:59:11.7156700Z AWS_MAX_ATTEMPTS: 5 2025-12-04T08:59:11.7156964Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T08:59:11.7157250Z ##[endgroup] 2025-12-04T08:59:12.8654343Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json. 2025-12-04T08:59:12.8655075Z Configure a credential helper to remove this warning. See 2025-12-04T08:59:12.8655739Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store 2025-12-04T08:59:12.8656184Z 2025-12-04T08:59:12.8656427Z Login Succeeded 2025-12-04T08:59:13.3948871Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json. 2025-12-04T08:59:13.3950077Z Configure a credential helper to remove this warning. 
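For reference, the ECR login that nick-fields/retry wraps in the step above amounts to fetching a short-lived registry password and piping it into docker login, first for the account the runner's credentials resolve to and then for Meta's account (308535385114) when the two differ, so LF runners can still pull the CI images. A minimal stand-alone sketch of that flow, assuming the aws CLI and docker are on PATH and using --query in place of the grep/cut shown in the log:

#!/usr/bin/env bash
set -euo pipefail

# Region and Meta's ECR account as they appear in the retry step above.
region="${AWS_DEFAULT_REGION:-us-east-1}"
meta_account_id=308535385114

# Account that the runner's AWS credentials belong to.
account_id="$(aws sts get-caller-identity --query Account --output text)"

ecr_login() {
  # get-login-password emits a temporary token; --password-stdin keeps it off argv.
  aws ecr get-login-password --region "$region" \
    | docker login --username AWS --password-stdin "$1.dkr.ecr.${region}.amazonaws.com"
}

ecr_login "$account_id"

# LF runners live in a different AWS account, so also log in to Meta's
# registry, which is where the pytorch/ci-image repositories are hosted.
if [ "$account_id" != "$meta_account_id" ]; then
  ecr_login "$meta_account_id"
fi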
See 2025-12-04T08:59:13.3950739Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store 2025-12-04T08:59:13.3951183Z 2025-12-04T08:59:13.3951285Z Login Succeeded 2025-12-04T08:59:13.8082220Z Command completed after 1 attempt(s). 2025-12-04T08:59:13.8135533Z ##[group]Run env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-12-04T08:59:13.8136121Z env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-12-04T08:59:13.8136899Z env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-12-04T08:59:13.8145179Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:59:13.8145581Z env: 2025-12-04T08:59:13.8145801Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:59:13.8146079Z ##[endgroup] 2025-12-04T08:59:13.8228714Z ##[group]Run # ignore expansion of "docker ps -q" since it could be empty 2025-12-04T08:59:13.8229377Z # ignore expansion of "docker ps -q" since it could be empty 2025-12-04T08:59:13.8229881Z # shellcheck disable=SC2046 2025-12-04T08:59:13.8230275Z docker stop $(docker ps -q) || true 2025-12-04T08:59:13.8230682Z # Prune all of the docker images 2025-12-04T08:59:13.8231064Z docker system prune -af 2025-12-04T08:59:13.8236994Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:59:13.8237388Z env: 2025-12-04T08:59:13.8237603Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:59:13.8237876Z ##[endgroup] 2025-12-04T08:59:13.8477610Z "docker stop" requires at least 1 argument. 2025-12-04T08:59:13.8478405Z See 'docker stop --help'. 2025-12-04T08:59:13.8478668Z 2025-12-04T08:59:13.8478893Z Usage: docker stop [OPTIONS] CONTAINER [CONTAINER...] 2025-12-04T08:59:13.8479226Z 2025-12-04T08:59:13.8479356Z Stop one or more running containers 2025-12-04T08:59:13.8629871Z Total reclaimed space: 0B 2025-12-04T08:59:13.8830026Z ##[group]Run pytorch/test-infra/.github/actions/calculate-docker-image@main 2025-12-04T08:59:13.8830587Z with: 2025-12-04T08:59:13.8831518Z docker-image-name: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:13.8832673Z use-custom-docker-registry: true 2025-12-04T08:59:13.8833025Z docker-build-dir: .ci/docker 2025-12-04T08:59:13.8833355Z docker-build-script: ./build.sh 2025-12-04T08:59:13.8833691Z working-directory: . 2025-12-04T08:59:13.8834086Z docker-registry: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:59:13.8834539Z force-push: false 2025-12-04T08:59:13.8834785Z env: 2025-12-04T08:59:13.8835025Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:59:13.8835315Z ##[endgroup] 2025-12-04T08:59:13.8855214Z ##[group]Run set -ex 2025-12-04T08:59:13.8855532Z set -ex 2025-12-04T08:59:13.8855781Z  2025-12-04T08:59:13.8856260Z # If the docker build directory or the build script doesn't exist, the action will 2025-12-04T08:59:13.8857339Z # gracefully return the docker image name as it is. 
Pulling docker image in Linux 2025-12-04T08:59:13.8858014Z # job could then download the pre-built image as usual 2025-12-04T08:59:13.8858832Z if [[ -d "${DOCKER_BUILD_DIR}" ]] && [[ -f "${DOCKER_BUILD_DIR}/${DOCKER_BUILD_SCRIPT}" ]] && [[ "${USE_CUSTOM_DOCKER_REGISTRY}" == "true" ]]; then 2025-12-04T08:59:13.8859581Z  echo "skip=false" >> "${GITHUB_OUTPUT}" 2025-12-04T08:59:13.8859967Z else 2025-12-04T08:59:13.8860269Z  echo "skip=true" >> "${GITHUB_OUTPUT}" 2025-12-04T08:59:13.8860790Z  echo "docker-image=${DOCKER_IMAGE_NAME}" >> "${GITHUB_OUTPUT}" 2025-12-04T08:59:13.8861252Z  2025-12-04T08:59:13.8861903Z  echo "Not using custom ECR registry. Either it was not requested or there is no Docker build script in the ${REPO_NAME} repo..." 2025-12-04T08:59:13.8862665Z  exit 0 2025-12-04T08:59:13.8863062Z fi 2025-12-04T08:59:13.8863305Z  2025-12-04T08:59:13.8863700Z if [[ "${DOCKER_IMAGE_NAME}" == *"${DOCKER_REGISTRY}/${REPO_NAME}"* ]]; then 2025-12-04T08:59:13.8864410Z  # The docker image name already includes the ECR prefix and tag, so we can just 2025-12-04T08:59:13.8865026Z  # use it as it is, but first let's extract the tag 2025-12-04T08:59:13.8865588Z  DOCKER_TAG=$(echo "${DOCKER_IMAGE_NAME}" | awk -F '[:,]' '{print $2}') 2025-12-04T08:59:13.8866182Z  echo "docker-tag=${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" 2025-12-04T08:59:13.8866740Z  echo "docker-image=${DOCKER_IMAGE_NAME}" >> "${GITHUB_OUTPUT}" 2025-12-04T08:59:13.8867215Z else 2025-12-04T08:59:13.8867516Z  if [[ "${DOCKER_IMAGE_NAME}" == *:* ]]; then 2025-12-04T08:59:13.8867960Z  CUSTOM_TAG_PREFIX=${DOCKER_IMAGE_NAME#*:} 2025-12-04T08:59:13.8868409Z  DOCKER_IMAGE_NAME=${DOCKER_IMAGE_NAME%%:*} 2025-12-04T08:59:13.8868904Z  fi 2025-12-04T08:59:13.8869516Z  DOCKER_TAG=${CUSTOM_TAG_PREFIX:+${CUSTOM_TAG_PREFIX}-}$(git rev-parse HEAD:"${DOCKER_BUILD_DIR}") 2025-12-04T08:59:13.8870181Z  echo "docker-tag=${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" 2025-12-04T08:59:13.8871067Z  echo "docker-image=${DOCKER_REGISTRY}/${REPO_NAME}/${DOCKER_IMAGE_NAME}:${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" 2025-12-04T08:59:13.8871865Z  echo "custom-tag-prefix=${CUSTOM_TAG_PREFIX}" >> "${GITHUB_OUTPUT}" 2025-12-04T08:59:13.8872347Z fi 2025-12-04T08:59:13.8879042Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:59:13.8879473Z env: 2025-12-04T08:59:13.8879721Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:59:13.8880009Z REPO_NAME: pytorch 2025-12-04T08:59:13.8881174Z DOCKER_IMAGE_NAME: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:13.8882164Z DOCKER_BUILD_DIR: .ci/docker 2025-12-04T08:59:13.8882590Z DOCKER_BUILD_SCRIPT: ./build.sh 2025-12-04T08:59:13.8882972Z DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:59:13.8883391Z USE_CUSTOM_DOCKER_REGISTRY: true 2025-12-04T08:59:13.8883698Z CUSTOM_TAG_PREFIX: 2025-12-04T08:59:13.8883936Z ##[endgroup] 2025-12-04T08:59:13.8907979Z + [[ -d .ci/docker ]] 2025-12-04T08:59:13.8908296Z + [[ -f .ci/docker/./build.sh ]] 2025-12-04T08:59:13.8908647Z + [[ true == \t\r\u\e ]] 2025-12-04T08:59:13.8909056Z + echo skip=false 2025-12-04T08:59:13.8910278Z + [[ 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a == *\3\0\8\5\3\5\3\8\5\1\1\4\.\d\k\r\.\e\c\r\.\u\s\-\e\a\s\t\-\1\.\a\m\a\z\o\n\a\w\s\.\c\o\m\/\p\y\t\o\r\c\h* ]] 2025-12-04T08:59:13.8916530Z ++ echo 
308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:13.8917484Z ++ awk -F '[:,]' '{print $2}' 2025-12-04T08:59:13.8940003Z + DOCKER_TAG=pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:13.8941055Z + echo docker-tag=pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:13.8942592Z + echo docker-image=308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:13.8965450Z ##[group]Run set +e 2025-12-04T08:59:13.8965759Z set +e 2025-12-04T08:59:13.8965983Z set -x 2025-12-04T08:59:13.8966214Z  2025-12-04T08:59:13.8966434Z login() { 2025-12-04T08:59:13.8966916Z  aws ecr get-login-password --region us-east-1 | docker login -u AWS --password-stdin "$1" 2025-12-04T08:59:13.8967462Z } 2025-12-04T08:59:13.8967688Z  2025-12-04T08:59:13.8968007Z retry () { 2025-12-04T08:59:13.8968289Z  $* || (sleep 1 && $*) || (sleep 2 && $*) 2025-12-04T08:59:13.8968618Z } 2025-12-04T08:59:13.8968816Z  2025-12-04T08:59:13.8969056Z retry login "${DOCKER_REGISTRY}" 2025-12-04T08:59:13.8969374Z  2025-12-04T08:59:13.8969603Z START_TIME=$(date +%s) 2025-12-04T08:59:13.8969908Z # Wait up to 120 minutes 2025-12-04T08:59:13.8970289Z while [[ $(( $(date +%s) - 7200 )) -lt $START_TIME ]]; do 2025-12-04T08:59:13.8970794Z  # Check if image already exists, if it does then skip building it 2025-12-04T08:59:13.8971312Z  if docker manifest inspect "${DOCKER_IMAGE}"; then 2025-12-04T08:59:13.8971691Z  exit 0 2025-12-04T08:59:13.8971934Z  fi 2025-12-04T08:59:13.8972144Z  2025-12-04T08:59:13.8972549Z  # NB: This flag is used by Docker build workflow to push the image to ECR, so we can 2025-12-04T08:59:13.8973254Z  # use this to differentiate between the Docker build and regular build jobs. For the 2025-12-04T08:59:13.8973938Z  # latter, it will wait for the Docker images to become available before continuing 2025-12-04T08:59:13.8974487Z  if [ "${DOCKER_PUSH:-false}" == "true" ]; then 2025-12-04T08:59:13.8974911Z  # It's a Docker build job, let's build the image 2025-12-04T08:59:13.8975279Z  break 2025-12-04T08:59:13.8975511Z  else 2025-12-04T08:59:13.8975870Z  # It's a regular build job, wait for the image to become available 2025-12-04T08:59:13.8976401Z  sleep 300 2025-12-04T08:59:13.8976656Z  fi 2025-12-04T08:59:13.8977070Z done 2025-12-04T08:59:13.8977314Z  2025-12-04T08:59:13.8977724Z # NB: This part requires a full checkout. Otherwise, the merge base will 2025-12-04T08:59:13.8978530Z # be empty. 
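The docker-tag derivation traced a few lines up is plain string handling: when the requested image name already carries the ECR registry and repo prefix, the tag is everything after the first colon; otherwise the tag is the git tree hash of the .ci/docker directory, optionally prefixed by a custom tag, so the tag only changes when the Docker build inputs change. A hedged sketch of that branching (derive_docker_tag is a name of my choosing, and parameter expansion stands in for the awk call in the log):

#!/usr/bin/env bash
set -euo pipefail

# Mirrors the values from the step environment above.
DOCKER_REGISTRY=308535385114.dkr.ecr.us-east-1.amazonaws.com
REPO_NAME=pytorch
DOCKER_BUILD_DIR=.ci/docker

derive_docker_tag() {
  local image_name=$1 custom_prefix=""
  if [[ "$image_name" == *"$DOCKER_REGISTRY/$REPO_NAME"* ]]; then
    # Fully qualified already: the tag is whatever follows the first colon.
    echo "${image_name#*:}"
  else
    # Short name: tag = [<prefix>-]<tree hash of the Docker build dir>.
    if [[ "$image_name" == *:* ]]; then
      custom_prefix="${image_name#*:}-"
    fi
    echo "${custom_prefix}$(git rev-parse "HEAD:${DOCKER_BUILD_DIR}")"
  fi
}

derive_docker_tag "$DOCKER_REGISTRY/$REPO_NAME/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a"
# -> pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a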
The default action would be to continue rebuild the image 2025-12-04T08:59:13.8979144Z if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then 2025-12-04T08:59:13.8979679Z  # if we're on the base branch then use the parent commit 2025-12-04T08:59:13.8980136Z  MERGE_BASE=$(git rev-parse HEAD~) 2025-12-04T08:59:13.8980503Z else 2025-12-04T08:59:13.8980880Z  # otherwise we're on a PR, so use the most recent base commit 2025-12-04T08:59:13.8981437Z  MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION") 2025-12-04T08:59:13.8981846Z fi 2025-12-04T08:59:13.8982087Z  2025-12-04T08:59:13.8982351Z if [[ -z "${MERGE_BASE}" ]]; then 2025-12-04T08:59:13.8982755Z  echo "rebuild=true" >> "${GITHUB_OUTPUT}" 2025-12-04T08:59:13.8983140Z  2025-12-04T08:59:13.8983684Z  echo "Finding merge base only works with full checkout, please set fetch-depth to 0, continuing ..." 2025-12-04T08:59:13.8984327Z  exit 0 2025-12-04T08:59:13.8984586Z fi 2025-12-04T08:59:13.8984821Z  2025-12-04T08:59:13.8985172Z if ! git rev-parse "${MERGE_BASE}:${DOCKER_BUILD_DIR}"; then 2025-12-04T08:59:13.8985962Z  echo "Directory '${DOCKER_BUILD_DIR}' not found in commit $MERGE_BASE, you should rebase onto a more recent commit" 2025-12-04T08:59:13.8986653Z  exit 1 2025-12-04T08:59:13.8986907Z fi 2025-12-04T08:59:13.8987129Z  2025-12-04T08:59:13.8987547Z PREVIOUS_DOCKER_TAG=$(git rev-parse "${MERGE_BASE}:${DOCKER_BUILD_DIR}") 2025-12-04T08:59:13.8988316Z # If no image exists but the hash is the same as the previous hash then we should error out here 2025-12-04T08:59:13.8989230Z if [[ "${PREVIOUS_DOCKER_TAG}" == "${DOCKER_TAG}" ]]; then 2025-12-04T08:59:13.8989929Z  echo "WARNING: Something has gone wrong and the previous image isn't available for the merge-base of your branch" 2025-12-04T08:59:13.8990743Z  echo " Will re-build docker image to store in local cache, TTS may be longer" 2025-12-04T08:59:13.8991280Z fi 2025-12-04T08:59:13.8991483Z  2025-12-04T08:59:13.8991752Z echo "rebuild=true" >> "${GITHUB_OUTPUT}" 2025-12-04T08:59:13.8997189Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:59:13.8997583Z env: 2025-12-04T08:59:13.8997797Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:59:13.8998081Z DOCKER_BUILD_DIR: .ci/docker 2025-12-04T08:59:13.8998439Z BASE_REVISION: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T08:59:13.8999385Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:13.9000562Z DOCKER_TAG: pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:13.9001265Z DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:59:13.9001673Z DOCKER_PUSH: 2025-12-04T08:59:13.9001901Z ##[endgroup] 2025-12-04T08:59:13.9025713Z + retry login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:59:13.9026230Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:59:13.9028394Z + aws ecr get-login-password --region us-east-1 2025-12-04T08:59:13.9029765Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:59:14.4339967Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json. 2025-12-04T08:59:14.4340679Z Configure a credential helper to remove this warning. 
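Taken together, the script listed in the step above first polls ECR for up to two hours so that regular build jobs simply wait for the Docker build workflow to publish the image (only DOCKER_PUSH=true jobs break out and build it themselves), and only afterwards falls back to the merge-base comparison to decide whether a local rebuild is warranted. A condensed sketch of the regular-job polling path, with wait_for_image as an assumed helper name and the same 120-minute window and 5-minute sleep as the step above:

#!/usr/bin/env bash
set -euo pipefail

# Poll the registry until the manifest for the image is visible or the
# two-hour deadline expires.
wait_for_image() {
  local image=$1
  local deadline=$(( $(date +%s) + 7200 ))
  while [ "$(date +%s)" -lt "$deadline" ]; do
    if docker manifest inspect "$image" >/dev/null 2>&1; then
      return 0    # image already published, nothing to build locally
    fi
    sleep 300     # regular jobs just keep waiting for the Docker build workflow
  done
  return 1        # never appeared; the caller decides whether to rebuild
}

if ! wait_for_image "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a"; then
  echo "rebuild=true" >> "${GITHUB_OUTPUT:-/dev/stdout}"
fi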
See 2025-12-04T08:59:14.4341335Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store 2025-12-04T08:59:14.4341805Z 2025-12-04T08:59:14.4342269Z Login Succeeded 2025-12-04T08:59:14.4359812Z ++ date +%s 2025-12-04T08:59:14.4367687Z + START_TIME=1764838754 2025-12-04T08:59:14.4371635Z ++ date +%s 2025-12-04T08:59:14.4381251Z + [[ 1764831554 -lt 1764838754 ]] 2025-12-04T08:59:14.4382359Z + docker manifest inspect 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:14.6356145Z { 2025-12-04T08:59:14.6356484Z "schemaVersion": 2, 2025-12-04T08:59:14.6356976Z "mediaType": "application/vnd.docker.distribution.manifest.v2+json", 2025-12-04T08:59:14.6357649Z "config": { 2025-12-04T08:59:14.6358038Z "mediaType": "application/vnd.docker.container.image.v1+json", 2025-12-04T08:59:14.6358512Z "size": 34864, 2025-12-04T08:59:14.6358983Z "digest": "sha256:add7313791033822205cdb3cf32096534b2cfaa4855bd48119b59000bfe00301" 2025-12-04T08:59:14.6359520Z }, 2025-12-04T08:59:14.6359744Z "layers": [ 2025-12-04T08:59:14.6360088Z { 2025-12-04T08:59:14.6360447Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6360924Z "size": 30447951, 2025-12-04T08:59:14.6361422Z "digest": "sha256:63e5bc7682b85ae57a1221210f64d62e7a90b0a30f19af4ca734b8242ae49d63" 2025-12-04T08:59:14.6361951Z }, 2025-12-04T08:59:14.6362164Z { 2025-12-04T08:59:14.6362579Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6363048Z "size": 1554, 2025-12-04T08:59:14.6363502Z "digest": "sha256:0678d56345c994444b77bb70b1177189d23e794748b1d75ffc45d227c7dea94a" 2025-12-04T08:59:14.6364021Z }, 2025-12-04T08:59:14.6364233Z { 2025-12-04T08:59:14.6364585Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6365054Z "size": 313275661, 2025-12-04T08:59:14.6365543Z "digest": "sha256:45f5c9ddfce78349dff3d5edfbaa0310ae17311f66abdcd7e00fa21b500e801c" 2025-12-04T08:59:14.6366071Z }, 2025-12-04T08:59:14.6366280Z { 2025-12-04T08:59:14.6366641Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6367094Z "size": 787, 2025-12-04T08:59:14.6367555Z "digest": "sha256:086b1df51ac1162d9c45698e9dfaf91c6c222c8bd9ab01797ac8f9344bc8044f" 2025-12-04T08:59:14.6368092Z }, 2025-12-04T08:59:14.6368527Z { 2025-12-04T08:59:14.6368891Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6369359Z "size": 106, 2025-12-04T08:59:14.6369811Z "digest": "sha256:fe8a7b64bf98352f89057bcba66beef2fb44cc05fbd3606abccd8e86cf476234" 2025-12-04T08:59:14.6370399Z }, 2025-12-04T08:59:14.6370610Z { 2025-12-04T08:59:14.6370971Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6371422Z "size": 703, 2025-12-04T08:59:14.6371874Z "digest": "sha256:7680723e9a578033dd106b45784c639f06cc8adb1f5239ec513d9de01087c1af" 2025-12-04T08:59:14.6372401Z }, 2025-12-04T08:59:14.6372597Z { 2025-12-04T08:59:14.6372957Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6373423Z "size": 1216, 2025-12-04T08:59:14.6373864Z "digest": "sha256:9c5027aeeb4e3101f48c1d2e400c387110e1009e42497ee801f1b4b7f7edb5c0" 2025-12-04T08:59:14.6374389Z }, 2025-12-04T08:59:14.6374608Z { 2025-12-04T08:59:14.6374959Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6375421Z "size": 483, 2025-12-04T08:59:14.6375866Z "digest": 
"sha256:9a56521103600bd37a1e7c1191b5136c2d738c092f8a6701499f7068a32c2628" 2025-12-04T08:59:14.6376500Z }, 2025-12-04T08:59:14.6376703Z { 2025-12-04T08:59:14.6377238Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6377723Z "size": 110361875, 2025-12-04T08:59:14.6378187Z "digest": "sha256:375c4427e9141269458333b1463fdb219e736fd6231ec1c56c625c48437ace77" 2025-12-04T08:59:14.6378726Z }, 2025-12-04T08:59:14.6378942Z { 2025-12-04T08:59:14.6379301Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6379780Z "size": 4961, 2025-12-04T08:59:14.6380253Z "digest": "sha256:a86faaa7dbdd70e678e5ea20072637ee42618921ca8f80ca089f789325d4b0c2" 2025-12-04T08:59:14.6380785Z }, 2025-12-04T08:59:14.6381005Z { 2025-12-04T08:59:14.6381514Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6381993Z "size": 1755, 2025-12-04T08:59:14.6382464Z "digest": "sha256:fb7848686804957915d98f8655ef6da0fe4c521b50a82aefdebf475983505a15" 2025-12-04T08:59:14.6383013Z }, 2025-12-04T08:59:14.6383231Z { 2025-12-04T08:59:14.6383604Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6384083Z "size": 724, 2025-12-04T08:59:14.6384551Z "digest": "sha256:3541df015cdb7e8925273399d28e56c31b3c9196f00439ac2925537b173b1f84" 2025-12-04T08:59:14.6385077Z }, 2025-12-04T08:59:14.6385299Z { 2025-12-04T08:59:14.6385674Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6386141Z "size": 543, 2025-12-04T08:59:14.6386612Z "digest": "sha256:79dc80f426b29d4ae9157b967050b03e66aa0c4b1295b944a1dd70106be87066" 2025-12-04T08:59:14.6387162Z }, 2025-12-04T08:59:14.6387385Z { 2025-12-04T08:59:14.6387753Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6388377Z "size": 3185190117, 2025-12-04T08:59:14.6388872Z "digest": "sha256:a13fcc1b90bb9c251ebe7ef2a03c4cb3afa1c8bdafe84f5f85136773059a3735" 2025-12-04T08:59:14.6389406Z }, 2025-12-04T08:59:14.6389623Z { 2025-12-04T08:59:14.6389993Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6390444Z "size": 32, 2025-12-04T08:59:14.6390904Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T08:59:14.6391449Z }, 2025-12-04T08:59:14.6391647Z { 2025-12-04T08:59:14.6392008Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6392473Z "size": 396, 2025-12-04T08:59:14.6392918Z "digest": "sha256:549db4d6c618ecd9534658a233e3c90508f82d8735f965c2786b2eaa078869e5" 2025-12-04T08:59:14.6393428Z }, 2025-12-04T08:59:14.6393636Z { 2025-12-04T08:59:14.6393995Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6394455Z "size": 236860, 2025-12-04T08:59:14.6394999Z "digest": "sha256:5c63528cb580001e65104f4cb0809bf0673a00f989a7db42fd6d86aa1ec27cee" 2025-12-04T08:59:14.6395523Z }, 2025-12-04T08:59:14.6395718Z { 2025-12-04T08:59:14.6396079Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6396642Z + exit 0 2025-12-04T08:59:14.6396855Z "size": 231, 2025-12-04T08:59:14.6397315Z "digest": "sha256:75bd83b989a44e4d4119a3f972891025eb0e9ce95cfbe4a0ca5cdbe7130028d6" 2025-12-04T08:59:14.6397844Z }, 2025-12-04T08:59:14.6398057Z { 2025-12-04T08:59:14.6398405Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6398871Z "size": 3043497, 2025-12-04T08:59:14.6399334Z "digest": 
"sha256:de6e78970f517178cb91f36cd02bd9ca7b72a08fb82a0f9007516026f258c035" 2025-12-04T08:59:14.6399847Z }, 2025-12-04T08:59:14.6400058Z { 2025-12-04T08:59:14.6400418Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6400875Z "size": 1472, 2025-12-04T08:59:14.6401345Z "digest": "sha256:e13ed7c7e4736e81dc21af755b3363eb26e4d3b2f1ca988dfe65effa47d8fa42" 2025-12-04T08:59:14.6401877Z }, 2025-12-04T08:59:14.6402074Z { 2025-12-04T08:59:14.6402432Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6402898Z "size": 481, 2025-12-04T08:59:14.6403340Z "digest": "sha256:6e2949bcb74152577a0f20c38bcb6dd80f5e68427e3e531a80e08c9ecc73a979" 2025-12-04T08:59:14.6403870Z }, 2025-12-04T08:59:14.6404080Z { 2025-12-04T08:59:14.6404442Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6404895Z "size": 202, 2025-12-04T08:59:14.6405356Z "digest": "sha256:14d69d9aaec70287efd2fd35c4f93e43a29a4098458cc9fca1c93f02ad7356cb" 2025-12-04T08:59:14.6405887Z }, 2025-12-04T08:59:14.6406083Z { 2025-12-04T08:59:14.6406443Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6406905Z "size": 607, 2025-12-04T08:59:14.6407434Z "digest": "sha256:5c02769dd8e5bba2f7f5fd84bde9595fcb3bdbffcae497503fa846f9b5e78bf5" 2025-12-04T08:59:14.6407986Z }, 2025-12-04T08:59:14.6408197Z { 2025-12-04T08:59:14.6408548Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6409017Z "size": 7889619584, 2025-12-04T08:59:14.6409494Z "digest": "sha256:35041ce524ac4afec40ecd73b1393c830614f1f79d43a6439767a6c7d5b7027b" 2025-12-04T08:59:14.6410027Z }, 2025-12-04T08:59:14.6410226Z { 2025-12-04T08:59:14.6410588Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6411053Z "size": 830, 2025-12-04T08:59:14.6411493Z "digest": "sha256:2fa92dc5885e080e049ceb4139288b6c0e39fab34256945708b08ea55a1f7a0b" 2025-12-04T08:59:14.6412014Z }, 2025-12-04T08:59:14.6412227Z { 2025-12-04T08:59:14.6412575Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6413038Z "size": 33451739, 2025-12-04T08:59:14.6413512Z "digest": "sha256:2b85eafbd92a0e70a0a70154ad8bf4584095e576d95873368f30373f5966714a" 2025-12-04T08:59:14.6414029Z }, 2025-12-04T08:59:14.6414241Z { 2025-12-04T08:59:14.6414600Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6415053Z "size": 104, 2025-12-04T08:59:14.6415515Z "digest": "sha256:ff755a4ddad7880f23c6b767d432d6f1eafdb62b3ea18f8a98e22c441c099fcb" 2025-12-04T08:59:14.6416050Z }, 2025-12-04T08:59:14.6416261Z { 2025-12-04T08:59:14.6416868Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6417385Z "size": 1496, 2025-12-04T08:59:14.6417853Z "digest": "sha256:09eb41bdf42d8605b57b2363348154140904dec914b34a67298b82122bfce2b3" 2025-12-04T08:59:14.6418375Z }, 2025-12-04T08:59:14.6418593Z { 2025-12-04T08:59:14.6418964Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6419433Z "size": 458787828, 2025-12-04T08:59:14.6419921Z "digest": "sha256:11ede4d59e935e62f41b33220fe871794ab5e57ce724173b713368977683bcf6" 2025-12-04T08:59:14.6420542Z }, 2025-12-04T08:59:14.6420956Z { 2025-12-04T08:59:14.6421341Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6421823Z "size": 164, 2025-12-04T08:59:14.6422280Z "digest": "sha256:1283cd8f801a142172f3ab76fd472df8583223d9437de3e4d18d8cf98ea3fa98" 
2025-12-04T08:59:14.6422831Z }, 2025-12-04T08:59:14.6423052Z { 2025-12-04T08:59:14.6423432Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6423900Z "size": 346, 2025-12-04T08:59:14.6424366Z "digest": "sha256:024fa855425fa524ad4500660cf61d53be62b99556d31b8b280d14caba434a35" 2025-12-04T08:59:14.6424910Z }, 2025-12-04T08:59:14.6425119Z { 2025-12-04T08:59:14.6425494Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6425975Z "size": 32, 2025-12-04T08:59:14.6426435Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T08:59:14.6426997Z }, 2025-12-04T08:59:14.6427222Z { 2025-12-04T08:59:14.6427584Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6428067Z "size": 106, 2025-12-04T08:59:14.6428545Z "digest": "sha256:303e6747a62efecf5efa1f97d0e66b40a3b39da8d79a51f75b89f4c92ae7ec52" 2025-12-04T08:59:14.6429103Z }, 2025-12-04T08:59:14.6429303Z { 2025-12-04T08:59:14.6429674Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6430158Z "size": 424, 2025-12-04T08:59:14.6430622Z "digest": "sha256:3017cdf4838bcc9a33daebc07487f8ae1f6bd6e7ce8322c14f5480e8db9ef90e" 2025-12-04T08:59:14.6431173Z }, 2025-12-04T08:59:14.6431388Z { 2025-12-04T08:59:14.6431746Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6432226Z "size": 19309374, 2025-12-04T08:59:14.6432813Z "digest": "sha256:6b6cd1c358e886dc6ed7fd46ac4bcc1a0a73b7b1301739ea1953478ee5d83f50" 2025-12-04T08:59:14.6433318Z }, 2025-12-04T08:59:14.6433650Z { 2025-12-04T08:59:14.6434012Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6434450Z "size": 108, 2025-12-04T08:59:14.6434892Z "digest": "sha256:b2dd045011241d1cf8889e2a7369d9fe4844dfe15529b520ccd6a59bd3c1532e" 2025-12-04T08:59:14.6435402Z }, 2025-12-04T08:59:14.6435605Z { 2025-12-04T08:59:14.6435941Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6436392Z "size": 827, 2025-12-04T08:59:14.6436830Z "digest": "sha256:55adc51fe5897031d4cf2f2b8fd162213f6e46a52848630c616606271b97952e" 2025-12-04T08:59:14.6437334Z }, 2025-12-04T08:59:14.6437543Z { 2025-12-04T08:59:14.6437891Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6459825Z "size": 724, 2025-12-04T08:59:14.6460321Z "digest": "sha256:3541df015cdb7e8925273399d28e56c31b3c9196f00439ac2925537b173b1f84" 2025-12-04T08:59:14.6460876Z }, 2025-12-04T08:59:14.6461098Z { 2025-12-04T08:59:14.6461484Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6461978Z "size": 149, 2025-12-04T08:59:14.6462445Z "digest": "sha256:a43ca0e4b837964b12b7469194cfe939c26de027298040028975324dce25938a" 2025-12-04T08:59:14.6462968Z }, 2025-12-04T08:59:14.6463186Z { 2025-12-04T08:59:14.6463562Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6464035Z "size": 138, 2025-12-04T08:59:14.6464505Z "digest": "sha256:b7212f17fd1404837fcfdd086dd0e2667931e4db377d45d8d89a44390c84e11d" 2025-12-04T08:59:14.6465051Z }, 2025-12-04T08:59:14.6465267Z { 2025-12-04T08:59:14.6465625Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6466102Z "size": 141, 2025-12-04T08:59:14.6466566Z "digest": "sha256:083e42cac090e6486c35f392b64ee54448f5e4aa947003aeb3e1f92c8ea5c099" 2025-12-04T08:59:14.6467094Z }, 2025-12-04T08:59:14.6467308Z { 2025-12-04T08:59:14.6467684Z "mediaType": 
"application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6468339Z "size": 32, 2025-12-04T08:59:14.6468915Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T08:59:14.6469547Z }, 2025-12-04T08:59:14.6469737Z { 2025-12-04T08:59:14.6470089Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6470539Z "size": 223, 2025-12-04T08:59:14.6470973Z "digest": "sha256:0a00b784a4aac341795729b254f7edd09e811b7f51d0c58e0e6bfeeee6940503" 2025-12-04T08:59:14.6471489Z }, 2025-12-04T08:59:14.6471694Z { 2025-12-04T08:59:14.6472164Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6472580Z "size": 255, 2025-12-04T08:59:14.6472994Z "digest": "sha256:c6173c779f7ba143a21214ea5f032b141863a37ceb4c0ac01d3248c216ce5241" 2025-12-04T08:59:14.6473475Z }, 2025-12-04T08:59:14.6473656Z { 2025-12-04T08:59:14.6473986Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6474424Z "size": 145520672, 2025-12-04T08:59:14.6474849Z "digest": "sha256:ed3d1e3387b924585c332bf1bc252fa159cd0d25256a874043ff0141b1ab5ff7" 2025-12-04T08:59:14.6475331Z }, 2025-12-04T08:59:14.6475527Z { 2025-12-04T08:59:14.6475846Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6476276Z "size": 106, 2025-12-04T08:59:14.6476685Z "digest": "sha256:b29343478586aeee19d2a622661716f6f1591280c890f49b727a8da13a610784" 2025-12-04T08:59:14.6477160Z }, 2025-12-04T08:59:14.6477345Z { 2025-12-04T08:59:14.6477680Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6478118Z "size": 312293530, 2025-12-04T08:59:14.6478573Z "digest": "sha256:c6f0520487fb506bc4601fd84d5f28d8a76b203e004731e4b2067c2ab1a14e0b" 2025-12-04T08:59:14.6479067Z }, 2025-12-04T08:59:14.6479264Z { 2025-12-04T08:59:14.6479587Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6480022Z "size": 3058011133, 2025-12-04T08:59:14.6480574Z "digest": "sha256:148171691cd4c4d20310d490d4b4dd903490d04ea07fb8f7e668a28768683e9a" 2025-12-04T08:59:14.6481053Z }, 2025-12-04T08:59:14.6481254Z { 2025-12-04T08:59:14.6481592Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6482010Z "size": 129, 2025-12-04T08:59:14.6482440Z "digest": "sha256:2c666d30ed77fff9ff1167d41cd645dad98280fcbe941f5bc3828c7ae66b1287" 2025-12-04T08:59:14.6482941Z }, 2025-12-04T08:59:14.6483144Z { 2025-12-04T08:59:14.6483466Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6483898Z "size": 880, 2025-12-04T08:59:14.6484317Z "digest": "sha256:5d8d3a0a98e012c5068e0f3bae5a03e3148ecf2d063634eee4c9241a1e3fdfb5" 2025-12-04T08:59:14.6484794Z }, 2025-12-04T08:59:14.6484990Z { 2025-12-04T08:59:14.6485323Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6485737Z "size": 724, 2025-12-04T08:59:14.6486149Z "digest": "sha256:3541df015cdb7e8925273399d28e56c31b3c9196f00439ac2925537b173b1f84" 2025-12-04T08:59:14.6486778Z }, 2025-12-04T08:59:14.6486959Z { 2025-12-04T08:59:14.6487288Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6487712Z "size": 139, 2025-12-04T08:59:14.6488125Z "digest": "sha256:b06bafce9e817295d8127207747c80aa18e04392ff0875844fc30a1e794a8a0c" 2025-12-04T08:59:14.6488597Z }, 2025-12-04T08:59:14.6488794Z { 2025-12-04T08:59:14.6489122Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6489535Z "size": 32, 
2025-12-04T08:59:14.6489954Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T08:59:14.6490441Z }, 2025-12-04T08:59:14.6490621Z { 2025-12-04T08:59:14.6490948Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6491369Z "size": 159, 2025-12-04T08:59:14.6491781Z "digest": "sha256:15e0d7e4590d3d8f598d05aec3a92f891bf8b4605bcc38cc2de852b6014ef8f3" 2025-12-04T08:59:14.6492335Z }, 2025-12-04T08:59:14.6492528Z { 2025-12-04T08:59:14.6492850Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6493282Z "size": 1011, 2025-12-04T08:59:14.6493709Z "digest": "sha256:a514bd1add3164d8d7ca99aa19294c4ed8b97b074635d98714c4f598a959f4cd" 2025-12-04T08:59:14.6494199Z }, 2025-12-04T08:59:14.6494380Z { 2025-12-04T08:59:14.6494710Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6495136Z "size": 724, 2025-12-04T08:59:14.6495532Z "digest": "sha256:3541df015cdb7e8925273399d28e56c31b3c9196f00439ac2925537b173b1f84" 2025-12-04T08:59:14.6496007Z }, 2025-12-04T08:59:14.6496198Z { 2025-12-04T08:59:14.6496619Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6497259Z "size": 134, 2025-12-04T08:59:14.6497762Z "digest": "sha256:57b84ee6000204f27a1d9bca199b19be4c86ecd324540dbdf239c56a6c3b34ea" 2025-12-04T08:59:14.6498299Z }, 2025-12-04T08:59:14.6498517Z { 2025-12-04T08:59:14.6498894Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6499364Z "size": 32, 2025-12-04T08:59:14.6499828Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T08:59:14.6500373Z }, 2025-12-04T08:59:14.6500588Z { 2025-12-04T08:59:14.6500946Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6501425Z "size": 157, 2025-12-04T08:59:14.6501909Z "digest": "sha256:b8babeff6d817a5961dddc15c6bdfdbd05da187fae75d5804015f99fd7c066d8" 2025-12-04T08:59:14.6502454Z }, 2025-12-04T08:59:14.6502669Z { 2025-12-04T08:59:14.6503038Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6503502Z "size": 602, 2025-12-04T08:59:14.6503970Z "digest": "sha256:83779ddf6a85ab387f64a45f274cba245b69e4fd1931ff0b5d7d3efd4b7a43bc" 2025-12-04T08:59:14.6504515Z }, 2025-12-04T08:59:14.6504716Z { 2025-12-04T08:59:14.6505166Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6505651Z "size": 724, 2025-12-04T08:59:14.6506095Z "digest": "sha256:3541df015cdb7e8925273399d28e56c31b3c9196f00439ac2925537b173b1f84" 2025-12-04T08:59:14.6506629Z }, 2025-12-04T08:59:14.6506846Z { 2025-12-04T08:59:14.6507218Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6507687Z "size": 155, 2025-12-04T08:59:14.6508158Z "digest": "sha256:8b7620c0d736cc79381207ce5afe2af90f0cd7f0cd394577d2c9520d7f74762f" 2025-12-04T08:59:14.6508703Z }, 2025-12-04T08:59:14.6508904Z { 2025-12-04T08:59:14.6509348Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6509774Z "size": 32, 2025-12-04T08:59:14.6510178Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T08:59:14.6510665Z }, 2025-12-04T08:59:14.6510856Z { 2025-12-04T08:59:14.6511178Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6511610Z "size": 188, 2025-12-04T08:59:14.6512032Z "digest": "sha256:3bcfa090e4efd3677425f76baea9f1e0c50a75d8c6b5713ec05310f1dff24539" 
2025-12-04T08:59:14.6512523Z }, 2025-12-04T08:59:14.6512704Z { 2025-12-04T08:59:14.6513034Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6513461Z "size": 1370, 2025-12-04T08:59:14.6513873Z "digest": "sha256:eb0504ec4d9218a79896b604f73dc0ea5a0f96266ad9c2cdbbbe5f0f18222694" 2025-12-04T08:59:14.6514361Z }, 2025-12-04T08:59:14.6514547Z { 2025-12-04T08:59:14.6514872Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6515288Z "size": 32, 2025-12-04T08:59:14.6515688Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T08:59:14.6516168Z }, 2025-12-04T08:59:14.6516348Z { 2025-12-04T08:59:14.6516657Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6517073Z "size": 136, 2025-12-04T08:59:14.6517546Z "digest": "sha256:15d0fec09d7b196a1462d51516ee90fc3443ba178d3e56d59cacf32146b4321d" 2025-12-04T08:59:14.6518013Z }, 2025-12-04T08:59:14.6518193Z { 2025-12-04T08:59:14.6518511Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6518919Z "size": 528, 2025-12-04T08:59:14.6519331Z "digest": "sha256:cca81fcc62a949959ca4dd3c9056fb293d548ef8607127eeeef6cfd3a8897ca8" 2025-12-04T08:59:14.6519810Z }, 2025-12-04T08:59:14.6519991Z { 2025-12-04T08:59:14.6520302Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6520714Z "size": 32, 2025-12-04T08:59:14.6521465Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T08:59:14.6522043Z }, 2025-12-04T08:59:14.6522245Z { 2025-12-04T08:59:14.6522605Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6523066Z "size": 104, 2025-12-04T08:59:14.6523539Z "digest": "sha256:b0b8f9b5c6ab98db9cd830dc584e1b6aec9add139e4cc48d8c243d36691e25b4" 2025-12-04T08:59:14.6524085Z }, 2025-12-04T08:59:14.6524280Z { 2025-12-04T08:59:14.6524634Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6525091Z "size": 435, 2025-12-04T08:59:14.6525528Z "digest": "sha256:0606ca4d47a8a70e91e92b03ca51a85e731641b09342136a54ef2f2a6d9dfb44" 2025-12-04T08:59:14.6526044Z }, 2025-12-04T08:59:14.6526241Z { 2025-12-04T08:59:14.6526589Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6527041Z "size": 32, 2025-12-04T08:59:14.6527490Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T08:59:14.6528021Z }, 2025-12-04T08:59:14.6528209Z { 2025-12-04T08:59:14.6528557Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6529017Z "size": 109, 2025-12-04T08:59:14.6529579Z "digest": "sha256:2f80a4e1b3b95ed67bb781ea787e8a63e46de79117d9d8e65c257072b38afa2d" 2025-12-04T08:59:14.6530118Z }, 2025-12-04T08:59:14.6530321Z { 2025-12-04T08:59:14.6530668Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6531134Z "size": 1896, 2025-12-04T08:59:14.6531588Z "digest": "sha256:35c916fb1bd057e517dcab78c3a2a018e68096d8993892ad84f47562d37ae352" 2025-12-04T08:59:14.6532118Z }, 2025-12-04T08:59:14.6532311Z { 2025-12-04T08:59:14.6532665Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6533130Z "size": 197526165, 2025-12-04T08:59:14.6533781Z "digest": "sha256:195537b7dafc96192f768323b1a8cc2a914d41959849b73198579576b0872a44" 2025-12-04T08:59:14.6534249Z }, 2025-12-04T08:59:14.6534430Z { 2025-12-04T08:59:14.6534736Z "mediaType": 
"application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6535150Z "size": 106, 2025-12-04T08:59:14.6535555Z "digest": "sha256:dc454fd3967e5735b2498b7f1d958a2c626987d5e4ce225ca98da3cd945b59f3" 2025-12-04T08:59:14.6536019Z }, 2025-12-04T08:59:14.6536199Z { 2025-12-04T08:59:14.6536595Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6537211Z "size": 165, 2025-12-04T08:59:14.6537748Z "digest": "sha256:701b34f115fa897181c046dc37288e87cbc3ad74c36a9e2224b5bfe7c5703afb" 2025-12-04T08:59:14.6538284Z }, 2025-12-04T08:59:14.6538493Z { 2025-12-04T08:59:14.6538844Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6539307Z "size": 7944, 2025-12-04T08:59:14.6539765Z "digest": "sha256:39cefc00ffedebc9098261c798408b87a20c95a88fccb110594077f48dadf760" 2025-12-04T08:59:14.6540285Z }, 2025-12-04T08:59:14.6540480Z { 2025-12-04T08:59:14.6540838Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6541292Z "size": 8071, 2025-12-04T08:59:14.6541751Z "digest": "sha256:6ae51eb61a325b2c2995a5088c81aa20821b75be65b5aa722c7c40556b5d03ea" 2025-12-04T08:59:14.6542284Z }, 2025-12-04T08:59:14.6542480Z { 2025-12-04T08:59:14.6542932Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6543403Z "size": 304, 2025-12-04T08:59:14.6543850Z "digest": "sha256:1fd5341e66dfc0c1ae23af014641a92a6fd02640c528fe6d4dc55921ed659a26" 2025-12-04T08:59:14.6544380Z }, 2025-12-04T08:59:14.6544587Z { 2025-12-04T08:59:14.6544941Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6545400Z "size": 13364291, 2025-12-04T08:59:14.6545874Z "digest": "sha256:72a7c87e35e40ab796f90aee1b51add7902f0cdc44406d2505b6c6a1f55a8da6" 2025-12-04T08:59:14.6546406Z }, 2025-12-04T08:59:14.6546596Z { 2025-12-04T08:59:14.6546947Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6547402Z "size": 108, 2025-12-04T08:59:14.6547853Z "digest": "sha256:ec36862ac98ebaac52ee1a8b1d162d45bd0e3bf59ae7e19c8f80ad3960b4c600" 2025-12-04T08:59:14.6548388Z }, 2025-12-04T08:59:14.6548695Z { 2025-12-04T08:59:14.6549030Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6549557Z "size": 54145699, 2025-12-04T08:59:14.6549980Z "digest": "sha256:05ddbf246e8add0e293474dbf88bb028d5a295a25ac59e8648a18db644377773" 2025-12-04T08:59:14.6550451Z }, 2025-12-04T08:59:14.6550619Z { 2025-12-04T08:59:14.6550929Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:59:14.6551335Z "size": 32, 2025-12-04T08:59:14.6551728Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T08:59:14.6552199Z } 2025-12-04T08:59:14.6552374Z ] 2025-12-04T08:59:14.6552543Z } 2025-12-04T08:59:14.6579052Z ##[group]Run set -eux 2025-12-04T08:59:14.6579369Z set -eux 2025-12-04T08:59:14.6579848Z # It's ok if this steps fails, it would then be an anonymous user like what we used to have 2025-12-04T08:59:14.6581300Z aws secretsmanager get-secret-value --secret-id docker_hub_readonly_token | jq --raw-output '.SecretString' | jq -r .docker_hub_readonly_token | docker login --username pytorchbot --password-stdin || true 2025-12-04T08:59:14.6588695Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:59:14.6589238Z env: 2025-12-04T08:59:14.6589451Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:59:14.6589722Z ##[endgroup] 2025-12-04T08:59:14.6617757Z + aws secretsmanager get-secret-value 
--secret-id docker_hub_readonly_token 2025-12-04T08:59:14.6618368Z + jq --raw-output .SecretString 2025-12-04T08:59:14.6619288Z + jq -r .docker_hub_readonly_token 2025-12-04T08:59:14.6620283Z + docker login --username pytorchbot --password-stdin 2025-12-04T08:59:15.2324117Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json. 2025-12-04T08:59:15.2324852Z Configure a credential helper to remove this warning. See 2025-12-04T08:59:15.2325533Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store 2025-12-04T08:59:15.2325997Z 2025-12-04T08:59:15.2326145Z Login Succeeded 2025-12-04T08:59:15.2415382Z ##[group]Run tag=${ECR_DOCKER_IMAGE##*:} 2025-12-04T08:59:15.2415789Z tag=${ECR_DOCKER_IMAGE##*:} 2025-12-04T08:59:15.2416232Z echo "docker pull ghcr.io/pytorch/ci-image:${tag/:/-}" 2025-12-04T08:59:15.2423641Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:59:15.2424080Z env: 2025-12-04T08:59:15.2424333Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:59:15.2425315Z ECR_DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:15.2426309Z ##[endgroup] 2025-12-04T08:59:15.2453778Z docker pull ghcr.io/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:15.2502853Z ##[group]Run pytorch/test-infra/.github/actions/pull-docker-image@main 2025-12-04T08:59:15.2503315Z with: 2025-12-04T08:59:15.2504111Z docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:15.2505246Z docker-registry: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:59:15.2505655Z env: 2025-12-04T08:59:15.2505866Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:59:15.2506143Z ##[endgroup] 2025-12-04T08:59:15.2520503Z ##[group]Run set -x 2025-12-04T08:59:15.2520965Z set -x 2025-12-04T08:59:15.2521364Z set +e 2025-12-04T08:59:15.2521623Z  2025-12-04T08:59:15.2521872Z login() { 2025-12-04T08:59:15.2522419Z  aws ecr get-login-password --region us-east-1 | docker login -u AWS --password-stdin "$1" 2025-12-04T08:59:15.2523035Z } 2025-12-04T08:59:15.2523278Z  2025-12-04T08:59:15.2523558Z retry () { 2025-12-04T08:59:15.2523864Z  $* || (sleep 1 && $*) || (sleep 2 && $*) 2025-12-04T08:59:15.2524234Z } 2025-12-04T08:59:15.2524479Z  2025-12-04T08:59:15.2524759Z retry login "${DOCKER_REGISTRY}" 2025-12-04T08:59:15.2525106Z  2025-12-04T08:59:15.2525674Z IMAGE_SIZE=$(docker manifest inspect "${DOCKER_IMAGE}" | jq '[.layers[].size, .config.size] | add / 1024 / 1024') 2025-12-04T08:59:15.2526454Z echo "Compressed size of image in MB: ${IMAGE_SIZE}" 2025-12-04T08:59:15.2526877Z  2025-12-04T08:59:15.2527113Z set -e 2025-12-04T08:59:15.2527600Z # ignore output since only exit code is used for conditional 2025-12-04T08:59:15.2528119Z # only pull docker image if it's not available locally 2025-12-04T08:59:15.2528676Z if ! 
docker inspect --type=image "${DOCKER_IMAGE}" >/dev/null 2>/dev/null; then 2025-12-04T08:59:15.2529209Z  retry docker pull "${DOCKER_IMAGE}" 2025-12-04T08:59:15.2529543Z fi 2025-12-04T08:59:15.2534980Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:59:15.2535380Z env: 2025-12-04T08:59:15.2535609Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:59:15.2536579Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:15.2537848Z DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:59:15.2538304Z ##[endgroup] 2025-12-04T08:59:15.2563481Z + set +e 2025-12-04T08:59:15.2564025Z + retry login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:59:15.2564514Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:59:15.2567917Z + aws ecr get-login-password --region us-east-1 2025-12-04T08:59:15.2568845Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:59:15.7876858Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json. 2025-12-04T08:59:15.7877557Z Configure a credential helper to remove this warning. See 2025-12-04T08:59:15.7878220Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store 2025-12-04T08:59:15.7878702Z 2025-12-04T08:59:15.7878820Z Login Succeeded 2025-12-04T08:59:15.7900336Z ++ docker manifest inspect 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:15.7901456Z ++ jq '[.layers[].size, .config.size] | add / 1024 / 1024' 2025-12-04T08:59:15.9818499Z + IMAGE_SIZE=15091.581844329834 2025-12-04T08:59:15.9818961Z + echo 'Compressed size of image in MB: 15091.581844329834' 2025-12-04T08:59:15.9819384Z + set -e 2025-12-04T08:59:15.9819691Z Compressed size of image in MB: 15091.581844329834 2025-12-04T08:59:15.9821306Z + docker inspect --type=image 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:15.9939521Z + retry docker pull 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:15.9941415Z + docker pull 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:59:16.2148805Z pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a: Pulling from pytorch/ci-image 2025-12-04T08:59:16.2151637Z 63e5bc7682b8: Pulling fs layer 2025-12-04T08:59:16.2151967Z 0678d56345c9: Pulling fs layer 2025-12-04T08:59:16.2152316Z 45f5c9ddfce7: Pulling fs layer 2025-12-04T08:59:16.2152622Z 086b1df51ac1: Pulling fs layer 2025-12-04T08:59:16.2152938Z fe8a7b64bf98: Pulling fs layer 2025-12-04T08:59:16.2153252Z 7680723e9a57: Pulling fs layer 2025-12-04T08:59:16.2153568Z 9c5027aeeb4e: Pulling fs layer 2025-12-04T08:59:16.2153879Z 9a5652110360: Pulling fs layer 2025-12-04T08:59:16.2154185Z 375c4427e914: Pulling fs layer 2025-12-04T08:59:16.2154487Z a86faaa7dbdd: Pulling fs layer 2025-12-04T08:59:16.2154809Z fb7848686804: Pulling fs layer 2025-12-04T08:59:16.2155135Z 3541df015cdb: Pulling fs layer 2025-12-04T08:59:16.2155436Z 79dc80f426b2: Pulling fs layer 
2025-12-04T08:59:16.2155752Z a13fcc1b90bb: Pulling fs layer 2025-12-04T08:59:16.2156071Z 4f4fb700ef54: Pulling fs layer 2025-12-04T08:59:16.2156371Z 549db4d6c618: Pulling fs layer 2025-12-04T08:59:16.2156685Z 5c63528cb580: Pulling fs layer 2025-12-04T08:59:16.2156995Z 75bd83b989a4: Pulling fs layer 2025-12-04T08:59:16.2157308Z de6e78970f51: Pulling fs layer 2025-12-04T08:59:16.2157606Z e13ed7c7e473: Pulling fs layer 2025-12-04T08:59:16.2157920Z 6e2949bcb741: Pulling fs layer 2025-12-04T08:59:16.2158417Z 14d69d9aaec7: Pulling fs layer 2025-12-04T08:59:16.2158726Z 5c02769dd8e5: Pulling fs layer 2025-12-04T08:59:16.2159051Z 35041ce524ac: Pulling fs layer 2025-12-04T08:59:16.2159364Z fe8a7b64bf98: Waiting 2025-12-04T08:59:16.2159643Z 2fa92dc5885e: Pulling fs layer 2025-12-04T08:59:16.2159967Z 2b85eafbd92a: Pulling fs layer 2025-12-04T08:59:16.2160279Z 086b1df51ac1: Waiting 2025-12-04T08:59:16.2160565Z ff755a4ddad7: Pulling fs layer 2025-12-04T08:59:16.2160870Z 7680723e9a57: Waiting 2025-12-04T08:59:16.2161173Z 09eb41bdf42d: Pulling fs layer 2025-12-04T08:59:16.2161487Z 9c5027aeeb4e: Waiting 2025-12-04T08:59:16.2161748Z 9a5652110360: Waiting 2025-12-04T08:59:16.2162019Z 3541df015cdb: Waiting 2025-12-04T08:59:16.2162350Z 11ede4d59e93: Pulling fs layer 2025-12-04T08:59:16.2162666Z 375c4427e914: Waiting 2025-12-04T08:59:16.2162968Z 1283cd8f801a: Pulling fs layer 2025-12-04T08:59:16.2163295Z 024fa855425f: Pulling fs layer 2025-12-04T08:59:16.2163619Z 303e6747a62e: Pulling fs layer 2025-12-04T08:59:16.2164035Z e13ed7c7e473: Waiting 2025-12-04T08:59:16.2164307Z 79dc80f426b2: Waiting 2025-12-04T08:59:16.2164580Z a86faaa7dbdd: Waiting 2025-12-04T08:59:16.2164857Z 3017cdf4838b: Pulling fs layer 2025-12-04T08:59:16.2165163Z fb7848686804: Waiting 2025-12-04T08:59:16.2165443Z 6b6cd1c358e8: Pulling fs layer 2025-12-04T08:59:16.2165739Z 6e2949bcb741: Waiting 2025-12-04T08:59:16.2166021Z b2dd04501124: Pulling fs layer 2025-12-04T08:59:16.2166333Z 14d69d9aaec7: Waiting 2025-12-04T08:59:16.2166615Z 55adc51fe589: Pulling fs layer 2025-12-04T08:59:16.2166903Z de6e78970f51: Waiting 2025-12-04T08:59:16.2167167Z 75bd83b989a4: Waiting 2025-12-04T08:59:16.2167428Z 35041ce524ac: Waiting 2025-12-04T08:59:16.2167679Z a13fcc1b90bb: Waiting 2025-12-04T08:59:16.2167941Z 5c63528cb580: Waiting 2025-12-04T08:59:16.2168215Z a43ca0e4b837: Pulling fs layer 2025-12-04T08:59:16.2168501Z 4f4fb700ef54: Waiting 2025-12-04T08:59:16.2168761Z 5c02769dd8e5: Waiting 2025-12-04T08:59:16.2169024Z 2fa92dc5885e: Waiting 2025-12-04T08:59:16.2169285Z b7212f17fd14: Pulling fs layer 2025-12-04T08:59:16.2169585Z 2b85eafbd92a: Waiting 2025-12-04T08:59:16.2169852Z 1283cd8f801a: Waiting 2025-12-04T08:59:16.2170336Z 09eb41bdf42d: Waiting 2025-12-04T08:59:16.2170624Z 083e42cac090: Pulling fs layer 2025-12-04T08:59:16.2170928Z 11ede4d59e93: Waiting 2025-12-04T08:59:16.2171182Z ff755a4ddad7: Waiting 2025-12-04T08:59:16.2171451Z 024fa855425f: Waiting 2025-12-04T08:59:16.2171824Z b2dd04501124: Waiting 2025-12-04T08:59:16.2172087Z 0a00b784a4aa: Pulling fs layer 2025-12-04T08:59:16.2172421Z 549db4d6c618: Waiting 2025-12-04T08:59:16.2172682Z a43ca0e4b837: Waiting 2025-12-04T08:59:16.2172928Z 3017cdf4838b: Waiting 2025-12-04T08:59:16.2173191Z 55adc51fe589: Waiting 2025-12-04T08:59:16.2173465Z c6173c779f7b: Pulling fs layer 2025-12-04T08:59:16.2173766Z 6b6cd1c358e8: Waiting 2025-12-04T08:59:16.2174014Z 303e6747a62e: Waiting 2025-12-04T08:59:16.2174272Z b7212f17fd14: Waiting 2025-12-04T08:59:16.2174534Z 0a00b784a4aa: Waiting 2025-12-04T08:59:16.2174799Z 
ed3d1e3387b9: Pulling fs layer 2025-12-04T08:59:16.2175102Z 083e42cac090: Waiting 2025-12-04T08:59:16.2175374Z b29343478586: Pulling fs layer 2025-12-04T08:59:16.2175666Z ed3d1e3387b9: Waiting 2025-12-04T08:59:16.2175943Z c6f0520487fb: Pulling fs layer 2025-12-04T08:59:16.2176253Z 148171691cd4: Pulling fs layer 2025-12-04T08:59:16.2176674Z 2c666d30ed77: Pulling fs layer 2025-12-04T08:59:16.2177180Z 5d8d3a0a98e0: Pulling fs layer 2025-12-04T08:59:16.2177516Z b06bafce9e81: Pulling fs layer 2025-12-04T08:59:16.2177828Z 15e0d7e4590d: Pulling fs layer 2025-12-04T08:59:16.2178140Z 2c666d30ed77: Waiting 2025-12-04T08:59:16.2178425Z a514bd1add31: Pulling fs layer 2025-12-04T08:59:16.2178723Z 5d8d3a0a98e0: Waiting 2025-12-04T08:59:16.2179008Z 57b84ee60002: Pulling fs layer 2025-12-04T08:59:16.2179307Z b29343478586: Waiting 2025-12-04T08:59:16.2179573Z 15e0d7e4590d: Waiting 2025-12-04T08:59:16.2179824Z 148171691cd4: Waiting 2025-12-04T08:59:16.2180095Z a514bd1add31: Waiting 2025-12-04T08:59:16.2180385Z b8babeff6d81: Pulling fs layer 2025-12-04T08:59:16.2180697Z 83779ddf6a85: Pulling fs layer 2025-12-04T08:59:16.2181002Z 57b84ee60002: Waiting 2025-12-04T08:59:16.2181274Z b06bafce9e81: Waiting 2025-12-04T08:59:16.2181545Z b8babeff6d81: Waiting 2025-12-04T08:59:16.2181828Z 8b7620c0d736: Pulling fs layer 2025-12-04T08:59:16.2182154Z 3bcfa090e4ef: Pulling fs layer 2025-12-04T08:59:16.2182464Z eb0504ec4d92: Pulling fs layer 2025-12-04T08:59:16.2182774Z 83779ddf6a85: Waiting 2025-12-04T08:59:16.2183043Z 8b7620c0d736: Waiting 2025-12-04T08:59:16.2183297Z eb0504ec4d92: Waiting 2025-12-04T08:59:16.2183582Z 15d0fec09d7b: Pulling fs layer 2025-12-04T08:59:16.2183909Z cca81fcc62a9: Pulling fs layer 2025-12-04T08:59:16.2184226Z b0b8f9b5c6ab: Pulling fs layer 2025-12-04T08:59:16.2184548Z 0606ca4d47a8: Pulling fs layer 2025-12-04T08:59:16.2184864Z 15d0fec09d7b: Waiting 2025-12-04T08:59:16.2185126Z 0606ca4d47a8: Waiting 2025-12-04T08:59:16.2185402Z cca81fcc62a9: Waiting 2025-12-04T08:59:16.2185692Z 2f80a4e1b3b9: Pulling fs layer 2025-12-04T08:59:16.2186015Z 35c916fb1bd0: Pulling fs layer 2025-12-04T08:59:16.2186324Z 195537b7dafc: Pulling fs layer 2025-12-04T08:59:16.2186633Z 2f80a4e1b3b9: Waiting 2025-12-04T08:59:16.2186922Z dc454fd3967e: Pulling fs layer 2025-12-04T08:59:16.2187219Z 35c916fb1bd0: Waiting 2025-12-04T08:59:16.2187490Z 195537b7dafc: Waiting 2025-12-04T08:59:16.2187771Z 701b34f115fa: Pulling fs layer 2025-12-04T08:59:16.2188086Z 39cefc00ffed: Pulling fs layer 2025-12-04T08:59:16.2188526Z 6ae51eb61a32: Pulling fs layer 2025-12-04T08:59:16.2188826Z dc454fd3967e: Waiting 2025-12-04T08:59:16.2189076Z 701b34f115fa: Waiting 2025-12-04T08:59:16.2189339Z 39cefc00ffed: Waiting 2025-12-04T08:59:16.2189607Z 6ae51eb61a32: Waiting 2025-12-04T08:59:16.2189874Z 1fd5341e66df: Pulling fs layer 2025-12-04T08:59:16.2190191Z 72a7c87e35e4: Pulling fs layer 2025-12-04T08:59:16.2190509Z ec36862ac98e: Pulling fs layer 2025-12-04T08:59:16.2190805Z 1fd5341e66df: Waiting 2025-12-04T08:59:16.2191072Z 72a7c87e35e4: Waiting 2025-12-04T08:59:16.2191354Z 05ddbf246e8a: Pulling fs layer 2025-12-04T08:59:16.2191648Z ec36862ac98e: Waiting 2025-12-04T08:59:16.2191919Z 05ddbf246e8a: Waiting 2025-12-04T08:59:16.3138845Z 0678d56345c9: Download complete 2025-12-04T08:59:16.3857980Z 086b1df51ac1: Download complete 2025-12-04T08:59:16.4618803Z fe8a7b64bf98: Verifying Checksum 2025-12-04T08:59:16.4619200Z fe8a7b64bf98: Download complete 2025-12-04T08:59:16.5371005Z 7680723e9a57: Verifying Checksum 2025-12-04T08:59:16.5371668Z 7680723e9a57: 
Download complete 2025-12-04T08:59:16.5660555Z 63e5bc7682b8: Download complete 2025-12-04T08:59:16.6267646Z 9c5027aeeb4e: Download complete 2025-12-04T08:59:16.6376534Z 9a5652110360: Verifying Checksum 2025-12-04T08:59:16.6377159Z 9a5652110360: Download complete 2025-12-04T08:59:16.7122229Z a86faaa7dbdd: Verifying Checksum 2025-12-04T08:59:16.7122886Z a86faaa7dbdd: Download complete 2025-12-04T08:59:16.8124696Z fb7848686804: Download complete 2025-12-04T08:59:16.8977642Z 3541df015cdb: Verifying Checksum 2025-12-04T08:59:16.8978071Z 3541df015cdb: Download complete 2025-12-04T08:59:16.9539326Z 79dc80f426b2: Download complete 2025-12-04T08:59:17.3488608Z 63e5bc7682b8: Pull complete 2025-12-04T08:59:17.3613598Z 0678d56345c9: Pull complete 2025-12-04T08:59:17.7931346Z 375c4427e914: Download complete 2025-12-04T08:59:17.8016962Z 4f4fb700ef54: Verifying Checksum 2025-12-04T08:59:17.8017396Z 4f4fb700ef54: Download complete 2025-12-04T08:59:17.8702434Z 549db4d6c618: Verifying Checksum 2025-12-04T08:59:17.8702836Z 549db4d6c618: Download complete 2025-12-04T08:59:17.9506311Z 5c63528cb580: Download complete 2025-12-04T08:59:18.0292544Z 75bd83b989a4: Download complete 2025-12-04T08:59:18.1051889Z de6e78970f51: Verifying Checksum 2025-12-04T08:59:18.1052363Z de6e78970f51: Download complete 2025-12-04T08:59:18.1772628Z e13ed7c7e473: Download complete 2025-12-04T08:59:18.2447385Z 6e2949bcb741: Verifying Checksum 2025-12-04T08:59:18.2448027Z 6e2949bcb741: Download complete 2025-12-04T08:59:18.3044726Z 14d69d9aaec7: Verifying Checksum 2025-12-04T08:59:18.3045222Z 14d69d9aaec7: Download complete 2025-12-04T08:59:18.3772037Z 5c02769dd8e5: Verifying Checksum 2025-12-04T08:59:18.3772451Z 5c02769dd8e5: Download complete 2025-12-04T08:59:19.4058909Z 45f5c9ddfce7: Verifying Checksum 2025-12-04T08:59:19.4059338Z 45f5c9ddfce7: Download complete 2025-12-04T08:59:19.4865590Z 2fa92dc5885e: Verifying Checksum 2025-12-04T08:59:19.4865992Z 2fa92dc5885e: Download complete 2025-12-04T08:59:19.8783672Z 2b85eafbd92a: Verifying Checksum 2025-12-04T08:59:19.8784217Z 2b85eafbd92a: Download complete 2025-12-04T08:59:19.9653918Z ff755a4ddad7: Verifying Checksum 2025-12-04T08:59:19.9654368Z ff755a4ddad7: Download complete 2025-12-04T08:59:20.0504264Z 09eb41bdf42d: Download complete 2025-12-04T08:59:24.7059571Z 11ede4d59e93: Verifying Checksum 2025-12-04T08:59:24.7060003Z 11ede4d59e93: Download complete 2025-12-04T08:59:24.7667258Z 1283cd8f801a: Download complete 2025-12-04T08:59:24.8426365Z 024fa855425f: Verifying Checksum 2025-12-04T08:59:24.8427046Z 024fa855425f: Download complete 2025-12-04T08:59:24.9067080Z 303e6747a62e: Verifying Checksum 2025-12-04T08:59:24.9067537Z 303e6747a62e: Download complete 2025-12-04T08:59:24.9922956Z 3017cdf4838b: Verifying Checksum 2025-12-04T08:59:24.9923350Z 3017cdf4838b: Download complete 2025-12-04T08:59:25.2376914Z 6b6cd1c358e8: Verifying Checksum 2025-12-04T08:59:25.2377373Z 6b6cd1c358e8: Download complete 2025-12-04T08:59:25.3012118Z b2dd04501124: Verifying Checksum 2025-12-04T08:59:25.3012593Z b2dd04501124: Download complete 2025-12-04T08:59:25.3938960Z 55adc51fe589: Verifying Checksum 2025-12-04T08:59:25.3939423Z 55adc51fe589: Download complete 2025-12-04T08:59:25.4922165Z a43ca0e4b837: Verifying Checksum 2025-12-04T08:59:25.4922584Z a43ca0e4b837: Download complete 2025-12-04T08:59:25.6031118Z b7212f17fd14: Verifying Checksum 2025-12-04T08:59:25.6031542Z b7212f17fd14: Download complete 2025-12-04T08:59:25.6727061Z 083e42cac090: Verifying Checksum 2025-12-04T08:59:25.6727453Z 083e42cac090: 
Download complete 2025-12-04T08:59:25.7600765Z 0a00b784a4aa: Download complete 2025-12-04T08:59:25.8454634Z c6173c779f7b: Verifying Checksum 2025-12-04T08:59:25.8455044Z c6173c779f7b: Download complete 2025-12-04T08:59:26.4646880Z 45f5c9ddfce7: Pull complete 2025-12-04T08:59:26.4866474Z 086b1df51ac1: Pull complete 2025-12-04T08:59:26.5078507Z fe8a7b64bf98: Pull complete 2025-12-04T08:59:26.5330514Z 7680723e9a57: Pull complete 2025-12-04T08:59:26.5701933Z 9c5027aeeb4e: Pull complete 2025-12-04T08:59:26.5957046Z 9a5652110360: Pull complete 2025-12-04T08:59:27.3496441Z ed3d1e3387b9: Verifying Checksum 2025-12-04T08:59:27.3497311Z ed3d1e3387b9: Download complete 2025-12-04T08:59:27.4208432Z b29343478586: Verifying Checksum 2025-12-04T08:59:27.4209080Z b29343478586: Download complete 2025-12-04T08:59:29.0262934Z 375c4427e914: Pull complete 2025-12-04T08:59:29.3886945Z a86faaa7dbdd: Pull complete 2025-12-04T08:59:29.8285588Z fb7848686804: Pull complete 2025-12-04T08:59:30.3064029Z 3541df015cdb: Pull complete 2025-12-04T08:59:30.5964481Z c6f0520487fb: Verifying Checksum 2025-12-04T08:59:30.5964894Z c6f0520487fb: Download complete 2025-12-04T08:59:30.7830707Z 79dc80f426b2: Pull complete 2025-12-04T08:59:48.8463210Z a13fcc1b90bb: Verifying Checksum 2025-12-04T08:59:48.8463662Z a13fcc1b90bb: Download complete 2025-12-04T08:59:48.9452747Z 2c666d30ed77: Verifying Checksum 2025-12-04T08:59:48.9453679Z 2c666d30ed77: Download complete 2025-12-04T08:59:49.0229660Z 5d8d3a0a98e0: Verifying Checksum 2025-12-04T08:59:49.0230106Z 5d8d3a0a98e0: Download complete 2025-12-04T08:59:49.1138399Z b06bafce9e81: Download complete 2025-12-04T08:59:49.1899603Z 15e0d7e4590d: Verifying Checksum 2025-12-04T08:59:49.1900021Z 15e0d7e4590d: Download complete 2025-12-04T08:59:49.2511547Z a514bd1add31: Download complete 2025-12-04T08:59:49.3353521Z 57b84ee60002: Download complete 2025-12-04T08:59:49.4183582Z b8babeff6d81: Verifying Checksum 2025-12-04T08:59:49.4184020Z b8babeff6d81: Download complete 2025-12-04T08:59:49.4917158Z 83779ddf6a85: Download complete 2025-12-04T08:59:49.5880223Z 8b7620c0d736: Verifying Checksum 2025-12-04T08:59:49.5880808Z 8b7620c0d736: Download complete 2025-12-04T08:59:49.6724927Z 3bcfa090e4ef: Verifying Checksum 2025-12-04T08:59:49.6725372Z 3bcfa090e4ef: Download complete 2025-12-04T08:59:49.7589238Z eb0504ec4d92: Verifying Checksum 2025-12-04T08:59:49.7589657Z eb0504ec4d92: Download complete 2025-12-04T08:59:49.8295572Z 15d0fec09d7b: Verifying Checksum 2025-12-04T08:59:49.8295995Z 15d0fec09d7b: Download complete 2025-12-04T08:59:49.8955698Z cca81fcc62a9: Verifying Checksum 2025-12-04T08:59:49.8956144Z cca81fcc62a9: Download complete 2025-12-04T08:59:49.9851067Z b0b8f9b5c6ab: Verifying Checksum 2025-12-04T08:59:49.9851474Z b0b8f9b5c6ab: Download complete 2025-12-04T08:59:50.0694573Z 0606ca4d47a8: Verifying Checksum 2025-12-04T08:59:50.0695031Z 0606ca4d47a8: Download complete 2025-12-04T08:59:50.1420617Z 2f80a4e1b3b9: Verifying Checksum 2025-12-04T08:59:50.1421197Z 2f80a4e1b3b9: Download complete 2025-12-04T08:59:50.2341657Z 35c916fb1bd0: Download complete 2025-12-04T08:59:52.2494136Z 195537b7dafc: Verifying Checksum 2025-12-04T08:59:52.2494977Z 195537b7dafc: Download complete 2025-12-04T08:59:52.3365558Z dc454fd3967e: Verifying Checksum 2025-12-04T08:59:52.3366046Z dc454fd3967e: Download complete 2025-12-04T08:59:52.4102503Z 701b34f115fa: Verifying Checksum 2025-12-04T08:59:52.4102919Z 701b34f115fa: Download complete 2025-12-04T08:59:52.4943408Z 39cefc00ffed: Download complete 
2025-12-04T08:59:52.5624520Z 6ae51eb61a32: Verifying Checksum 2025-12-04T08:59:52.5625070Z 6ae51eb61a32: Download complete 2025-12-04T08:59:52.6210759Z 1fd5341e66df: Verifying Checksum 2025-12-04T08:59:52.6211342Z 1fd5341e66df: Download complete 2025-12-04T08:59:52.8405146Z 72a7c87e35e4: Verifying Checksum 2025-12-04T08:59:52.8405568Z 72a7c87e35e4: Download complete 2025-12-04T08:59:52.9443187Z ec36862ac98e: Verifying Checksum 2025-12-04T08:59:52.9443673Z ec36862ac98e: Download complete 2025-12-04T08:59:53.5355418Z 05ddbf246e8a: Verifying Checksum 2025-12-04T08:59:53.5356111Z 05ddbf246e8a: Download complete 2025-12-04T09:00:01.2258856Z 148171691cd4: Verifying Checksum 2025-12-04T09:00:01.2259278Z 148171691cd4: Download complete 2025-12-04T09:00:34.1473581Z a13fcc1b90bb: Pull complete 2025-12-04T09:00:34.3923260Z 4f4fb700ef54: Pull complete 2025-12-04T09:00:34.8376112Z 549db4d6c618: Pull complete 2025-12-04T09:00:35.5248241Z 5c63528cb580: Pull complete 2025-12-04T09:00:36.0107039Z 75bd83b989a4: Pull complete 2025-12-04T09:00:36.4167606Z de6e78970f51: Pull complete 2025-12-04T09:00:36.5665881Z e13ed7c7e473: Pull complete 2025-12-04T09:00:36.6906633Z 6e2949bcb741: Pull complete 2025-12-04T09:00:37.0520490Z 14d69d9aaec7: Pull complete 2025-12-04T09:00:37.3267714Z 35041ce524ac: Verifying Checksum 2025-12-04T09:00:37.3268623Z 35041ce524ac: Download complete 2025-12-04T09:00:37.5019481Z 5c02769dd8e5: Pull complete 2025-12-04T09:01:49.8274613Z 35041ce524ac: Pull complete 2025-12-04T09:01:50.2368669Z 2fa92dc5885e: Pull complete 2025-12-04T09:01:51.2370662Z 2b85eafbd92a: Pull complete 2025-12-04T09:01:51.6898264Z ff755a4ddad7: Pull complete 2025-12-04T09:01:52.1964615Z 09eb41bdf42d: Pull complete 2025-12-04T09:02:00.1003649Z 11ede4d59e93: Pull complete 2025-12-04T09:02:00.5211534Z 1283cd8f801a: Pull complete 2025-12-04T09:02:00.8810693Z 024fa855425f: Pull complete 2025-12-04T09:02:01.3505304Z 303e6747a62e: Pull complete 2025-12-04T09:02:01.5887823Z 3017cdf4838b: Pull complete 2025-12-04T09:02:01.9022683Z 6b6cd1c358e8: Pull complete 2025-12-04T09:02:01.9250236Z b2dd04501124: Pull complete 2025-12-04T09:02:01.9487715Z 55adc51fe589: Pull complete 2025-12-04T09:02:01.9942336Z a43ca0e4b837: Pull complete 2025-12-04T09:02:02.0183851Z b7212f17fd14: Pull complete 2025-12-04T09:02:02.0420552Z 083e42cac090: Pull complete 2025-12-04T09:02:02.0874458Z 0a00b784a4aa: Pull complete 2025-12-04T09:02:02.1125578Z c6173c779f7b: Pull complete 2025-12-04T09:02:05.0297442Z ed3d1e3387b9: Pull complete 2025-12-04T09:02:05.0546716Z b29343478586: Pull complete 2025-12-04T09:02:06.4151579Z c6f0520487fb: Pull complete 2025-12-04T09:02:57.7145882Z 148171691cd4: Pull complete 2025-12-04T09:02:58.1499512Z 2c666d30ed77: Pull complete 2025-12-04T09:02:58.6080512Z 5d8d3a0a98e0: Pull complete 2025-12-04T09:02:59.5830139Z b06bafce9e81: Pull complete 2025-12-04T09:03:00.3438614Z 15e0d7e4590d: Pull complete 2025-12-04T09:03:00.7960552Z a514bd1add31: Pull complete 2025-12-04T09:03:01.8162153Z 57b84ee60002: Pull complete 2025-12-04T09:03:02.3426793Z b8babeff6d81: Pull complete 2025-12-04T09:03:02.6121350Z 83779ddf6a85: Pull complete 2025-12-04T09:03:03.2005802Z 8b7620c0d736: Pull complete 2025-12-04T09:03:03.9295717Z 3bcfa090e4ef: Pull complete 2025-12-04T09:03:04.3193691Z eb0504ec4d92: Pull complete 2025-12-04T09:03:05.0452515Z 15d0fec09d7b: Pull complete 2025-12-04T09:03:05.4951991Z cca81fcc62a9: Pull complete 2025-12-04T09:03:06.2995168Z b0b8f9b5c6ab: Pull complete 2025-12-04T09:03:06.6299287Z 0606ca4d47a8: Pull complete 
2025-12-04T09:03:07.5572240Z 2f80a4e1b3b9: Pull complete 2025-12-04T09:03:08.0015692Z 35c916fb1bd0: Pull complete 2025-12-04T09:03:14.0175108Z 195537b7dafc: Pull complete 2025-12-04T09:03:14.4663851Z dc454fd3967e: Pull complete 2025-12-04T09:03:14.9101341Z 701b34f115fa: Pull complete 2025-12-04T09:03:15.3272328Z 39cefc00ffed: Pull complete 2025-12-04T09:03:15.7303497Z 6ae51eb61a32: Pull complete 2025-12-04T09:03:16.0952087Z 1fd5341e66df: Pull complete 2025-12-04T09:03:17.6688435Z 72a7c87e35e4: Pull complete 2025-12-04T09:03:18.0652405Z ec36862ac98e: Pull complete 2025-12-04T09:03:19.7814392Z 05ddbf246e8a: Pull complete 2025-12-04T09:03:20.4566128Z Digest: sha256:ba21003510dba4bdeed83df81a56fa468e0ee1b612a9445ae1f402a280804f97 2025-12-04T09:03:20.5138800Z Status: Downloaded newer image for 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:03:20.5405571Z 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:03:20.5479228Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-12-04T09:03:20.5480334Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-12-04T09:03:20.5489453Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:03:20.5490013Z env: 2025-12-04T09:03:20.5490410Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:03:20.5490693Z ##[endgroup] 2025-12-04T09:03:20.5680280Z ##[group]Run pytorch/test-infra/.github/actions/setup-nvidia@main 2025-12-04T09:03:20.5680751Z with: 2025-12-04T09:03:20.5680994Z driver-version: 580.82.07 2025-12-04T09:03:20.5681270Z env: 2025-12-04T09:03:20.5681504Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:03:20.5681793Z ##[endgroup] 2025-12-04T09:03:20.5834705Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-12-04T09:03:20.5835748Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT" 2025-12-04T09:03:20.5842183Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:03:20.5842608Z env: 2025-12-04T09:03:20.5842852Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:03:20.5843135Z ##[endgroup] 2025-12-04T09:03:20.5977960Z ##[group]Run set -euo pipefail 2025-12-04T09:03:20.5978351Z set -euo pipefail 2025-12-04T09:03:20.5978697Z  2025-12-04T09:03:20.5978935Z has_gpu=false 2025-12-04T09:03:20.5979230Z devices="" 2025-12-04T09:03:20.5979502Z  2025-12-04T09:03:20.5979813Z if command -v nvidia-smi >/dev/null 2>&1; then 2025-12-04T09:03:20.5980346Z  if nvidia-smi -L >/tmp/nvidia_devices 2>/dev/null; then 2025-12-04T09:03:20.5980807Z  has_gpu=true 2025-12-04T09:03:20.5981158Z  devices=$(cat /tmp/nvidia_devices) 2025-12-04T09:03:20.5981524Z  fi 2025-12-04T09:03:20.5981774Z fi 2025-12-04T09:03:20.5982029Z  2025-12-04T09:03:20.5982296Z if [ "$has_gpu" = false ]; then 2025-12-04T09:03:20.5982757Z  if ls /dev/nvidia* >/tmp/nvidia_devices 2>/dev/null; then 2025-12-04T09:03:20.5983199Z  has_gpu=true 2025-12-04T09:03:20.5983546Z  devices=$(cat /tmp/nvidia_devices) 2025-12-04T09:03:20.5983927Z  fi 2025-12-04T09:03:20.5984161Z fi 2025-12-04T09:03:20.5984409Z  2025-12-04T09:03:20.5984772Z if [ "$has_gpu" = 
false ] && command -v lspci >/dev/null 2>&1; then 2025-12-04T09:03:20.5985366Z  if lspci | grep -i 'nvidia' >/tmp/nvidia_devices 2>/dev/null; then 2025-12-04T09:03:20.5985859Z  has_gpu=true 2025-12-04T09:03:20.5986205Z  devices=$(cat /tmp/nvidia_devices) 2025-12-04T09:03:20.5986576Z  fi 2025-12-04T09:03:20.5986812Z fi 2025-12-04T09:03:20.5987059Z  2025-12-04T09:03:20.5987412Z printf 'HAS_NVIDIA=%s\n' "$has_gpu" >> "$GITHUB_OUTPUT" 2025-12-04T09:03:20.5988037Z printf 'DETECTED_DEVICES<> "$GITHUB_OUTPUT" 2025-12-04T09:03:20.5993984Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:03:20.5994402Z env: 2025-12-04T09:03:20.5994625Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:03:20.5994909Z ##[endgroup] 2025-12-04T09:03:23.7217568Z ##[group]Run if [ "${HAS_NVIDIA}" = "true" ]; then 2025-12-04T09:03:23.7218052Z if [ "${HAS_NVIDIA}" = "true" ]; then 2025-12-04T09:03:23.7218499Z  echo "HAS_NVIDIA_GPU=true" >> "${GITHUB_ENV}" 2025-12-04T09:03:23.7219133Z  echo "GPU_FLAG=--gpus all -e NVIDIA_DRIVER_CAPABILITIES=all" >> "${GITHUB_ENV}" 2025-12-04T09:03:23.7219681Z else 2025-12-04T09:03:23.7219997Z  echo "HAS_NVIDIA_GPU=false" >> "${GITHUB_ENV}" 2025-12-04T09:03:23.7220402Z fi 2025-12-04T09:03:23.7227510Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:03:23.7227952Z env: 2025-12-04T09:03:23.7228201Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:03:23.7228496Z HAS_NVIDIA: true 2025-12-04T09:03:23.7228763Z ##[endgroup] 2025-12-04T09:03:23.7317796Z ##[group]Run nick-fields/retry@3e91a01664abd3c5cd539100d10d33b9c5b68482 2025-12-04T09:03:23.7318231Z with: 2025-12-04T09:03:23.7318440Z timeout_minutes: 10 2025-12-04T09:03:23.7318870Z max_attempts: 3 2025-12-04T09:03:23.7351536Z command: # Is it disgusting to have a full shell script here in this github action? Sure # But is it the best way to make it so that this action relies on nothing else? Absolutely set -eou pipefail DISTRIBUTION=$(. /etc/os-release;echo $ID$VERSION_ID) DRIVER_FN="NVIDIA-Linux-x86_64-${DRIVER_VERSION}.run" install_nvidia_docker2_amzn2() { ( set -x # Needed for yum-config-manager sudo yum install -y yum-utils if [[ "${DISTRIBUTION}" == "amzn2023" ]] ; then YUM_REPO_URL="https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo" else # Amazon Linux 2 YUM_REPO_URL="https://nvidia.github.io/nvidia-docker/${DISTRIBUTION}/nvidia-docker.repo" fi sudo yum-config-manager --add-repo "${YUM_REPO_URL}" sudo yum install -y \ nvidia-container-toolkit-1.17.8 \ libnvidia-container-tools-1.17.8 \ libnvidia-container1-1.17.8 \ nvidia-container-toolkit-base-1.17.8 sudo systemctl restart docker ) } install_nvidia_docker2_ubuntu20() { ( set -x # Install nvidia-driver package if not installed status="$(dpkg-query -W --showformat='${db:Status-Status}' nvidia-docker2 2>&1)" if [ ! $? = 0 ] || [ ! "$status" = installed ]; then sudo apt-get install -y nvidia-container-toolkit-1.17.8 sudo systemctl restart docker fi ) } pre_install_nvidia_driver_amzn2() { ( # Purge any nvidia driver installed from RHEL repo sudo yum remove -y nvidia-driver-latest-dkms ) } install_nvidia_driver_common() { ( # Try to gather more information about the runner and its existing NVIDIA driver if any echo "Before installing NVIDIA driver" lspci lsmod modinfo nvidia || true HAS_NVIDIA_DRIVER=0 # Check if NVIDIA driver has already been installed if [ -x "$(command -v nvidia-smi)" ]; then set +e # The driver exists, check its version next. 
Also check only the first GPU if there are more than one of them # so that the same driver version is not print over multiple lines INSTALLED_DRIVER_VERSION=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0) NVIDIA_SMI_STATUS=$? if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then echo "Failed to get NVIDIA driver version ($INSTALLED_DRIVER_VERSION). Continuing" elif [ "$INSTALLED_DRIVER_VERSION" != "$DRIVER_VERSION" ]; then echo "NVIDIA driver ($INSTALLED_DRIVER_VERSION) has been installed, but we expect to have $DRIVER_VERSION instead. Continuing" # Turn off persistent mode so that the installation script can unload the kernel module sudo killall nvidia-persistenced || true else HAS_NVIDIA_DRIVER=1 echo "NVIDIA driver ($INSTALLED_DRIVER_VERSION) has already been installed. Skipping NVIDIA driver installation" fi set -e fi if [ "$HAS_NVIDIA_DRIVER" -eq 0 ]; then # CAUTION: this may need to be updated in future if [ "${DISTRIBUTION}" != ubuntu20.04 ]; then sudo yum groupinstall -y "Development Tools" # ensure our kernel install is the same as our underlying kernel, # groupinstall "Development Tools" has a habit of mismatching kernel headers sudo yum install -y "kernel-devel-uname-r == $(uname -r)" sudo modprobe backlight fi sudo curl -fsL -o /tmp/nvidia_driver "https://s3.amazonaws.com/ossci-linux/nvidia_driver/$DRIVER_FN" set +e sudo /bin/bash /tmp/nvidia_driver -s --no-drm NVIDIA_INSTALLATION_STATUS=$? RESET_GPU=0 if [ "$NVIDIA_INSTALLATION_STATUS" -ne 0 ]; then sudo cat /var/log/nvidia-installer.log # Fail to install NVIDIA driver, try to reset the GPU RESET_GPU=1 elif [ -x "$(command -v nvidia-smi)" ]; then # Check again if nvidia-smi works even if the driver installation completes successfully INSTALLED_DRIVER_VERSION=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0) NVIDIA_SMI_STATUS=$? if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then RESET_GPU=1 fi fi if [ "$RESET_GPU" -eq 1 ]; then NVIDIA_DEVICES=$(lspci -D | grep -i NVIDIA | cut -d' ' -f1) # The GPU can get stuck in a failure state if somehow the test crashs the GPU microcode. When this # happens, we'll try to reset all NVIDIA devices https://github.com/pytorch/pytorch/issues/88388 for PCI_ID in $NVIDIA_DEVICES; do DEVICE_ENABLED=$(cat /sys/bus/pci/devices/$PCI_ID/enable) echo "Reseting $PCI_ID (enabled state: $DEVICE_ENABLED)" # This requires sudo permission of course echo "1" | sudo tee /sys/bus/pci/devices/$PCI_ID/reset sleep 1 done fi sudo rm -fv /tmp/nvidia_driver set -e fi ) } post_install_nvidia_driver_common() { ( sudo modprobe nvidia || true echo "After installing NVIDIA driver" lspci lsmod modinfo nvidia || true ( set +e nvidia-smi # NB: Annoyingly, nvidia-smi command returns successfully with return code 0 even in # the case where the driver has already crashed as it still can get the driver version # and some basic information like the bus ID. However, the rest of the information # would be missing (ERR!), for example: # # +-----------------------------------------------------------------------------+ # | NVIDIA-SMI 525.89.02 Driver Version: 525.89.02 CUDA Version: 12.0 | # |-------------------------------+----------------------+----------------------+ # | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | # | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | # | | | MIG M. | # |===============================+======================+======================| # | 0 ERR! Off | 00000000:00:1E.0 Off | ERR! 
| # |ERR! ERR! ERR! ERR! / ERR! | 4184MiB / 23028MiB | ERR! Default | # | | | ERR! | # +-------------------------------+----------------------+----------------------+ # # +-----------------------------------------------------------------------------+ # | Processes: | # | GPU GI CI PID Type Process name GPU Memory | # | ID ID Usage | # |=============================================================================| # +-----------------------------------------------------------------------------+ # # This should be reported as a failure instead as it will guarantee to fail when # Docker tries to run with --gpus all # # So, the correct check here is to query one of the missing piece of info like # GPU name, so that the command can fail accordingly nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0 NVIDIA_SMI_STATUS=$? # Allowable exit statuses for nvidia-smi, see: https://github.com/NVIDIA/gpu-operator/issues/285 if [ "$NVIDIA_SMI_STATUS" -eq 0 ] || [ "$NVIDIA_SMI_STATUS" -eq 14 ]; then echo "INFO: Ignoring allowed status ${NVIDIA_SMI_STATUS}" else echo "ERROR: nvidia-smi exited with unresolved status ${NVIDIA_SMI_STATUS}" exit ${NVIDIA_SMI_STATUS} fi set -e ) ) } install_nvidia_driver_amzn2() { ( set -x pre_install_nvidia_driver_amzn2 install_nvidia_driver_common post_install_nvidia_driver_common ) } install_nvidia_driver_ubuntu20() { ( set -x install_nvidia_driver_common post_install_nvidia_driver_common ) } echo "== Installing nvidia driver ${DRIVER_FN} ==" case "${DISTRIBUTION}" in amzn*) install_nvidia_driver_amzn2 ;; ubuntu20.04) install_nvidia_driver_ubuntu20 ;; *) echo "ERROR: Unknown distribution ${DISTRIBUTION}" exit 1 ;; esac # Install container toolkit based on distribution echo "== Installing nvidia container toolkit for ${DISTRIBUTION} ==" case "${DISTRIBUTION}" in amzn*) install_nvidia_docker2_amzn2 ;; ubuntu20.04) install_nvidia_docker2_ubuntu20 ;; *) echo "ERROR: Unknown distribution ${DISTRIBUTION}" exit 1 ;; esac # Fix https://github.com/NVIDIA/nvidia-docker/issues/1648 on runners with # more than one GPUs. This just needs to be run once. The command fails # on subsequent runs and complains that the mode is already on, but that's # ok sudo nvidia-persistenced || true # This should show persistence mode ON nvidia-smi # check if the container-toolkit is correctly installed and CUDA is available inside a container docker run --rm -t --gpus=all public.ecr.aws/docker/library/python:3.13 nvidia-smi 2025-12-04T09:03:23.7382951Z retry_wait_seconds: 10 2025-12-04T09:03:23.7383276Z polling_interval_seconds: 1 2025-12-04T09:03:23.7383596Z warning_on_retry: true 2025-12-04T09:03:23.7383905Z continue_on_error: false 2025-12-04T09:03:23.7384200Z env: 2025-12-04T09:03:23.7384428Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:03:23.7384731Z HAS_NVIDIA_GPU: true 2025-12-04T09:03:23.7385090Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:03:23.7385505Z DRIVER_VERSION: 580.82.07 2025-12-04T09:03:23.7385806Z ##[endgroup] 2025-12-04T09:03:23.8741382Z == Installing nvidia driver NVIDIA-Linux-x86_64-580.82.07.run == 2025-12-04T09:03:23.8742447Z + pre_install_nvidia_driver_amzn2 2025-12-04T09:03:23.8743614Z + sudo yum remove -y nvidia-driver-latest-dkms 2025-12-04T09:03:24.4956840Z No match for argument: nvidia-driver-latest-dkms 2025-12-04T09:03:24.4957334Z No packages marked for removal. 2025-12-04T09:03:24.5022882Z Dependencies resolved. 2025-12-04T09:03:24.5032794Z Nothing to do. 2025-12-04T09:03:24.5033587Z Complete! 
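The trace that follows runs install_nvidia_driver_common, which skips reinstallation when the runner already carries the expected driver. A minimal sketch of that decision, assuming DRIVER_VERSION is the env var exported by the action (580.82.07 in this run) and that exit code 14 from nvidia-smi is tolerated, as in the script shown above:

  expected="${DRIVER_VERSION:-580.82.07}"   # assumed to come from the action's env
  if command -v nvidia-smi >/dev/null 2>&1; then
    installed=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0)
    status=$?
    if { [ "$status" -eq 0 ] || [ "$status" -eq 14 ]; } && [ "$installed" = "$expected" ]; then
      echo "NVIDIA driver ($installed) already installed; skipping installation"
    else
      echo "Installing NVIDIA driver $expected (found: ${installed:-none})"
    fi
  fi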
2025-12-04T09:03:24.6212828Z + install_nvidia_driver_common 2025-12-04T09:03:24.6213939Z + echo 'Before installing NVIDIA driver' 2025-12-04T09:03:24.6215172Z + lspci 2025-12-04T09:03:24.6217225Z Before installing NVIDIA driver 2025-12-04T09:03:24.7722598Z 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] 2025-12-04T09:03:24.7723256Z 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] 2025-12-04T09:03:24.7723966Z 00:01.3 Non-VGA unclassified device: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08) 2025-12-04T09:03:24.7724632Z 00:03.0 VGA compatible controller: Amazon.com, Inc. Device 1111 2025-12-04T09:03:24.7725227Z 00:04.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe EBS Controller 2025-12-04T09:03:24.7725899Z 00:05.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA) 2025-12-04T09:03:24.7726526Z 00:1b.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1) 2025-12-04T09:03:24.7727111Z 00:1c.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1) 2025-12-04T09:03:24.7727675Z 00:1d.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1) 2025-12-04T09:03:24.7728645Z 00:1e.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1) 2025-12-04T09:03:24.7729282Z 00:1f.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe SSD Controller 2025-12-04T09:03:24.7729777Z + lsmod 2025-12-04T09:03:24.7760006Z Module Size Used by 2025-12-04T09:03:24.7760474Z nvidia_uvm 1925120 0 2025-12-04T09:03:24.7760842Z nvidia 14286848 1 nvidia_uvm 2025-12-04T09:03:24.7761179Z drm 602112 1 nvidia 2025-12-04T09:03:24.7761547Z drm_panel_orientation_quirks 32768 1 drm 2025-12-04T09:03:24.7761922Z backlight 24576 1 drm 2025-12-04T09:03:24.7762250Z i2c_core 110592 2 nvidia,drm 2025-12-04T09:03:24.7762599Z xt_conntrack 16384 1 2025-12-04T09:03:24.7762911Z nft_chain_nat 16384 3 2025-12-04T09:03:24.7763206Z xt_MASQUERADE 20480 1 2025-12-04T09:03:24.7763561Z nf_nat 57344 2 nft_chain_nat,xt_MASQUERADE 2025-12-04T09:03:24.7763983Z nf_conntrack_netlink 57344 0 2025-12-04T09:03:24.7764492Z nf_conntrack 184320 4 xt_conntrack,nf_nat,nf_conntrack_netlink,xt_MASQUERADE 2025-12-04T09:03:24.7765016Z nf_defrag_ipv6 24576 1 nf_conntrack 2025-12-04T09:03:24.7765400Z nf_defrag_ipv4 16384 1 nf_conntrack 2025-12-04T09:03:24.7765758Z xfrm_user 57344 1 2025-12-04T09:03:24.7766062Z xfrm_algo 16384 1 xfrm_user 2025-12-04T09:03:24.7766409Z xt_addrtype 16384 2 2025-12-04T09:03:24.7766718Z nft_compat 20480 4 2025-12-04T09:03:24.7767081Z nf_tables 311296 57 nft_compat,nft_chain_nat 2025-12-04T09:03:24.7767571Z nfnetlink 20480 4 nft_compat,nf_conntrack_netlink,nf_tables 2025-12-04T09:03:24.7768027Z br_netfilter 36864 0 2025-12-04T09:03:24.7768358Z bridge 323584 1 br_netfilter 2025-12-04T09:03:24.7768699Z stp 16384 1 bridge 2025-12-04T09:03:24.7769045Z llc 16384 2 bridge,stp 2025-12-04T09:03:24.7769390Z overlay 167936 0 2025-12-04T09:03:24.7769681Z tls 139264 0 2025-12-04T09:03:24.7769981Z nls_ascii 16384 1 2025-12-04T09:03:24.7770284Z nls_cp437 20480 1 2025-12-04T09:03:24.7770570Z vfat 24576 1 2025-12-04T09:03:24.7770868Z fat 86016 1 vfat 2025-12-04T09:03:24.7771187Z sunrpc 700416 1 2025-12-04T09:03:24.7771483Z i8042 45056 0 2025-12-04T09:03:24.7771768Z skx_edac_common 28672 0 2025-12-04T09:03:24.7772066Z ena 184320 0 2025-12-04T09:03:24.7772366Z serio 28672 3 i8042 2025-12-04T09:03:24.7772685Z ghash_clmulni_intel 16384 0 2025-12-04T09:03:24.7772993Z button 24576 0 2025-12-04T09:03:24.7773295Z sch_fq_codel 20480 33 2025-12-04T09:03:24.7773588Z dm_mod 
188416 0 2025-12-04T09:03:24.7773881Z fuse 184320 1 2025-12-04T09:03:24.7774179Z loop 36864 0 2025-12-04T09:03:24.7774469Z configfs 57344 1 2025-12-04T09:03:24.7774770Z dmi_sysfs 20480 0 2025-12-04T09:03:24.7775070Z crc32_pclmul 16384 0 2025-12-04T09:03:24.7775359Z crc32c_intel 24576 0 2025-12-04T09:03:24.7775660Z efivarfs 24576 1 2025-12-04T09:03:24.7775955Z + modinfo nvidia 2025-12-04T09:03:24.7776623Z filename: /lib/modules/6.1.150-174.273.amzn2023.x86_64/kernel/drivers/video/nvidia.ko 2025-12-04T09:03:24.7777388Z import_ns: DMA_BUF 2025-12-04T09:03:24.7777681Z alias: char-major-195-* 2025-12-04T09:03:24.7778013Z version: 580.82.07 2025-12-04T09:03:24.7778318Z supported: external 2025-12-04T09:03:24.7778614Z license: Dual MIT/GPL 2025-12-04T09:03:24.7778966Z firmware: nvidia/580.82.07/gsp_tu10x.bin 2025-12-04T09:03:24.7779386Z firmware: nvidia/580.82.07/gsp_ga10x.bin 2025-12-04T09:03:24.7779774Z srcversion: BA7240A71DCF7DC6FE88C1D 2025-12-04T09:03:24.7781751Z alias: of:N*T*Cnvidia,tegra264-displayC* 2025-12-04T09:03:24.7782193Z alias: of:N*T*Cnvidia,tegra264-display 2025-12-04T09:03:24.7782725Z alias: of:N*T*Cnvidia,tegra234-displayC* 2025-12-04T09:03:24.7783146Z alias: of:N*T*Cnvidia,tegra234-display 2025-12-04T09:03:24.7783562Z alias: pci:v000010DEd*sv*sd*bc06sc80i00* 2025-12-04T09:03:24.7783973Z alias: pci:v000010DEd*sv*sd*bc03sc02i00* 2025-12-04T09:03:24.7784366Z alias: pci:v000010DEd*sv*sd*bc03sc00i00* 2025-12-04T09:03:24.7784742Z depends: i2c-core,drm 2025-12-04T09:03:24.7785046Z retpoline: Y 2025-12-04T09:03:24.7785291Z name: nvidia 2025-12-04T09:03:24.7785726Z vermagic: 6.1.150-174.273.amzn2023.x86_64 SMP preempt mod_unload modversions 2025-12-04T09:03:24.7786299Z parm: NvSwitchRegDwords:NvSwitch regkey (charp) 2025-12-04T09:03:24.7786830Z parm: NvSwitchBlacklist:NvSwitchBlacklist=uuid[,uuid...] 
(charp) 2025-12-04T09:03:24.7787344Z parm: NVreg_ResmanDebugLevel:int 2025-12-04T09:03:24.7787716Z parm: NVreg_RmLogonRC:int 2025-12-04T09:03:24.7788078Z parm: NVreg_ModifyDeviceFiles:int 2025-12-04T09:03:24.7788445Z parm: NVreg_DeviceFileUID:int 2025-12-04T09:03:24.7788807Z parm: NVreg_DeviceFileGID:int 2025-12-04T09:03:24.7789175Z parm: NVreg_DeviceFileMode:int 2025-12-04T09:03:24.7789591Z parm: NVreg_InitializeSystemMemoryAllocations:int 2025-12-04T09:03:24.7790054Z parm: NVreg_UsePageAttributeTable:int 2025-12-04T09:03:24.7790458Z parm: NVreg_EnablePCIeGen3:int 2025-12-04T09:03:24.7790806Z parm: NVreg_EnableMSI:int 2025-12-04T09:03:24.7791171Z parm: NVreg_EnableStreamMemOPs:int 2025-12-04T09:03:24.7791601Z parm: NVreg_RestrictProfilingToAdminUsers:int 2025-12-04T09:03:24.7792059Z parm: NVreg_PreserveVideoMemoryAllocations:int 2025-12-04T09:03:24.7792718Z parm: NVreg_EnableS0ixPowerManagement:int 2025-12-04T09:03:24.7793175Z parm: NVreg_S0ixPowerManagementVideoMemoryThreshold:int 2025-12-04T09:03:24.7793803Z parm: NVreg_DynamicPowerManagement:int 2025-12-04T09:03:24.7794276Z parm: NVreg_DynamicPowerManagementVideoMemoryThreshold:int 2025-12-04T09:03:24.7794916Z parm: NVreg_EnableGpuFirmware:int 2025-12-04T09:03:24.7795319Z parm: NVreg_EnableGpuFirmwareLogs:int 2025-12-04T09:03:24.7795747Z parm: NVreg_OpenRmEnableUnsupportedGpus:int 2025-12-04T09:03:24.7796192Z parm: NVreg_EnableUserNUMAManagement:int 2025-12-04T09:03:24.7796605Z parm: NVreg_MemoryPoolSize:int 2025-12-04T09:03:24.7796992Z parm: NVreg_KMallocHeapMaxSize:int 2025-12-04T09:03:24.7797375Z parm: NVreg_VMallocHeapMaxSize:int 2025-12-04T09:03:24.7797763Z parm: NVreg_IgnoreMMIOCheck:int 2025-12-04T09:03:24.7798134Z parm: NVreg_NvLinkDisable:int 2025-12-04T09:03:24.7798535Z parm: NVreg_EnablePCIERelaxedOrderingMode:int 2025-12-04T09:03:24.7798973Z parm: NVreg_RegisterPCIDriver:int 2025-12-04T09:03:24.7799493Z parm: NVreg_RegisterPlatformDeviceDriver:int 2025-12-04T09:03:24.7799898Z parm: NVreg_EnableResizableBar:int 2025-12-04T09:03:24.7800394Z parm: NVreg_EnableDbgBreakpoint:int 2025-12-04T09:03:24.7800769Z parm: NVreg_EnableNonblockingOpen:int 2025-12-04T09:03:24.7801149Z parm: NVreg_CoherentGPUMemoryMode:charp 2025-12-04T09:03:24.7801524Z parm: NVreg_RegistryDwords:charp 2025-12-04T09:03:24.7801899Z parm: NVreg_RegistryDwordsPerDevice:charp 2025-12-04T09:03:24.7802264Z parm: NVreg_RmMsg:charp 2025-12-04T09:03:24.7802568Z parm: NVreg_GpuBlacklist:charp 2025-12-04T09:03:24.7802926Z parm: NVreg_TemporaryFilePath:charp 2025-12-04T09:03:24.7803285Z parm: NVreg_ExcludedGpus:charp 2025-12-04T09:03:24.7803620Z parm: NVreg_DmaRemapPeerMmio:int 2025-12-04T09:03:24.7804058Z parm: NVreg_RmNvlinkBandwidth:charp 2025-12-04T09:03:24.7804517Z parm: NVreg_RmNvlinkBandwidthLinkCount:int 2025-12-04T09:03:24.7804889Z parm: NVreg_ImexChannelCount:int 2025-12-04T09:03:24.7805244Z parm: NVreg_CreateImexChannel0:int 2025-12-04T09:03:24.7805623Z parm: NVreg_GrdmaPciTopoCheckOverride:int 2025-12-04T09:03:24.7805997Z parm: rm_firmware_active:charp 2025-12-04T09:03:24.7806303Z + HAS_NVIDIA_DRIVER=0 2025-12-04T09:03:24.7806574Z ++ command -v nvidia-smi 2025-12-04T09:03:24.7806855Z + '[' -x /usr/bin/nvidia-smi ']' 2025-12-04T09:03:24.7807123Z + set +e 2025-12-04T09:03:24.7807460Z ++ nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0 2025-12-04T09:03:27.8739408Z + INSTALLED_DRIVER_VERSION=580.82.07 2025-12-04T09:03:27.8739842Z + NVIDIA_SMI_STATUS=0 2025-12-04T09:03:27.8740384Z + '[' 0 -ne 0 ']' 2025-12-04T09:03:27.8740651Z + '[' 580.82.07 
'!=' 580.82.07 ']' 2025-12-04T09:03:27.8740972Z + HAS_NVIDIA_DRIVER=1 2025-12-04T09:03:27.8741562Z + echo 'NVIDIA driver (580.82.07) has already been installed. Skipping NVIDIA driver installation' 2025-12-04T09:03:27.8742154Z + set -e 2025-12-04T09:03:27.8742390Z + '[' 1 -eq 0 ']' 2025-12-04T09:03:27.8742849Z NVIDIA driver (580.82.07) has already been installed. Skipping NVIDIA driver installation 2025-12-04T09:03:27.8743438Z + post_install_nvidia_driver_common 2025-12-04T09:03:27.8744966Z + sudo modprobe nvidia 2025-12-04T09:03:28.0397295Z + echo 'After installing NVIDIA driver' 2025-12-04T09:03:28.0397799Z + lspci 2025-12-04T09:03:28.0398653Z After installing NVIDIA driver 2025-12-04T09:03:28.0520199Z 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] 2025-12-04T09:03:28.0521115Z 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] 2025-12-04T09:03:28.0522073Z 00:01.3 Non-VGA unclassified device: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08) 2025-12-04T09:03:28.0522754Z 00:03.0 VGA compatible controller: Amazon.com, Inc. Device 1111 2025-12-04T09:03:28.0523401Z 00:04.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe EBS Controller 2025-12-04T09:03:28.0524078Z 00:05.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA) 2025-12-04T09:03:28.0524702Z 00:1b.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1) 2025-12-04T09:03:28.0525293Z 00:1c.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1) 2025-12-04T09:03:28.0525870Z 00:1d.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1) 2025-12-04T09:03:28.0526453Z 00:1e.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1) 2025-12-04T09:03:28.0527066Z 00:1f.0 Non-Volatile memory controller: Amazon.com, Inc. 
NVMe SSD Controller 2025-12-04T09:03:28.0527570Z + lsmod 2025-12-04T09:03:28.0544990Z Module Size Used by 2025-12-04T09:03:28.0545364Z nvidia_uvm 1925120 0 2025-12-04T09:03:28.0545703Z nvidia 14286848 1 nvidia_uvm 2025-12-04T09:03:28.0546065Z drm 602112 1 nvidia 2025-12-04T09:03:28.0546456Z drm_panel_orientation_quirks 32768 1 drm 2025-12-04T09:03:28.0546834Z backlight 24576 1 drm 2025-12-04T09:03:28.0547189Z i2c_core 110592 2 nvidia,drm 2025-12-04T09:03:28.0547552Z xt_conntrack 16384 1 2025-12-04T09:03:28.0547878Z nft_chain_nat 16384 3 2025-12-04T09:03:28.0548199Z xt_MASQUERADE 20480 1 2025-12-04T09:03:28.0548681Z nf_nat 57344 2 nft_chain_nat,xt_MASQUERADE 2025-12-04T09:03:28.0549084Z nf_conntrack_netlink 57344 0 2025-12-04T09:03:28.0549550Z nf_conntrack 184320 4 xt_conntrack,nf_nat,nf_conntrack_netlink,xt_MASQUERADE 2025-12-04T09:03:28.0550081Z nf_defrag_ipv6 24576 1 nf_conntrack 2025-12-04T09:03:28.0550457Z nf_defrag_ipv4 16384 1 nf_conntrack 2025-12-04T09:03:28.0550796Z xfrm_user 57344 1 2025-12-04T09:03:28.0551114Z xfrm_algo 16384 1 xfrm_user 2025-12-04T09:03:28.0551456Z xt_addrtype 16384 2 2025-12-04T09:03:28.0551751Z nft_compat 20480 4 2025-12-04T09:03:28.0553162Z nf_tables 311296 57 nft_compat,nft_chain_nat 2025-12-04T09:03:28.0553677Z nfnetlink 20480 4 nft_compat,nf_conntrack_netlink,nf_tables 2025-12-04T09:03:28.0554119Z br_netfilter 36864 0 2025-12-04T09:03:28.0554453Z bridge 323584 1 br_netfilter 2025-12-04T09:03:28.0554815Z stp 16384 1 bridge 2025-12-04T09:03:28.0555153Z llc 16384 2 bridge,stp 2025-12-04T09:03:28.0555483Z overlay 167936 0 2025-12-04T09:03:28.0555783Z tls 139264 0 2025-12-04T09:03:28.0556082Z nls_ascii 16384 1 2025-12-04T09:03:28.0556411Z nls_cp437 20480 1 2025-12-04T09:03:28.0556706Z vfat 24576 1 2025-12-04T09:03:28.0557005Z fat 86016 1 vfat 2025-12-04T09:03:28.0557314Z sunrpc 700416 1 2025-12-04T09:03:28.0557608Z i8042 45056 0 2025-12-04T09:03:28.0557903Z skx_edac_common 28672 0 2025-12-04T09:03:28.0558194Z ena 184320 0 2025-12-04T09:03:28.0558501Z serio 28672 3 i8042 2025-12-04T09:03:28.0558837Z ghash_clmulni_intel 16384 0 2025-12-04T09:03:28.0559131Z button 24576 0 2025-12-04T09:03:28.0559449Z sch_fq_codel 20480 33 2025-12-04T09:03:28.0559757Z dm_mod 188416 0 2025-12-04T09:03:28.0560052Z fuse 184320 1 2025-12-04T09:03:28.0560332Z loop 36864 0 2025-12-04T09:03:28.0560630Z configfs 57344 1 2025-12-04T09:03:28.0560932Z dmi_sysfs 20480 0 2025-12-04T09:03:28.0561219Z crc32_pclmul 16384 0 2025-12-04T09:03:28.0561519Z crc32c_intel 24576 0 2025-12-04T09:03:28.0561822Z efivarfs 24576 1 2025-12-04T09:03:28.0562108Z + modinfo nvidia 2025-12-04T09:03:28.0562614Z filename: /lib/modules/6.1.150-174.273.amzn2023.x86_64/kernel/drivers/video/nvidia.ko 2025-12-04T09:03:28.0563163Z import_ns: DMA_BUF 2025-12-04T09:03:28.0563460Z alias: char-major-195-* 2025-12-04T09:03:28.0563773Z version: 580.82.07 2025-12-04T09:03:28.0564068Z supported: external 2025-12-04T09:03:28.0564364Z license: Dual MIT/GPL 2025-12-04T09:03:28.0564692Z firmware: nvidia/580.82.07/gsp_tu10x.bin 2025-12-04T09:03:28.0565095Z firmware: nvidia/580.82.07/gsp_ga10x.bin 2025-12-04T09:03:28.0565482Z srcversion: BA7240A71DCF7DC6FE88C1D 2025-12-04T09:03:28.0565878Z alias: of:N*T*Cnvidia,tegra264-displayC* 2025-12-04T09:03:28.0566286Z alias: of:N*T*Cnvidia,tegra264-display 2025-12-04T09:03:28.0566701Z alias: of:N*T*Cnvidia,tegra234-displayC* 2025-12-04T09:03:28.0567117Z alias: of:N*T*Cnvidia,tegra234-display 2025-12-04T09:03:28.0567510Z alias: pci:v000010DEd*sv*sd*bc06sc80i00* 2025-12-04T09:03:28.0567912Z alias: 
pci:v000010DEd*sv*sd*bc03sc02i00* 2025-12-04T09:03:28.0568312Z alias: pci:v000010DEd*sv*sd*bc03sc00i00* 2025-12-04T09:03:28.0568676Z depends: i2c-core,drm 2025-12-04T09:03:28.0568983Z retpoline: Y 2025-12-04T09:03:28.0569238Z name: nvidia 2025-12-04T09:03:28.0569658Z vermagic: 6.1.150-174.273.amzn2023.x86_64 SMP preempt mod_unload modversions 2025-12-04T09:03:28.0570231Z parm: NvSwitchRegDwords:NvSwitch regkey (charp) 2025-12-04T09:03:28.0570772Z parm: NvSwitchBlacklist:NvSwitchBlacklist=uuid[,uuid...] (charp) 2025-12-04T09:03:28.0571279Z parm: NVreg_ResmanDebugLevel:int 2025-12-04T09:03:28.0571640Z parm: NVreg_RmLogonRC:int 2025-12-04T09:03:28.0572000Z parm: NVreg_ModifyDeviceFiles:int 2025-12-04T09:03:28.0572380Z parm: NVreg_DeviceFileUID:int 2025-12-04T09:03:28.0572730Z parm: NVreg_DeviceFileGID:int 2025-12-04T09:03:28.0573093Z parm: NVreg_DeviceFileMode:int 2025-12-04T09:03:28.0573525Z parm: NVreg_InitializeSystemMemoryAllocations:int 2025-12-04T09:03:28.0573974Z parm: NVreg_UsePageAttributeTable:int 2025-12-04T09:03:28.0574475Z parm: NVreg_EnablePCIeGen3:int 2025-12-04T09:03:28.0574908Z parm: NVreg_EnableMSI:int 2025-12-04T09:03:28.0575275Z parm: NVreg_EnableStreamMemOPs:int 2025-12-04T09:03:28.0575693Z parm: NVreg_RestrictProfilingToAdminUsers:int 2025-12-04T09:03:28.0576161Z parm: NVreg_PreserveVideoMemoryAllocations:int 2025-12-04T09:03:28.0576893Z parm: NVreg_EnableS0ixPowerManagement:int 2025-12-04T09:03:28.0577392Z parm: NVreg_S0ixPowerManagementVideoMemoryThreshold:int 2025-12-04T09:03:28.0577948Z parm: NVreg_DynamicPowerManagement:int 2025-12-04T09:03:28.0578466Z parm: NVreg_DynamicPowerManagementVideoMemoryThreshold:int 2025-12-04T09:03:28.0578965Z parm: NVreg_EnableGpuFirmware:int 2025-12-04T09:03:28.0579389Z parm: NVreg_EnableGpuFirmwareLogs:int 2025-12-04T09:03:28.0579850Z parm: NVreg_OpenRmEnableUnsupportedGpus:int 2025-12-04T09:03:28.0580315Z parm: NVreg_EnableUserNUMAManagement:int 2025-12-04T09:03:28.0580738Z parm: NVreg_MemoryPoolSize:int 2025-12-04T09:03:28.0581142Z parm: NVreg_KMallocHeapMaxSize:int 2025-12-04T09:03:28.0581554Z parm: NVreg_VMallocHeapMaxSize:int 2025-12-04T09:03:28.0581945Z parm: NVreg_IgnoreMMIOCheck:int 2025-12-04T09:03:28.0582335Z parm: NVreg_NvLinkDisable:int 2025-12-04T09:03:28.0582767Z parm: NVreg_EnablePCIERelaxedOrderingMode:int 2025-12-04T09:03:28.0583205Z parm: NVreg_RegisterPCIDriver:int 2025-12-04T09:03:28.0583645Z parm: NVreg_RegisterPlatformDeviceDriver:int 2025-12-04T09:03:28.0584089Z parm: NVreg_EnableResizableBar:int 2025-12-04T09:03:28.0584506Z parm: NVreg_EnableDbgBreakpoint:int 2025-12-04T09:03:28.0584920Z parm: NVreg_EnableNonblockingOpen:int 2025-12-04T09:03:28.0585363Z parm: NVreg_CoherentGPUMemoryMode:charp 2025-12-04T09:03:28.0585787Z parm: NVreg_RegistryDwords:charp 2025-12-04T09:03:28.0586196Z parm: NVreg_RegistryDwordsPerDevice:charp 2025-12-04T09:03:28.0586612Z parm: NVreg_RmMsg:charp 2025-12-04T09:03:28.0586968Z parm: NVreg_GpuBlacklist:charp 2025-12-04T09:03:28.0587352Z parm: NVreg_TemporaryFilePath:charp 2025-12-04T09:03:28.0587750Z parm: NVreg_ExcludedGpus:charp 2025-12-04T09:03:28.0588281Z parm: NVreg_DmaRemapPeerMmio:int 2025-12-04T09:03:28.0588673Z parm: NVreg_RmNvlinkBandwidth:charp 2025-12-04T09:03:28.0589083Z parm: NVreg_RmNvlinkBandwidthLinkCount:int 2025-12-04T09:03:28.0589501Z parm: NVreg_ImexChannelCount:int 2025-12-04T09:03:28.0589891Z parm: NVreg_CreateImexChannel0:int 2025-12-04T09:03:28.0590295Z parm: NVreg_GrdmaPciTopoCheckOverride:int 2025-12-04T09:03:28.0590702Z parm: rm_firmware_active:charp 
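Immediately below, the script verifies that the existing driver is actually healthy. Because nvidia-smi can exit 0 even when the driver has crashed (printing ERR! for most fields), it queries a specific field and accepts only exit codes 0 and 14. A condensed sketch of that check, mirroring the script echoed earlier in this step:

  set +e
  nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0
  status=$?
  set -e
  if [ "$status" -eq 0 ] || [ "$status" -eq 14 ]; then
    echo "INFO: Ignoring allowed status ${status}"
  else
    echo "ERROR: nvidia-smi exited with unresolved status ${status}"
    exit "$status"
  fi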
2025-12-04T09:03:28.0591048Z + set +e
2025-12-04T09:03:28.0591261Z + nvidia-smi
2025-12-04T09:03:29.8723157Z Thu Dec 4 09:03:29 2025
2025-12-04T09:03:29.8723692Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:03:29.8724338Z | NVIDIA-SMI 580.82.07 Driver Version: 580.82.07 CUDA Version: 13.0 |
2025-12-04T09:03:29.8724948Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:03:29.8725578Z | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
2025-12-04T09:03:29.8726248Z | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
2025-12-04T09:03:29.8726797Z | | | MIG M. |
2025-12-04T09:03:29.8727258Z |=========================================+========================+======================|
2025-12-04T09:03:29.9108047Z | 0 Tesla T4 Off | 00000000:00:1B.0 Off | 0 |
2025-12-04T09:03:29.9108723Z | N/A 30C P0 24W / 70W | 0MiB / 15360MiB | 3% Default |
2025-12-04T09:03:29.9109712Z | | | N/A |
2025-12-04T09:03:29.9110212Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:03:29.9110753Z | 1 Tesla T4 Off | 00000000:00:1C.0 Off | 0 |
2025-12-04T09:03:29.9111279Z | N/A 29C P0 25W / 70W | 0MiB / 15360MiB | 5% Default |
2025-12-04T09:03:29.9111731Z | | | N/A |
2025-12-04T09:03:29.9112215Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:03:29.9112754Z | 2 Tesla T4 Off | 00000000:00:1D.0 Off | 0 |
2025-12-04T09:03:29.9113266Z | N/A 28C P0 25W / 70W | 0MiB / 15360MiB | 0% Default |
2025-12-04T09:03:29.9113727Z | | | N/A |
2025-12-04T09:03:29.9114210Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:03:29.9114741Z | 3 Tesla T4 Off | 00000000:00:1E.0 Off | 0 |
2025-12-04T09:03:29.9115251Z | N/A 29C P0 25W / 70W | 0MiB / 15360MiB | 5% Default |
2025-12-04T09:03:29.9115698Z | | | N/A |
2025-12-04T09:03:29.9116177Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:03:29.9116524Z
2025-12-04T09:03:29.9116742Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:03:29.9117269Z | Processes: |
2025-12-04T09:03:29.9117805Z | GPU GI CI PID Type Process name GPU Memory |
2025-12-04T09:03:29.9118314Z | ID ID Usage |
2025-12-04T09:03:29.9118731Z |=========================================================================================|
2025-12-04T09:03:29.9131856Z | No running processes found |
2025-12-04T09:03:29.9132467Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:03:31.6090965Z + nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0
2025-12-04T09:03:33.3999654Z Tesla T4
2025-12-04T09:03:34.7087719Z + NVIDIA_SMI_STATUS=0
2025-12-04T09:03:34.7088112Z + '[' 0 -eq 0 ']'
2025-12-04T09:03:34.7088404Z + echo 'INFO: Ignoring allowed status 0'
2025-12-04T09:03:34.7088771Z + set -e
2025-12-04T09:03:34.7089030Z INFO: Ignoring allowed status 0
2025-12-04T09:03:34.7094313Z == Installing nvidia container toolkit for amzn2023 ==
2025-12-04T09:03:34.7098383Z + sudo yum install -y yum-utils
2025-12-04T09:03:35.1764045Z Last metadata expiration check: 0:07:31 ago on Thu Dec 4 08:56:04 2025.
2025-12-04T09:03:35.2072605Z Package dnf-utils-4.3.0-13.amzn2023.0.5.noarch is already installed.
2025-12-04T09:03:35.2673598Z Dependencies resolved.
2025-12-04T09:03:35.2974517Z Nothing to do.
2025-12-04T09:03:35.2975057Z Complete!
2025-12-04T09:03:35.6842778Z + [[ amzn2023 == \a\m\z\n\2\0\2\3 ]]
2025-12-04T09:03:35.6843531Z + YUM_REPO_URL=https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo
2025-12-04T09:03:35.6844611Z + sudo yum-config-manager --add-repo https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo
2025-12-04T09:03:36.0725931Z Adding repo from: https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo
2025-12-04T09:03:36.1213798Z + sudo yum install -y nvidia-container-toolkit-1.17.8 libnvidia-container-tools-1.17.8 libnvidia-container1-1.17.8 nvidia-container-toolkit-base-1.17.8
2025-12-04T09:03:36.7022893Z nvidia-container-toolkit 21 kB/s | 833 B 00:00
2025-12-04T09:03:36.7971515Z Dependencies resolved.
2025-12-04T09:03:36.8275062Z ================================================================================
2025-12-04T09:03:36.8275644Z Package Arch Version Repository Size
2025-12-04T09:03:36.8276121Z ================================================================================
2025-12-04T09:03:36.8276501Z Downgrading:
2025-12-04T09:03:36.8276942Z libnvidia-container-tools x86_64 1.17.8-1 nvidia-container-toolkit 40 k
2025-12-04T09:03:36.8277641Z libnvidia-container1 x86_64 1.17.8-1 nvidia-container-toolkit 1.0 M
2025-12-04T09:03:36.8278329Z nvidia-container-toolkit x86_64 1.17.8-1 nvidia-container-toolkit 1.2 M
2025-12-04T09:03:36.8279056Z nvidia-container-toolkit-base x86_64 1.17.8-1 nvidia-container-toolkit 5.8 M
2025-12-04T09:03:36.8279521Z
2025-12-04T09:03:36.8279637Z Transaction Summary
2025-12-04T09:03:36.8279938Z ================================================================================
2025-12-04T09:03:36.8280318Z Downgrade 4 Packages
2025-12-04T09:03:36.8280494Z
2025-12-04T09:03:36.8280616Z Total download size: 8.0 M
2025-12-04T09:03:36.8280928Z Downloading Packages:
2025-12-04T09:03:36.9135691Z (1/4): libnvidia-container-tools-1.17.8-1.x86_6 480 kB/s | 40 kB 00:00
2025-12-04T09:03:36.9785691Z (2/4): libnvidia-container1-1.17.8-1.x86_64.rpm 6.5 MB/s | 1.0 MB 00:00
2025-12-04T09:03:37.0146669Z (3/4): nvidia-container-toolkit-1.17.8-1.x86_64 6.7 MB/s | 1.2 MB 00:00
2025-12-04T09:03:37.1319197Z (4/4): nvidia-container-toolkit-base-1.17.8-1.x 26 MB/s | 5.8 MB 00:00
2025-12-04T09:03:37.1326496Z --------------------------------------------------------------------------------
2025-12-04T09:03:37.1330209Z Total 26 MB/s | 8.0 MB 00:00
2025-12-04T09:03:37.1332904Z Running transaction check
2025-12-04T09:03:37.1481244Z Transaction check succeeded.
2025-12-04T09:03:37.1481633Z Running transaction test
2025-12-04T09:03:37.2006550Z Transaction test succeeded.
2025-12-04T09:03:37.2007560Z Running transaction
2025-12-04T09:03:38.0868905Z Preparing : 1/1
2025-12-04T09:03:38.2086147Z Downgrading : nvidia-container-toolkit-base-1.17.8-1.x86_64 1/8
2025-12-04T09:03:38.2207679Z Downgrading : libnvidia-container1-1.17.8-1.x86_64 2/8
2025-12-04T09:03:38.2653438Z Running scriptlet: libnvidia-container1-1.17.8-1.x86_64 2/8
2025-12-04T09:03:38.3938045Z Downgrading : libnvidia-container-tools-1.17.8-1.x86_64 3/8
2025-12-04T09:03:38.4628340Z Downgrading : nvidia-container-toolkit-1.17.8-1.x86_64 4/8
2025-12-04T09:03:38.5165810Z Running scriptlet: nvidia-container-toolkit-1.17.8-1.x86_64 4/8
2025-12-04T09:03:38.5213884Z Running scriptlet: nvidia-container-toolkit-1.18.1-1.x86_64 5/8
2025-12-04T09:03:38.5214616Z Cleanup : nvidia-container-toolkit-1.18.1-1.x86_64 5/8
2025-12-04T09:03:38.5538203Z Running scriptlet: nvidia-container-toolkit-1.18.1-1.x86_64 5/8
2025-12-04T09:03:38.5582800Z Running scriptlet: libnvidia-container-tools-1.18.1-1.x86_64 6/8
2025-12-04T09:03:38.5583523Z Cleanup : libnvidia-container-tools-1.18.1-1.x86_64 6/8
2025-12-04T09:03:38.5964346Z Running scriptlet: libnvidia-container-tools-1.18.1-1.x86_64 6/8
2025-12-04T09:03:38.6013994Z Running scriptlet: libnvidia-container1-1.18.1-1.x86_64 7/8
2025-12-04T09:03:38.6014707Z Cleanup : libnvidia-container1-1.18.1-1.x86_64 7/8
2025-12-04T09:03:38.6568407Z Running scriptlet: libnvidia-container1-1.18.1-1.x86_64 7/8
2025-12-04T09:03:38.6613200Z Running scriptlet: nvidia-container-toolkit-base-1.18.1-1.x86_64 8/8
2025-12-04T09:03:38.6613929Z Cleanup : nvidia-container-toolkit-base-1.18.1-1.x86_64 8/8
2025-12-04T09:03:38.7178384Z Running scriptlet: nvidia-container-toolkit-base-1.18.1-1.x86_64 8/8
2025-12-04T09:03:38.7715857Z Running scriptlet: nvidia-container-toolkit-1.17.8-1.x86_64 8/8
2025-12-04T09:04:48.5083962Z Running scriptlet: nvidia-container-toolkit-base-1.18.1-1.x86_64 8/8
2025-12-04T09:04:48.5084739Z Verifying : libnvidia-container-tools-1.17.8-1.x86_64 1/8
2025-12-04T09:04:48.5085402Z Verifying : libnvidia-container-tools-1.18.1-1.x86_64 2/8
2025-12-04T09:04:48.5086055Z Verifying : libnvidia-container1-1.17.8-1.x86_64 3/8
2025-12-04T09:04:48.5086682Z Verifying : libnvidia-container1-1.18.1-1.x86_64 4/8
2025-12-04T09:04:48.5087356Z Verifying : nvidia-container-toolkit-1.17.8-1.x86_64 5/8
2025-12-04T09:04:48.5088009Z Verifying : nvidia-container-toolkit-1.18.1-1.x86_64 6/8
2025-12-04T09:04:48.5088659Z Verifying : nvidia-container-toolkit-base-1.17.8-1.x86_64 7/8
2025-12-04T09:04:48.6697321Z Verifying : nvidia-container-toolkit-base-1.18.1-1.x86_64 8/8
2025-12-04T09:04:48.7300909Z
2025-12-04T09:04:48.7301080Z
2025-12-04T09:04:48.7301183Z Downgraded:
2025-12-04T09:04:48.7301644Z libnvidia-container-tools-1.17.8-1.x86_64
2025-12-04T09:04:48.7302345Z libnvidia-container1-1.17.8-1.x86_64
2025-12-04T09:04:48.7303036Z nvidia-container-toolkit-1.17.8-1.x86_64
2025-12-04T09:04:48.7303746Z nvidia-container-toolkit-base-1.17.8-1.x86_64
2025-12-04T09:04:48.7304188Z
2025-12-04T09:04:48.7304287Z Complete!
2025-12-04T09:04:48.8056574Z + sudo systemctl restart docker
2025-12-04T09:04:57.6536460Z Thu Dec 4 09:04:57 2025
2025-12-04T09:04:57.6537209Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:04:57.6537854Z | NVIDIA-SMI 580.82.07 Driver Version: 580.82.07 CUDA Version: 13.0 |
2025-12-04T09:04:57.6538518Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:04:57.6539152Z | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
2025-12-04T09:04:57.6539830Z | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
2025-12-04T09:04:57.6540382Z | | | MIG M. |
2025-12-04T09:04:57.6540793Z |=========================================+========================+======================|
2025-12-04T09:04:57.6932092Z | 0 Tesla T4 On | 00000000:00:1B.0 Off | 0 |
2025-12-04T09:04:57.6932671Z | N/A 30C P0 24W / 70W | 0MiB / 15360MiB | 0% Default |
2025-12-04T09:04:57.6933163Z | | | N/A |
2025-12-04T09:04:57.6933784Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:04:57.6934324Z | 1 Tesla T4 On | 00000000:00:1C.0 Off | 0 |
2025-12-04T09:04:57.6934835Z | N/A 29C P0 25W / 70W | 0MiB / 15360MiB | 0% Default |
2025-12-04T09:04:57.6935283Z | | | N/A |
2025-12-04T09:04:57.6935766Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:04:57.6936402Z | 2 Tesla T4 On | 00000000:00:1D.0 Off | 0 |
2025-12-04T09:04:57.6937092Z | N/A 28C P0 25W / 70W | 0MiB / 15360MiB | 0% Default |
2025-12-04T09:04:57.6937562Z | | | N/A |
2025-12-04T09:04:57.6938060Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:04:57.6938994Z | 3 Tesla T4 On | 00000000:00:1E.0 Off | 0 |
2025-12-04T09:04:57.6939524Z | N/A 29C P0 25W / 70W | 0MiB / 15360MiB | 9% Default |
2025-12-04T09:04:57.6939992Z | | | N/A |
2025-12-04T09:04:57.6940486Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:04:57.6940861Z
2025-12-04T09:04:57.6941075Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:04:57.6941618Z | Processes: |
2025-12-04T09:04:57.6942162Z | GPU GI CI PID Type Process name GPU Memory |
2025-12-04T09:04:57.6942681Z | ID ID Usage |
2025-12-04T09:04:57.6943119Z |=========================================================================================|
2025-12-04T09:04:57.6958340Z | No running processes found |
2025-12-04T09:04:57.6959355Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:04:58.0896022Z Unable to find image 'public.ecr.aws/docker/library/python:3.13' locally
2025-12-04T09:04:58.3181514Z 3.13: Pulling from docker/library/python
2025-12-04T09:04:58.3975888Z 53c88f1dfeb7: Pulling fs layer
2025-12-04T09:04:58.3976405Z eae668646f44: Pulling fs layer
2025-12-04T09:04:58.3976912Z ff2e6e687b6c: Pulling fs layer
2025-12-04T09:04:58.3977284Z 7c40a3faff76: Pulling fs layer
2025-12-04T09:04:58.3977602Z 967a3b1c8fef: Pulling fs layer
2025-12-04T09:04:58.3977934Z a64e1a44f22a: Pulling fs layer
2025-12-04T09:04:58.3978258Z 52655f8a5bcc: Pulling fs layer
2025-12-04T09:04:58.3978561Z 7c40a3faff76: Waiting
2025-12-04T09:04:58.3978870Z 967a3b1c8fef: Waiting
2025-12-04T09:04:58.3979162Z a64e1a44f22a: Waiting
2025-12-04T09:04:58.3979421Z 52655f8a5bcc: Waiting
2025-12-04T09:04:58.5470550Z eae668646f44: Verifying Checksum
2025-12-04T09:04:58.5470937Z eae668646f44: Download complete
2025-12-04T09:04:58.6387423Z 53c88f1dfeb7: Download complete
2025-12-04T09:04:58.7104632Z 967a3b1c8fef: Verifying Checksum
2025-12-04T09:04:58.7105042Z 967a3b1c8fef: Download complete
2025-12-04T09:04:58.7376412Z ff2e6e687b6c: Verifying Checksum
2025-12-04T09:04:58.7377010Z ff2e6e687b6c: Download complete
2025-12-04T09:04:58.7634250Z 52655f8a5bcc: Download complete
2025-12-04T09:04:58.8530843Z a64e1a44f22a: Verifying Checksum
2025-12-04T09:04:58.8531252Z a64e1a44f22a: Download complete
2025-12-04T09:04:59.5804177Z 7c40a3faff76: Verifying Checksum
2025-12-04T09:04:59.5804604Z 7c40a3faff76: Download complete
2025-12-04T09:04:59.8193818Z 53c88f1dfeb7: Pull complete
2025-12-04T09:05:00.3561203Z eae668646f44: Pull complete
2025-12-04T09:05:02.0614429Z ff2e6e687b6c: Pull complete
2025-12-04T09:05:06.9540752Z 7c40a3faff76: Pull complete
2025-12-04T09:05:07.1527067Z 967a3b1c8fef: Pull complete
2025-12-04T09:05:07.7168197Z a64e1a44f22a: Pull complete
2025-12-04T09:05:07.7412467Z 52655f8a5bcc: Pull complete
2025-12-04T09:05:07.7567834Z Digest: sha256:3f986299a7b8b44b0d8cf9bda2b22361ce5c3058ef5d7cb17fb7452506680ab0
2025-12-04T09:05:07.7613093Z Status: Downloaded newer image for public.ecr.aws/docker/library/python:3.13
2025-12-04T09:05:16.3600656Z Thu Dec 4 09:05:16 2025
2025-12-04T09:05:16.3601191Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:05:16.3601824Z | NVIDIA-SMI 580.82.07 Driver Version: 580.82.07 CUDA Version: 13.0 |
2025-12-04T09:05:16.3602439Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:05:16.3603045Z | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
2025-12-04T09:05:16.3604157Z | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
2025-12-04T09:05:16.3604693Z | | | MIG M. |
2025-12-04T09:05:16.3605104Z |=========================================+========================+======================|
2025-12-04T09:05:16.4197960Z | 0 Tesla T4 On | 00000000:00:1B.0 Off | 0 |
2025-12-04T09:05:16.4198655Z | N/A 28C P8 13W / 70W | 0MiB / 15360MiB | 0% Default |
2025-12-04T09:05:16.4199151Z | | | N/A |
2025-12-04T09:05:16.4199643Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:05:16.4200165Z | 1 Tesla T4 On | 00000000:00:1C.0 Off | 0 |
2025-12-04T09:05:16.4200676Z | N/A 28C P8 13W / 70W | 0MiB / 15360MiB | 0% Default |
2025-12-04T09:05:16.4201168Z | | | N/A |
2025-12-04T09:05:16.4201650Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:05:16.4202168Z | 2 Tesla T4 On | 00000000:00:1D.0 Off | 0 |
2025-12-04T09:05:16.4202676Z | N/A 27C P8 13W / 70W | 0MiB / 15360MiB | 0% Default |
2025-12-04T09:05:16.4203136Z | | | N/A |
2025-12-04T09:05:16.4203616Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:05:16.4204131Z | 3 Tesla T4 On | 00000000:00:1E.0 Off | 0 |
2025-12-04T09:05:16.4204638Z | N/A 28C P8 13W / 70W | 0MiB / 15360MiB | 0% Default |
2025-12-04T09:05:16.4205109Z | | | N/A |
2025-12-04T09:05:16.4205577Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:05:16.4205938Z
2025-12-04T09:05:16.4206146Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:05:16.4206671Z | Processes: |
2025-12-04T09:05:16.4207210Z | GPU GI CI PID Type Process name GPU Memory |
2025-12-04T09:05:16.4207704Z | ID ID Usage |
2025-12-04T09:05:16.4208125Z |=========================================================================================|
2025-12-04T09:05:16.4224230Z | No running processes found |
2025-12-04T09:05:16.4224905Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:05:17.9736049Z Command completed after 1 attempt(s).
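This completes the GPU runtime setup: nvidia-smi works on the host, the NVIDIA container toolkit is pinned to 1.17.8 (a downgrade, since 1.18.1 was installed), Docker is restarted, and nvidia-smi is re-run from a freshly pulled python:3.13 container to prove GPU passthrough. A condensed sketch of the same sequence, assuming an Amazon Linux 2023 host with the NVIDIA yum repo already added as shown above:

# Pin all four toolkit packages to one version; dnf downgrades when a newer build is present.
sudo yum install -y \
    nvidia-container-toolkit-1.17.8 libnvidia-container-tools-1.17.8 \
    libnvidia-container1-1.17.8 nvidia-container-toolkit-base-1.17.8
# Restart Docker so the container runtime picks up the toolkit change.
sudo systemctl restart docker
# Smoke test: all four T4s must be visible from inside a container.
docker run --rm --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all \
    public.ecr.aws/docker/library/python:3.13 nvidia-smi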
2025-12-04T09:05:17.9830461Z Prepare all required actions
2025-12-04T09:05:17.9865764Z ##[group]Run ./.github/actions/get-workflow-job-id
2025-12-04T09:05:17.9866240Z with:
2025-12-04T09:05:17.9866970Z github-token: ***
2025-12-04T09:05:17.9867413Z env:
2025-12-04T09:05:17.9867732Z GIT_DEFAULT_BRANCH: main
2025-12-04T09:05:17.9868130Z HAS_NVIDIA_GPU: true
2025-12-04T09:05:17.9868652Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:05:17.9869268Z ##[endgroup]
2025-12-04T09:05:17.9888043Z ##[group]Run set -eux
2025-12-04T09:05:17.9888441Z set -eux
2025-12-04T09:05:17.9889017Z python3 .github/scripts/get_workflow_job_id.py "${GITHUB_RUN_ID}" "${RUNNER_NAME}"
2025-12-04T09:05:17.9900082Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:05:17.9900710Z env:
2025-12-04T09:05:17.9901256Z GIT_DEFAULT_BRANCH: main
2025-12-04T09:05:17.9901687Z HAS_NVIDIA_GPU: true
2025-12-04T09:05:17.9902286Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:05:17.9903016Z GITHUB_TOKEN: ***
2025-12-04T09:05:17.9903388Z ##[endgroup]
2025-12-04T09:05:17.9936442Z + python3 .github/scripts/get_workflow_job_id.py 19922768520 i-035b9d8fd6b020edf
2025-12-04T09:05:19.2302934Z Setting output job-id=57116084904
2025-12-04T09:05:19.2303775Z Setting output job-name=linux-jammy-cuda12.8-py3.10-gcc11 / test (distributed, 3, 3, lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check)
2025-12-04T09:05:19.2418129Z ##[group]Run python3 -m pip install psutil==5.9.8 dataclasses_json==0.6.7 nvidia-ml-py==11.525.84
2025-12-04T09:05:19.2418995Z python3 -m pip install psutil==5.9.8 dataclasses_json==0.6.7 nvidia-ml-py==11.525.84
2025-12-04T09:05:19.2420124Z python3 -m tools.stats.monitor --log-interval "$MONITOR_LOG_INTERVAL" --data-collect-interval "$MONITOR_DATA_COLLECT_INTERVAL" > usage_log.txt 2>&1 &
2025-12-04T09:05:19.2421374Z echo "monitor-script-pid=${!}" >> "${GITHUB_OUTPUT}"
2025-12-04T09:05:19.2427797Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:05:19.2428226Z env:
2025-12-04T09:05:19.2428458Z GIT_DEFAULT_BRANCH: main
2025-12-04T09:05:19.2428757Z HAS_NVIDIA_GPU: true
2025-12-04T09:05:19.2429106Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:05:19.2429497Z JOB_ID: 57116084904
2025-12-04T09:05:19.2430186Z JOB_NAME: linux-jammy-cuda12.8-py3.10-gcc11 / test (distributed, 3, 3, lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check)
2025-12-04T09:05:19.2430891Z WORKFLOW_NAME: trunk
2025-12-04T09:05:19.2431175Z WORKFLOW_RUN_ID: 19922768520
2025-12-04T09:05:19.2431640Z MONITOR_LOG_INTERVAL: 5
2025-12-04T09:05:19.2431953Z MONITOR_DATA_COLLECT_INTERVAL: 1
2025-12-04T09:05:19.2432521Z ##[endgroup]
2025-12-04T09:05:19.5611306Z Defaulting to user installation because normal site-packages is not writeable
2025-12-04T09:05:19.9454944Z Collecting psutil==5.9.8
2025-12-04T09:05:19.9623696Z Downloading psutil-5.9.8-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (288 kB)
2025-12-04T09:05:20.0396064Z Collecting dataclasses_json==0.6.7
2025-12-04T09:05:20.0442635Z Downloading dataclasses_json-0.6.7-py3-none-any.whl (28 kB)
2025-12-04T09:05:20.0715469Z Collecting nvidia-ml-py==11.525.84
2025-12-04T09:05:20.0754994Z Downloading nvidia_ml_py-11.525.84-py3-none-any.whl (34 kB)
2025-12-04T09:05:20.2002121Z Collecting marshmallow<4.0.0,>=3.18.0
2025-12-04T09:05:20.2040201Z Downloading marshmallow-3.26.1-py3-none-any.whl (50 kB)
2025-12-04T09:05:20.2274690Z Collecting typing-inspect<1,>=0.4.0
2025-12-04T09:05:20.2315735Z Downloading typing_inspect-0.9.0-py3-none-any.whl (8.8 kB)
2025-12-04T09:05:20.2894860Z Collecting packaging>=17.0
2025-12-04T09:05:20.2931405Z Downloading packaging-25.0-py3-none-any.whl (66 kB)
2025-12-04T09:05:20.3478455Z Collecting typing-extensions>=3.7.4
2025-12-04T09:05:20.3520429Z Downloading typing_extensions-4.15.0-py3-none-any.whl (44 kB)
2025-12-04T09:05:20.3720147Z Collecting mypy-extensions>=0.3.0
2025-12-04T09:05:20.3755332Z Downloading mypy_extensions-1.1.0-py3-none-any.whl (5.0 kB)
2025-12-04T09:05:20.4815893Z Installing collected packages: typing-extensions, packaging, mypy-extensions, typing-inspect, marshmallow, psutil, nvidia-ml-py, dataclasses-json
2025-12-04T09:05:20.7765326Z Successfully installed dataclasses-json-0.6.7 marshmallow-3.26.1 mypy-extensions-1.1.0 nvidia-ml-py-11.525.84 packaging-25.0 psutil-5.9.8 typing-extensions-4.15.0 typing-inspect-0.9.0
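The step above installs the usage monitor's dependencies (psutil, dataclasses_json, nvidia-ml-py) and, per the Run block shown earlier, launches tools.stats.monitor in the background, saving its PID as a step output so a later step can stop it and upload usage_log.txt. The same background-launch pattern, sketched with the intervals from this job (log every 5 s, collect every 1 s):

# Launch the resource monitor in the background, logging to usage_log.txt.
python3 -m tools.stats.monitor --log-interval 5 --data-collect-interval 1 > usage_log.txt 2>&1 &
# $! is the PID of the last background job; writing it to GITHUB_OUTPUT makes it
# available to later workflow steps as steps.<id>.outputs.monitor-script-pid.
echo "monitor-script-pid=$!" >> "${GITHUB_OUTPUT}"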
2025-12-04T09:05:20.9635695Z Prepare all required actions
2025-12-04T09:05:20.9636263Z Getting action download info
2025-12-04T09:05:21.1818447Z Download action repository 'seemethere/download-artifact-s3@v4' (SHA:1da556a7aa0a088e3153970611f6c432d58e80e6)
2025-12-04T09:05:21.4467634Z Download action repository 'actions/download-artifact@v4' (SHA:d3f86a106a0bac45b974a628896c90dbdf5c8093)
2025-12-04T09:05:21.7482417Z ##[group]Run ./.github/actions/download-build-artifacts
2025-12-04T09:05:21.7482836Z with:
2025-12-04T09:05:21.7483108Z name: linux-jammy-cuda12.8-py3.10-gcc11
2025-12-04T09:05:21.7483461Z s3-bucket: gha-artifacts
2025-12-04T09:05:21.7483759Z env:
2025-12-04T09:05:21.7483979Z GIT_DEFAULT_BRANCH: main
2025-12-04T09:05:21.7484272Z HAS_NVIDIA_GPU: true
2025-12-04T09:05:21.7484630Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:05:21.7485031Z ##[endgroup]
2025-12-04T09:05:21.7516959Z ##[group]Run seemethere/download-artifact-s3@v4
2025-12-04T09:05:21.7517332Z with:
2025-12-04T09:05:21.7517626Z name: linux-jammy-cuda12.8-py3.10-gcc11
2025-12-04T09:05:21.7517972Z s3-bucket: gha-artifacts
2025-12-04T09:05:21.7518255Z region: us-east-1
2025-12-04T09:05:21.7518526Z env:
2025-12-04T09:05:21.7518760Z GIT_DEFAULT_BRANCH: main
2025-12-04T09:05:21.7519067Z HAS_NVIDIA_GPU: true
2025-12-04T09:05:21.7519423Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:05:21.7519826Z ##[endgroup]
2025-12-04T09:05:22.2508863Z (node:62861) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023.
2025-12-04T09:05:22.2509510Z
2025-12-04T09:05:22.2509740Z Please migrate your code to use AWS SDK for JavaScript (v3).
2025-12-04T09:05:22.2510357Z For more information, check the migration guide at https://a.co/7PzMCcy
2025-12-04T09:05:22.2510992Z (Use `node --trace-warnings ...` to show where the warning was created)
2025-12-04T09:05:22.4982654Z Found 1 objects with prefix pytorch/pytorch/19922768520/linux-jammy-cuda12.8-py3.10-gcc11/
2025-12-04T09:05:22.4983532Z Starting download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/artifacts.zip
2025-12-04T09:05:30.7516386Z Finished download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/artifacts.zip
2025-12-04T09:05:30.7521860Z Artifact download has finished successfully
2025-12-04T09:05:30.7720246Z ##[group]Run unzip -o artifacts.zip
2025-12-04T09:05:30.7720654Z unzip -o artifacts.zip
2025-12-04T09:05:30.7727749Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:05:30.7728178Z env:
2025-12-04T09:05:30.7728430Z GIT_DEFAULT_BRANCH: main
2025-12-04T09:05:30.7728744Z HAS_NVIDIA_GPU: true
2025-12-04T09:05:30.7729096Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:05:30.7729511Z ##[endgroup]
2025-12-04T09:05:30.7805532Z Archive: artifacts.zip
2025-12-04T09:05:30.7805917Z creating: dist/
2025-12-04T09:05:30.7942244Z inflating: dist/.ninja_log
2025-12-04T09:05:33.3120656Z inflating: dist/torch-2.10.0a0+gitffd9b0f-cp310-cp310-linux_x86_64.whl
2025-12-04T09:05:33.3121719Z creating: build/
2025-12-04T09:05:33.3122040Z creating: build/custom_test_artifacts/
2025-12-04T09:05:33.3122507Z creating: build/custom_test_artifacts/custom-op-build/
2025-12-04T09:05:33.3123068Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/
2025-12-04T09:05:33.3123781Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/pkgRedirects/
2025-12-04T09:05:33.3128633Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeConfigureLog.yaml
2025-12-04T09:05:33.3129435Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/
2025-12-04T09:05:33.3130208Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeSystem.cmake
2025-12-04T09:05:33.3131047Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/
2025-12-04T09:05:33.3132153Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/tmp/
2025-12-04T09:05:33.3133110Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/CMakeCCompilerId.c
2025-12-04T09:05:33.3134147Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/a.out
2025-12-04T09:05:33.3135026Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeCCompiler.cmake
2025-12-04T09:05:33.3136013Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/
2025-12-04T09:05:33.3137108Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/tmp/
2025-12-04T09:05:33.3138082Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/CMakeCXXCompilerId.cpp
2025-12-04T09:05:33.3139438Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/a.out
2025-12-04T09:05:33.3140384Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeCXXCompiler.cmake
2025-12-04T09:05:33.3141930Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_C.bin
2025-12-04T09:05:33.3143839Z inflating:
build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CXX.bin 2025-12-04T09:05:33.3144800Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/ 2025-12-04T09:05:33.3145656Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/ 2025-12-04T09:05:33.3201753Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii 2025-12-04T09:05:33.3258142Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp 2025-12-04T09:05:33.3259443Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id 2025-12-04T09:05:33.3317838Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii 2025-12-04T09:05:33.3319074Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c 2025-12-04T09:05:33.3320299Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu 2025-12-04T09:05:33.3321957Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c 2025-12-04T09:05:33.3323204Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx 2025-12-04T09:05:33.3324438Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin 2025-12-04T09:05:33.3325672Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin 2025-12-04T09:05:33.3326869Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c 2025-12-04T09:05:33.3328049Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o 2025-12-04T09:05:33.3329171Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin 2025-12-04T09:05:33.3330253Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.reg.c 2025-12-04T09:05:33.3331311Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin 2025-12-04T09:05:33.3332384Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin.c 2025-12-04T09:05:33.3333732Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.o 2025-12-04T09:05:33.3334759Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/CMakeCUDACompilerId.cu 2025-12-04T09:05:33.3405944Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/a.out 2025-12-04T09:05:33.3406891Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeCUDACompiler.cmake 2025-12-04T09:05:33.3482958Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CUDA.bin 2025-12-04T09:05:33.3483907Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeScratch/ 2025-12-04T09:05:33.3484607Z creating: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeTmp/ 2025-12-04T09:05:33.3485349Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/cmake.check_cache 2025-12-04T09:05:33.3486124Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/ 2025-12-04T09:05:33.3486976Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.ts 2025-12-04T09:05:33.3487938Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.make 2025-12-04T09:05:33.3488879Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/depend.make 2025-12-04T09:05:33.3489749Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/link.txt 2025-12-04T09:05:33.3490641Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/cmake_clean.cmake 2025-12-04T09:05:33.3491530Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/build.make 2025-12-04T09:05:33.3492428Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/DependInfo.cmake 2025-12-04T09:05:33.3493328Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/flags.make 2025-12-04T09:05:33.3494217Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/progress.make 2025-12-04T09:05:33.3510110Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o.d 2025-12-04T09:05:33.3698668Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o 2025-12-04T09:05:33.3699580Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/ 2025-12-04T09:05:33.3700523Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.ts 2025-12-04T09:05:33.3701577Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.make 2025-12-04T09:05:33.3702598Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/depend.make 2025-12-04T09:05:33.3703546Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/link.txt 2025-12-04T09:05:33.3704520Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/cmake_clean.cmake 2025-12-04T09:05:33.3705498Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/build.make 2025-12-04T09:05:33.3706479Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/DependInfo.cmake 2025-12-04T09:05:33.3707467Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/flags.make 2025-12-04T09:05:33.3708429Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/progress.make 2025-12-04T09:05:33.3725121Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o.d 2025-12-04T09:05:33.3804041Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o 2025-12-04T09:05:33.3805283Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeDirectoryInformation.cmake 2025-12-04T09:05:33.3806203Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/TargetDirectories.txt 2025-12-04T09:05:33.3807015Z extracting: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/progress.marks 2025-12-04T09:05:33.3807764Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile2 2025-12-04T09:05:33.3808602Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile.cmake 2025-12-04T09:05:33.3809358Z inflating: build/custom_test_artifacts/custom-op-build/detect_cuda_version.cc 2025-12-04T09:05:33.3810045Z inflating: build/custom_test_artifacts/custom-op-build/CMakeCache.txt 2025-12-04T09:05:33.3810685Z inflating: build/custom_test_artifacts/custom-op-build/Makefile 2025-12-04T09:05:33.3811335Z inflating: build/custom_test_artifacts/custom-op-build/cmake_install.cmake 2025-12-04T09:05:33.3974849Z inflating: build/custom_test_artifacts/custom-op-build/libcustom_ops.so 2025-12-04T09:05:33.4025582Z inflating: build/custom_test_artifacts/custom-op-build/test_custom_ops 2025-12-04T09:05:33.4026233Z creating: build/custom_test_artifacts/jit-hook-build/ 2025-12-04T09:05:33.4026798Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/ 2025-12-04T09:05:33.4027476Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/pkgRedirects/ 2025-12-04T09:05:33.4033791Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeConfigureLog.yaml 2025-12-04T09:05:33.4034554Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/ 2025-12-04T09:05:33.4035297Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeSystem.cmake 2025-12-04T09:05:33.4036101Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/ 2025-12-04T09:05:33.4036870Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/tmp/ 2025-12-04T09:05:33.4037779Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/CMakeCCompilerId.c 2025-12-04T09:05:33.4038714Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/a.out 2025-12-04T09:05:33.4039573Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeCCompiler.cmake 2025-12-04T09:05:33.4040390Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/ 2025-12-04T09:05:33.4041191Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/tmp/ 2025-12-04T09:05:33.4042281Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/CMakeCXXCompilerId.cpp 2025-12-04T09:05:33.4043668Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/a.out 2025-12-04T09:05:33.4044547Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeCXXCompiler.cmake 2025-12-04T09:05:33.4046062Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_C.bin 2025-12-04T09:05:33.4047808Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CXX.bin 2025-12-04T09:05:33.4048727Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/ 2025-12-04T09:05:33.4049528Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/ 2025-12-04T09:05:33.4104923Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii 2025-12-04T09:05:33.4164131Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp 
2025-12-04T09:05:33.4165382Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id 2025-12-04T09:05:33.4222622Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii 2025-12-04T09:05:33.4223872Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c 2025-12-04T09:05:33.4225118Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu 2025-12-04T09:05:33.4226516Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c 2025-12-04T09:05:33.4227748Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx 2025-12-04T09:05:33.4228931Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin 2025-12-04T09:05:33.4230145Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin 2025-12-04T09:05:33.4231354Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c 2025-12-04T09:05:33.4232524Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o 2025-12-04T09:05:33.4233707Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin 2025-12-04T09:05:33.4234760Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.reg.c 2025-12-04T09:05:33.4235788Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin 2025-12-04T09:05:33.4236823Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin.c 2025-12-04T09:05:33.4237811Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.o 2025-12-04T09:05:33.4238832Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/CMakeCUDACompilerId.cu 2025-12-04T09:05:33.4312420Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/a.out 2025-12-04T09:05:33.4313360Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeCUDACompiler.cmake 2025-12-04T09:05:33.4388680Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CUDA.bin 2025-12-04T09:05:33.4389711Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeScratch/ 2025-12-04T09:05:33.4390414Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeTmp/ 2025-12-04T09:05:33.4391137Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/cmake.check_cache 2025-12-04T09:05:33.4391906Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/ 2025-12-04T09:05:33.4392808Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.ts 2025-12-04T09:05:33.4393813Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.make 2025-12-04T09:05:33.4394771Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/depend.make 2025-12-04T09:05:33.4395650Z inflating: 
build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/link.txt 2025-12-04T09:05:33.4396574Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/cmake_clean.cmake 2025-12-04T09:05:33.4397503Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/build.make 2025-12-04T09:05:33.4398433Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/DependInfo.cmake 2025-12-04T09:05:33.4399344Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/flags.make 2025-12-04T09:05:33.4400435Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/progress.make 2025-12-04T09:05:33.4416491Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o.d 2025-12-04T09:05:33.4478764Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o 2025-12-04T09:05:33.4479983Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeDirectoryInformation.cmake 2025-12-04T09:05:33.4480870Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/TargetDirectories.txt 2025-12-04T09:05:33.4481672Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/progress.marks 2025-12-04T09:05:33.4482404Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile2 2025-12-04T09:05:33.4483117Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile.cmake 2025-12-04T09:05:33.4483859Z inflating: build/custom_test_artifacts/jit-hook-build/detect_cuda_version.cc 2025-12-04T09:05:33.4484543Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeCache.txt 2025-12-04T09:05:33.4485166Z inflating: build/custom_test_artifacts/jit-hook-build/Makefile 2025-12-04T09:05:33.4485790Z inflating: build/custom_test_artifacts/jit-hook-build/cmake_install.cmake 2025-12-04T09:05:33.4522708Z inflating: build/custom_test_artifacts/jit-hook-build/test_jit_hooks 2025-12-04T09:05:33.4523385Z creating: build/custom_test_artifacts/custom-backend-build/ 2025-12-04T09:05:33.4524007Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/ 2025-12-04T09:05:33.4524738Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/pkgRedirects/ 2025-12-04T09:05:33.4531033Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeConfigureLog.yaml 2025-12-04T09:05:33.4531881Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/ 2025-12-04T09:05:33.4532732Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeSystem.cmake 2025-12-04T09:05:33.4533730Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/ 2025-12-04T09:05:33.4534583Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/tmp/ 2025-12-04T09:05:33.4535566Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/CMakeCCompilerId.c 2025-12-04T09:05:33.4536849Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/a.out 2025-12-04T09:05:33.4537804Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeCCompiler.cmake 2025-12-04T09:05:33.4538741Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/ 2025-12-04T09:05:33.4539647Z creating: 
build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/tmp/ 2025-12-04T09:05:33.4540698Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/CMakeCXXCompilerId.cpp 2025-12-04T09:05:33.4541753Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/a.out 2025-12-04T09:05:33.4542734Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeCXXCompiler.cmake 2025-12-04T09:05:33.4544310Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_C.bin 2025-12-04T09:05:33.4546006Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CXX.bin 2025-12-04T09:05:33.4547033Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/ 2025-12-04T09:05:33.4547934Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/ 2025-12-04T09:05:33.4604002Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii 2025-12-04T09:05:33.4659812Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp 2025-12-04T09:05:33.4661177Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id 2025-12-04T09:05:33.4720671Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii 2025-12-04T09:05:33.4722461Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c 2025-12-04T09:05:33.4723795Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu 2025-12-04T09:05:33.4725163Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c 2025-12-04T09:05:33.4726470Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx 2025-12-04T09:05:33.4727748Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin 2025-12-04T09:05:33.4729037Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin 2025-12-04T09:05:33.4730319Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c 2025-12-04T09:05:33.4731568Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o 2025-12-04T09:05:33.4732736Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin 2025-12-04T09:05:33.4733988Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.reg.c 2025-12-04T09:05:33.4735089Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin 2025-12-04T09:05:33.4736279Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin.c 2025-12-04T09:05:33.4737540Z inflating: 
build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.o 2025-12-04T09:05:33.4738662Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/CMakeCUDACompilerId.cu 2025-12-04T09:05:33.4809030Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/a.out 2025-12-04T09:05:33.4810045Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeCUDACompiler.cmake 2025-12-04T09:05:33.4888460Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CUDA.bin 2025-12-04T09:05:33.4889459Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeScratch/ 2025-12-04T09:05:33.4890223Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeTmp/ 2025-12-04T09:05:33.4891008Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/cmake.check_cache 2025-12-04T09:05:33.4891863Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/ 2025-12-04T09:05:33.4892800Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.ts 2025-12-04T09:05:33.4893885Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.make 2025-12-04T09:05:33.4894934Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/depend.make 2025-12-04T09:05:33.4896213Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/link.txt 2025-12-04T09:05:33.4897421Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/cmake_clean.cmake 2025-12-04T09:05:33.4898473Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/build.make 2025-12-04T09:05:33.4899518Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/DependInfo.cmake 2025-12-04T09:05:33.4900663Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/flags.make 2025-12-04T09:05:33.4901674Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/progress.make 2025-12-04T09:05:33.4902787Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o.d 2025-12-04T09:05:33.5014252Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o 2025-12-04T09:05:33.5015311Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/ 2025-12-04T09:05:33.5016431Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.ts 2025-12-04T09:05:33.5017764Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.make 2025-12-04T09:05:33.5018900Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/depend.make 2025-12-04T09:05:33.5019967Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/link.txt 2025-12-04T09:05:33.5021264Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/cmake_clean.cmake 2025-12-04T09:05:33.5022369Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/build.make 
2025-12-04T09:05:33.5023481Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/DependInfo.cmake 2025-12-04T09:05:33.5024579Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/flags.make 2025-12-04T09:05:33.5025646Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/progress.make 2025-12-04T09:05:33.5041366Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o.d 2025-12-04T09:05:33.5095523Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o 2025-12-04T09:05:33.5096796Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeDirectoryInformation.cmake 2025-12-04T09:05:33.5097943Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/TargetDirectories.txt 2025-12-04T09:05:33.5098858Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/progress.marks 2025-12-04T09:05:33.5099690Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile2 2025-12-04T09:05:33.5100496Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile.cmake 2025-12-04T09:05:33.5101302Z inflating: build/custom_test_artifacts/custom-backend-build/detect_cuda_version.cc 2025-12-04T09:05:33.5102077Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeCache.txt 2025-12-04T09:05:33.5102789Z inflating: build/custom_test_artifacts/custom-backend-build/Makefile 2025-12-04T09:05:33.5103509Z inflating: build/custom_test_artifacts/custom-backend-build/cmake_install.cmake 2025-12-04T09:05:33.5202813Z inflating: build/custom_test_artifacts/custom-backend-build/libcustom_backend.so 2025-12-04T09:05:33.5241883Z inflating: build/custom_test_artifacts/custom-backend-build/test_custom_backend 2025-12-04T09:05:33.5242479Z creating: build/lib/ 2025-12-04T09:05:33.5321538Z inflating: build/lib/libprotobuf-lite.a 2025-12-04T09:05:33.5744156Z inflating: build/lib/libprotobuf.a 2025-12-04T09:05:33.6223067Z inflating: build/lib/libprotoc.a 2025-12-04T09:05:33.6232873Z inflating: build/lib/libpthreadpool.a 2025-12-04T09:05:33.6240974Z inflating: build/lib/libcpuinfo.a 2025-12-04T09:05:33.6248405Z inflating: build/lib/libcpuinfo_internals.a 2025-12-04T09:05:33.6249213Z inflating: build/lib/libclog.a 2025-12-04T09:05:33.6270523Z inflating: build/lib/libpytorch_qnnpack.a 2025-12-04T09:05:33.6271054Z inflating: build/lib/libnnpack_reference_layers.a 2025-12-04T09:05:33.6289974Z inflating: build/lib/libnnpack.a 2025-12-04T09:05:33.6466624Z inflating: build/lib/libmicrokernels-prod.a 2025-12-04T09:05:33.7284479Z inflating: build/lib/libmicrokernels-all.a 2025-12-04T09:05:33.7353618Z inflating: build/lib/libgtest.a 2025-12-04T09:05:33.7370748Z inflating: build/lib/libgmock.a 2025-12-04T09:05:33.7371412Z inflating: build/lib/libgtest_main.a 2025-12-04T09:05:33.7371802Z inflating: build/lib/libgmock_main.a 2025-12-04T09:05:33.7458013Z inflating: build/lib/libXNNPACK.a 2025-12-04T09:05:33.7531309Z inflating: build/lib/libbenchmark.a 2025-12-04T09:05:33.7531894Z inflating: build/lib/libbenchmark_main.a 2025-12-04T09:05:33.7532688Z inflating: build/lib/libjitprofiling.a 2025-12-04T09:05:33.7540926Z inflating: build/lib/libittnotify.a 2025-12-04T09:05:33.7607002Z inflating: build/lib/libasmjit.a 2025-12-04T09:05:33.8693980Z inflating: build/lib/libfbgemm.a 
2025-12-04T09:05:33.8722989Z inflating: build/lib/libtensorpipe_uv.a 2025-12-04T09:05:33.9242423Z inflating: build/lib/libtensorpipe.a 2025-12-04T09:05:33.9474385Z inflating: build/lib/libtensorpipe_cuda.a 2025-12-04T09:05:33.9604410Z inflating: build/lib/libgloo.a 2025-12-04T09:05:33.9650418Z inflating: build/lib/libonnx_proto.a 2025-12-04T09:05:34.0058318Z inflating: build/lib/libgloo_cuda.a 2025-12-04T09:05:34.0744723Z inflating: build/lib/libonnx.a 2025-12-04T09:05:34.0765139Z inflating: build/lib/libfmt.a 2025-12-04T09:05:35.0414387Z inflating: build/lib/libdnnl.a 2025-12-04T09:05:35.0867990Z inflating: build/lib/libkineto.a 2025-12-04T09:05:35.0979986Z inflating: build/lib/libc10.so 2025-12-04T09:05:35.1028082Z inflating: build/lib/libc10_cuda.so 2025-12-04T09:05:35.1029748Z inflating: build/lib/libcaffe2_nvrtc.so 2025-12-04T09:05:35.1031152Z inflating: build/lib/libtorch_global_deps.so 2025-12-04T09:05:38.0651625Z inflating: build/lib/libtorch_cpu.so 2025-12-04T09:05:38.1418315Z inflating: build/lib/libtorch_nvshmem.so 2025-12-04T09:05:41.0512043Z inflating: build/lib/libtorch_cuda.so 2025-12-04T09:05:41.0512514Z inflating: build/lib/libtorch.so 2025-12-04T09:05:41.0564198Z inflating: build/lib/libtorch_cuda_linalg.so 2025-12-04T09:05:41.0632116Z inflating: build/lib/libtorchbind_test.so 2025-12-04T09:05:41.0652527Z inflating: build/lib/libjitbackend_test.so 2025-12-04T09:05:41.0677047Z inflating: build/lib/libbackend_with_compiler.so 2025-12-04T09:05:41.0701995Z inflating: build/lib/libaoti_custom_ops.so 2025-12-04T09:05:41.0704363Z inflating: build/lib/libc10d_cuda_test.so 2025-12-04T09:05:41.0708954Z inflating: build/lib/libshm.so 2025-12-04T09:05:41.2977917Z inflating: build/lib/libtorch_python.so 2025-12-04T09:05:41.3012958Z inflating: build/lib/libnnapi_backend.so 2025-12-04T09:05:41.3013608Z creating: build/bin/ 2025-12-04T09:05:41.3450415Z inflating: build/bin/protoc-3.13.0.0 2025-12-04T09:05:41.3890707Z inflating: build/bin/protoc 2025-12-04T09:05:41.3950966Z inflating: build/bin/c10_AllocatorConfig_test 2025-12-04T09:05:41.4004312Z inflating: build/bin/c10_CompileTimeFunctionPointer_test 2025-12-04T09:05:41.4058622Z inflating: build/bin/c10_DeviceGuard_test 2025-12-04T09:05:41.4115156Z inflating: build/bin/c10_Device_test 2025-12-04T09:05:41.4178334Z inflating: build/bin/c10_DispatchKeySet_test 2025-12-04T09:05:41.4236585Z inflating: build/bin/c10_Scalar_test 2025-12-04T09:05:41.4290470Z inflating: build/bin/c10_StreamGuard_test 2025-12-04T09:05:41.4352971Z inflating: build/bin/c10_SymInt_test 2025-12-04T09:05:41.4410630Z inflating: build/bin/c10_InlineDeviceGuard_test 2025-12-04T09:05:41.4471281Z inflating: build/bin/c10_InlineStreamGuard_test 2025-12-04T09:05:41.4523113Z inflating: build/bin/c10_ConstexprCrc_test 2025-12-04T09:05:41.4583137Z inflating: build/bin/c10_SizesAndStrides_test 2025-12-04T09:05:41.4658026Z inflating: build/bin/c10_cow_test 2025-12-04T09:05:41.4714426Z inflating: build/bin/c10_Bitset_test 2025-12-04T09:05:41.4768766Z inflating: build/bin/c10_ArrayRef_test 2025-12-04T09:05:41.4821026Z inflating: build/bin/c10_DeadlockDetection_test 2025-12-04T09:05:41.4880972Z inflating: build/bin/c10_IntrusiveList_test 2025-12-04T09:05:41.4941775Z inflating: build/bin/c10_LeftRight_test 2025-12-04T09:05:41.4998527Z inflating: build/bin/c10_Half_test 2025-12-04T09:05:41.5052310Z inflating: build/bin/c10_Semaphore_test 2025-12-04T09:05:41.5113473Z inflating: build/bin/c10_Enumerate_test 2025-12-04T09:05:41.5171526Z inflating: build/bin/c10_NetworkFlow_test 
2025-12-04T09:05:41.5224722Z inflating: build/bin/c10_Synchronized_test 2025-12-04T09:05:41.5286052Z inflating: build/bin/c10_ThreadLocal_test 2025-12-04T09:05:41.5342113Z inflating: build/bin/c10_accumulate_test 2025-12-04T09:05:41.5398545Z inflating: build/bin/c10_TypeIndex_test 2025-12-04T09:05:41.5453306Z inflating: build/bin/c10_bit_cast_test 2025-12-04T09:05:41.5513062Z inflating: build/bin/c10_bfloat16_test 2025-12-04T09:05:41.5574338Z inflating: build/bin/c10_complex_math_test 2025-12-04T09:05:41.5629702Z inflating: build/bin/c10_exception_test 2025-12-04T09:05:41.5683825Z inflating: build/bin/c10_error_test 2025-12-04T09:05:41.5743533Z inflating: build/bin/c10_complex_test 2025-12-04T09:05:41.5799693Z inflating: build/bin/c10_flags_test 2025-12-04T09:05:41.5855786Z inflating: build/bin/c10_generic_math_test 2025-12-04T09:05:41.6012505Z inflating: build/bin/c10_intrusive_ptr_test 2025-12-04T09:05:41.6066605Z inflating: build/bin/c10_irange_test 2025-12-04T09:05:41.6123571Z inflating: build/bin/c10_lazy_test 2025-12-04T09:05:41.6177681Z inflating: build/bin/c10_nofatal_test 2025-12-04T09:05:41.6239501Z inflating: build/bin/c10_logging_test 2025-12-04T09:05:41.6319375Z inflating: build/bin/c10_optional_test 2025-12-04T09:05:41.6384857Z inflating: build/bin/c10_ordered_preserving_dict_test 2025-12-04T09:05:41.6538310Z inflating: build/bin/c10_small_vector_test 2025-12-04T09:05:41.6596552Z inflating: build/bin/c10_registry_test 2025-12-04T09:05:41.6656967Z inflating: build/bin/c10_string_util_test 2025-12-04T09:05:41.6712049Z inflating: build/bin/c10_ssize_test 2025-12-04T09:05:41.6766099Z inflating: build/bin/c10_string_view_test 2025-12-04T09:05:41.6813512Z inflating: build/bin/c10_intrusive_ptr_benchmark 2025-12-04T09:05:41.6868296Z inflating: build/bin/c10_tempfile_test 2025-12-04T09:05:41.6928050Z inflating: build/bin/c10_typeid_test 2025-12-04T09:05:41.6984789Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_1_var_test 2025-12-04T09:05:41.7043164Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_catches_stream 2025-12-04T09:05:41.7098372Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_blocks_and_threads 2025-12-04T09:05:41.7155000Z inflating: build/bin/c10_cuda_CUDATest 2025-12-04T09:05:41.7211109Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_catches_thread_and_block_and_device 2025-12-04T09:05:41.7266600Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_from_2_processes 2025-12-04T09:05:41.7324431Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_multiple_blocks 2025-12-04T09:05:41.7381136Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_same_block 2025-12-04T09:05:41.7957234Z inflating: build/bin/vec_test_all_types_DEFAULT 2025-12-04T09:05:41.8539674Z inflating: build/bin/vec_test_all_types_AVX512 2025-12-04T09:05:41.9138242Z inflating: build/bin/vec_test_all_types_AVX2 2025-12-04T09:05:41.9192789Z inflating: build/bin/test_vec_half_DEFAULT 2025-12-04T09:05:41.9292686Z inflating: build/bin/test_aoti_abi_check 2025-12-04T09:05:41.9347922Z inflating: build/bin/test_vec_half_AVX512 2025-12-04T09:05:41.9402438Z inflating: build/bin/test_vec_half_AVX2 2025-12-04T09:05:41.9480358Z inflating: build/bin/Dict_test 2025-12-04T09:05:41.9537772Z inflating: build/bin/Dimname_test 2025-12-04T09:05:41.9606218Z inflating: build/bin/MaybeOwned_test 2025-12-04T09:05:41.9667325Z inflating: build/bin/NamedTensor_test 2025-12-04T09:05:41.9730169Z inflating: build/bin/apply_utils_test 2025-12-04T09:05:41.9794259Z inflating: 
build/bin/atest 2025-12-04T09:05:41.9863982Z inflating: build/bin/basic 2025-12-04T09:05:41.9922926Z inflating: build/bin/broadcast_test 2025-12-04T09:05:41.9977743Z inflating: build/bin/cpu_allocator_test 2025-12-04T09:05:42.0040235Z inflating: build/bin/cpu_generator_test 2025-12-04T09:05:42.0096487Z inflating: build/bin/cpu_profiling_allocator_test 2025-12-04T09:05:42.0193489Z inflating: build/bin/cpu_rng_test 2025-12-04T09:05:42.0249623Z inflating: build/bin/dlconvertor_test 2025-12-04T09:05:42.0311035Z inflating: build/bin/extension_backend_test 2025-12-04T09:05:42.0371144Z inflating: build/bin/half_test 2025-12-04T09:05:42.0472032Z inflating: build/bin/ivalue_test 2025-12-04T09:05:42.0524502Z inflating: build/bin/lazy_tensor_test 2025-12-04T09:05:42.0581261Z inflating: build/bin/math_kernel_test 2025-12-04T09:05:42.0639151Z inflating: build/bin/memory_format_test 2025-12-04T09:05:42.0695826Z inflating: build/bin/memory_overlapping_test 2025-12-04T09:05:42.0754953Z inflating: build/bin/mobile_memory_cleanup 2025-12-04T09:05:42.0813844Z inflating: build/bin/native_test 2025-12-04T09:05:42.0871017Z inflating: build/bin/operator_name_test 2025-12-04T09:05:42.0924639Z inflating: build/bin/operators_test 2025-12-04T09:05:42.0980856Z inflating: build/bin/packedtensoraccessor_test 2025-12-04T09:05:42.1052333Z inflating: build/bin/pow_test 2025-12-04T09:05:42.1113118Z inflating: build/bin/quantized_test 2025-12-04T09:05:42.1167753Z inflating: build/bin/reduce_ops_test 2025-12-04T09:05:42.1221549Z inflating: build/bin/reportMemoryUsage_test 2025-12-04T09:05:42.1282100Z inflating: build/bin/scalar_tensor_test 2025-12-04T09:05:42.1344527Z inflating: build/bin/scalar_test 2025-12-04T09:05:42.1400823Z inflating: build/bin/StorageUtils_test 2025-12-04T09:05:42.1457175Z inflating: build/bin/stride_properties_test 2025-12-04T09:05:42.1538922Z inflating: build/bin/tensor_iterator_test 2025-12-04T09:05:42.1598108Z inflating: build/bin/test_parallel 2025-12-04T09:05:42.1652793Z inflating: build/bin/thread_init_test 2025-12-04T09:05:42.1711749Z inflating: build/bin/type_ptr_test 2025-12-04T09:05:42.1775338Z inflating: build/bin/type_test 2025-12-04T09:05:42.1830226Z inflating: build/bin/undefined_tensor_test 2025-12-04T09:05:42.1885635Z inflating: build/bin/verify_api_visibility 2025-12-04T09:05:42.1962014Z inflating: build/bin/legacy_vmap_test 2025-12-04T09:05:42.2017095Z inflating: build/bin/weakref_test 2025-12-04T09:05:42.2072641Z inflating: build/bin/wrapdim_test 2025-12-04T09:05:42.2126852Z inflating: build/bin/xla_tensor_test 2025-12-04T09:05:42.2192447Z inflating: build/bin/IListRef_test 2025-12-04T09:05:42.2297833Z inflating: build/bin/List_test 2025-12-04T09:05:42.2369598Z inflating: build/bin/KernelFunction_test 2025-12-04T09:05:42.2490837Z inflating: build/bin/kernel_function_legacy_test 2025-12-04T09:05:42.2591051Z inflating: build/bin/kernel_function_test 2025-12-04T09:05:42.2718692Z inflating: build/bin/kernel_lambda_legacy_test 2025-12-04T09:05:42.2821491Z inflating: build/bin/kernel_lambda_test 2025-12-04T09:05:42.2887403Z inflating: build/bin/kernel_stackbased_test 2025-12-04T09:05:42.2985397Z inflating: build/bin/make_boxed_from_unboxed_functor_test 2025-12-04T09:05:42.3041642Z inflating: build/bin/CppSignature_test 2025-12-04T09:05:42.3098307Z inflating: build/bin/backend_fallback_test 2025-12-04T09:05:42.3154056Z inflating: build/bin/op_allowlist_test 2025-12-04T09:05:42.3459406Z inflating: build/bin/op_registration_test 2025-12-04T09:05:42.3529451Z inflating: 
build/bin/inline_container_test 2025-12-04T09:05:42.3586671Z inflating: build/bin/cuda_allocator_test 2025-12-04T09:05:42.3644022Z inflating: build/bin/cuda_apply_test 2025-12-04T09:05:42.3706624Z inflating: build/bin/cuda_atomic_ops_test 2025-12-04T09:05:42.3768339Z inflating: build/bin/cuda_caching_host_allocator_test 2025-12-04T09:05:42.3842897Z inflating: build/bin/cuda_complex_math_test 2025-12-04T09:05:42.3905002Z inflating: build/bin/cuda_complex_test 2025-12-04T09:05:42.3974395Z inflating: build/bin/cuda_cub_test 2025-12-04T09:05:42.4030310Z inflating: build/bin/cuda_cublas_handle_pool_test 2025-12-04T09:05:42.4084155Z inflating: build/bin/cuda_device_test 2025-12-04T09:05:42.4164491Z inflating: build/bin/cuda_distributions_test 2025-12-04T09:05:42.4217990Z inflating: build/bin/cuda_dlconvertor_test 2025-12-04T09:05:42.4276387Z inflating: build/bin/cuda_event_test 2025-12-04T09:05:42.4328506Z inflating: build/bin/cuda_exchange_device_test 2025-12-04T09:05:42.4391316Z inflating: build/bin/cuda_generator_test 2025-12-04T09:05:42.4445096Z inflating: build/bin/cuda_half_test 2025-12-04T09:05:42.4497829Z inflating: build/bin/cuda_allocatorTraceTracker_test 2025-12-04T09:05:42.4563779Z inflating: build/bin/cuda_stream_test 2025-12-04T09:05:42.4618375Z inflating: build/bin/cuda_reportMemoryUsage_test 2025-12-04T09:05:42.4673581Z inflating: build/bin/cuda_cudnn_test 2025-12-04T09:05:42.4728377Z inflating: build/bin/cuda_integer_divider_test 2025-12-04T09:05:42.4781741Z inflating: build/bin/cuda_optional_test 2025-12-04T09:05:42.4838923Z inflating: build/bin/cuda_packedtensoraccessor_test 2025-12-04T09:05:42.4895782Z inflating: build/bin/cuda_vectorized_test 2025-12-04T09:05:42.5975867Z inflating: build/bin/test_jit 2025-12-04T09:05:42.6322002Z inflating: build/bin/test_lazy 2025-12-04T09:05:42.6378593Z inflating: build/bin/BackoffTest 2025-12-04T09:05:42.6436913Z inflating: build/bin/FileStoreTest 2025-12-04T09:05:42.6496418Z inflating: build/bin/TCPStoreTest 2025-12-04T09:05:42.6555648Z inflating: build/bin/HashStoreTest 2025-12-04T09:05:42.6570177Z inflating: build/bin/ProcessGroupMPITest 2025-12-04T09:05:42.6571577Z inflating: build/bin/example_allreduce 2025-12-04T09:05:42.6630731Z inflating: build/bin/test_dist_autograd 2025-12-04T09:05:42.6701707Z inflating: build/bin/test_cpp_rpc 2025-12-04T09:05:42.6773888Z inflating: build/bin/ProcessGroupGlooTest 2025-12-04T09:05:42.6834954Z inflating: build/bin/ProcessGroupGlooAsyncTest 2025-12-04T09:05:42.6902114Z inflating: build/bin/ProcessGroupNCCLTest 2025-12-04T09:05:42.6969280Z inflating: build/bin/ProcessGroupNCCLErrorsTest 2025-12-04T09:05:42.8117384Z inflating: build/bin/test_api 2025-12-04T09:05:42.8118378Z inflating: build/bin/parallel_benchmark 2025-12-04T09:05:42.8122624Z inflating: build/bin/torch_shm_manager 2025-12-04T09:05:42.8123026Z creating: .additional_ci_files/ 2025-12-04T09:05:42.8186362Z inflating: .additional_ci_files/test-times.json 2025-12-04T09:05:42.8413913Z inflating: .additional_ci_files/test-class-times.json 2025-12-04T09:05:42.8440445Z ##[group]Run rm artifacts.zip 2025-12-04T09:05:42.8440768Z rm artifacts.zip 2025-12-04T09:05:42.8449641Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:05:42.8450023Z env: 2025-12-04T09:05:42.8450399Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:42.8450684Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:42.8451001Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:42.8451379Z ##[endgroup] 2025-12-04T09:05:42.9158566Z ##[group]Run df -H 
2025-12-04T09:05:42.9158833Z df -H 2025-12-04T09:05:42.9164605Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:05:42.9165163Z env: 2025-12-04T09:05:42.9165399Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:42.9165681Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:42.9166025Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:42.9166417Z ##[endgroup] 2025-12-04T09:05:42.9210213Z Filesystem Size Used Avail Use% Mounted on 2025-12-04T09:05:42.9210774Z devtmpfs 4.2M 0 4.2M 0% /dev 2025-12-04T09:05:42.9211164Z tmpfs 101G 0 101G 0% /dev/shm 2025-12-04T09:05:42.9211567Z tmpfs 41G 693k 41G 1% /run 2025-12-04T09:05:42.9211997Z /dev/nvme0n1p1 161G 54G 108G 34% / 2025-12-04T09:05:42.9212348Z tmpfs 101G 17k 101G 1% /tmp 2025-12-04T09:05:42.9212848Z /dev/nvme0n1p128 11M 1.4M 9.2M 13% /boot/efi 2025-12-04T09:05:42.9213256Z tmpfs 21G 0 21G 0% /run/user/0 2025-12-04T09:05:42.9249802Z Prepare all required actions 2025-12-04T09:05:42.9250618Z Getting action download info 2025-12-04T09:05:43.0716848Z ##[group]Run ./.github/actions/download-td-artifacts 2025-12-04T09:05:43.0717219Z with: 2025-12-04T09:05:43.0717434Z env: 2025-12-04T09:05:43.0717807Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:43.0718101Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:43.0718474Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:43.0718854Z ##[endgroup] 2025-12-04T09:05:43.0749968Z ##[group]Run seemethere/download-artifact-s3@v4 2025-12-04T09:05:43.0750350Z with: 2025-12-04T09:05:43.0750585Z name: td_results 2025-12-04T09:05:43.0750856Z s3-bucket: gha-artifacts 2025-12-04T09:05:43.0751136Z region: us-east-1 2025-12-04T09:05:43.0751384Z env: 2025-12-04T09:05:43.0751620Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:43.0751901Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:43.0752249Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:43.0752706Z ##[endgroup] 2025-12-04T09:05:43.5513013Z (node:62886) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023. 2025-12-04T09:05:43.5513630Z 2025-12-04T09:05:43.5513864Z Please migrate your code to use AWS SDK for JavaScript (v3). 
2025-12-04T09:05:43.5514477Z For more information, check the migration guide at https://a.co/7PzMCcy 2025-12-04T09:05:43.5515113Z (Use `node --trace-warnings ...` to show where the warning was created) 2025-12-04T09:05:43.6614904Z Found 1 objects with prefix pytorch/pytorch/19922768520/td_results/ 2025-12-04T09:05:43.6615792Z Starting download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/td_results.json 2025-12-04T09:05:43.7194862Z Finished download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/td_results.json 2025-12-04T09:05:43.7198743Z Artifact download has finished successfully 2025-12-04T09:05:43.7366934Z ##[group]Run mkdir -p .additional_ci_files 2025-12-04T09:05:43.7367380Z mkdir -p .additional_ci_files 2025-12-04T09:05:43.7367877Z mv td_results.json .additional_ci_files/td_results.json || true 2025-12-04T09:05:43.7374455Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:05:43.7374883Z env: 2025-12-04T09:05:43.7375128Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:43.7375415Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:43.7375766Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:43.7376285Z ##[endgroup] 2025-12-04T09:05:43.7474464Z ##[group]Run .github/scripts/parse_ref.py 2025-12-04T09:05:43.7475065Z .github/scripts/parse_ref.py 2025-12-04T09:05:43.7480782Z shell: /usr/bin/bash -e {0} 2025-12-04T09:05:43.7481076Z env: 2025-12-04T09:05:43.7481322Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:43.7481625Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:43.7481967Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:43.7482377Z ##[endgroup] 2025-12-04T09:05:43.7697667Z Setting output branch=main 2025-12-04T09:05:43.7833904Z Prepare all required actions 2025-12-04T09:05:43.7834310Z Getting action download info 2025-12-04T09:05:43.9254504Z ##[group]Run ./.github/actions/filter-test-configs 2025-12-04T09:05:43.9255000Z with: 2025-12-04T09:05:43.9255402Z github-token: *** 2025-12-04T09:05:43.9267370Z test-matrix: {"include": [{"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": 
"distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}]} 2025-12-04T09:05:43.9278676Z job-name: linux-jammy-cuda12.8-py3.10-gcc11 / test (distributed, 3, 3, lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check) 2025-12-04T09:05:43.9279299Z env: 2025-12-04T09:05:43.9279520Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:43.9279799Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:43.9280124Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:43.9280478Z ##[endgroup] 2025-12-04T09:05:43.9317176Z ##[group]Run nick-fields/retry@v3.0.0 2025-12-04T09:05:43.9317469Z with: 2025-12-04T09:05:43.9317667Z shell: bash 2025-12-04T09:05:43.9317880Z timeout_minutes: 10 2025-12-04T09:05:43.9318109Z max_attempts: 5 2025-12-04T09:05:43.9318335Z retry_wait_seconds: 30 2025-12-04T09:05:43.9319291Z command: set -eux # PyYAML 6.0 doesn't work with MacOS x86 anymore # This must run on Python-3.7 (AmazonLinux2) so can't use request=3.32.2 python3 -m pip install requests==2.27.1 pyyaml==6.0.2 2025-12-04T09:05:43.9320236Z polling_interval_seconds: 1 2025-12-04T09:05:43.9320504Z warning_on_retry: true 2025-12-04T09:05:43.9320926Z continue_on_error: false 2025-12-04T09:05:43.9321349Z env: 2025-12-04T09:05:43.9321564Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:43.9322034Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:43.9322382Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:43.9322948Z GITHUB_TOKEN: *** 2025-12-04T09:05:43.9323205Z ##[endgroup] 2025-12-04T09:05:44.0324042Z + python3 -m pip install requests==2.27.1 pyyaml==6.0.2 2025-12-04T09:05:44.2817554Z Defaulting to user installation because normal site-packages is not writeable 2025-12-04T09:05:44.4039873Z Collecting requests==2.27.1 2025-12-04T09:05:44.4198358Z Downloading requests-2.27.1-py2.py3-none-any.whl (63 kB) 2025-12-04T09:05:44.6072912Z Collecting pyyaml==6.0.2 2025-12-04T09:05:44.6115847Z Downloading PyYAML-6.0.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (737 kB) 2025-12-04T09:05:44.6358696Z Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/lib/python3.9/site-packages (from requests==2.27.1) (1.25.10) 2025-12-04T09:05:44.6367272Z 
Requirement already satisfied: idna<4,>=2.5 in /usr/lib/python3.9/site-packages (from requests==2.27.1) (2.10) 2025-12-04T09:05:44.6892921Z Collecting certifi>=2017.4.17 2025-12-04T09:05:44.6926352Z Downloading certifi-2025.11.12-py3-none-any.whl (159 kB) 2025-12-04T09:05:45.1159359Z Collecting charset-normalizer~=2.0.0 2025-12-04T09:05:45.1198412Z Downloading charset_normalizer-2.0.12-py3-none-any.whl (39 kB) 2025-12-04T09:05:45.2131104Z Installing collected packages: charset-normalizer, certifi, requests, pyyaml 2025-12-04T09:05:45.3413514Z Successfully installed certifi-2025.11.12 charset-normalizer-2.0.12 pyyaml-6.0.2 requests-2.27.1 2025-12-04T09:05:46.0152279Z Command completed after 1 attempt(s). 2025-12-04T09:05:46.0200831Z ##[group]Run set -x 2025-12-04T09:05:46.0201115Z set -x 2025-12-04T09:05:46.0201472Z  2025-12-04T09:05:46.0202510Z # Use relative path here as this could be checked out anywhere, not necessarily 2025-12-04T09:05:46.0203049Z # in runner workspace 2025-12-04T09:05:46.0203461Z python3 "${GITHUB_ACTION_PATH}/../../scripts/parse_ref.py" 2025-12-04T09:05:46.0209745Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:05:46.0210137Z env: 2025-12-04T09:05:46.0210352Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:46.0210630Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:46.0210960Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:46.0211316Z ##[endgroup] 2025-12-04T09:05:46.0239108Z + python3 /home/ec2-user/actions-runner/_work/pytorch/pytorch/./.github/actions/filter-test-configs/../../scripts/parse_ref.py 2025-12-04T09:05:46.0429160Z Setting output branch=main 2025-12-04T09:05:46.0483583Z ##[group]Run echo "Workflow: ${GITHUB_WORKFLOW}" 2025-12-04T09:05:46.0484069Z echo "Workflow: ${GITHUB_WORKFLOW}" 2025-12-04T09:05:46.0484594Z echo "Job name: ${JOB_NAME}" 2025-12-04T09:05:46.0484925Z  2025-12-04T09:05:46.0485346Z # Use relative path here as this could be checked out anywhere, not necessarily 2025-12-04T09:05:46.0485882Z # in runner workspace 2025-12-04T09:05:46.0486342Z python3 "${GITHUB_ACTION_PATH}/../../scripts/filter_test_configs.py" \ 2025-12-04T09:05:46.0486875Z  --workflow "${GITHUB_WORKFLOW}" \ 2025-12-04T09:05:46.0487247Z  --job-name "${JOB_NAME}" \ 2025-12-04T09:05:46.0498844Z  --test-matrix "{"include": [{"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, 
"num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}]}" \ 2025-12-04T09:05:46.0510393Z  --selected-test-configs "" \ 2025-12-04T09:05:46.0510746Z  --pr-number "${PR_NUMBER}" \ 2025-12-04T09:05:46.0511079Z  --tag "${TAG}" \ 2025-12-04T09:05:46.0511383Z  --event-name "${EVENT_NAME}" \ 2025-12-04T09:05:46.0511723Z  --schedule "${SCHEDULE}" \ 2025-12-04T09:05:46.0512040Z  --branch "${HEAD_BRANCH}" 2025-12-04T09:05:46.0517478Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:05:46.0517881Z env: 2025-12-04T09:05:46.0518094Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:46.0518372Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:46.0518696Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:46.0519283Z GITHUB_TOKEN: *** 2025-12-04T09:05:46.0519850Z JOB_NAME: linux-jammy-cuda12.8-py3.10-gcc11 / test (distributed, 3, 3, lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check) 2025-12-04T09:05:46.0520487Z PR_NUMBER: 2025-12-04T09:05:46.0520712Z TAG: 2025-12-04T09:05:46.0521277Z EVENT_NAME: schedule 2025-12-04T09:05:46.0521549Z SCHEDULE: 29 8 * * * 2025-12-04T09:05:46.0521990Z HEAD_BRANCH: main 2025-12-04T09:05:46.0522243Z ##[endgroup] 2025-12-04T09:05:46.0546114Z Workflow: trunk 2025-12-04T09:05:46.0547041Z Job name: linux-jammy-cuda12.8-py3.10-gcc11 / test (distributed, 3, 3, lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check) 2025-12-04T09:05:46.2471347Z Setting output keep-going=True 2025-12-04T09:05:46.2471788Z Setting output ci-verbose-test-logs=False 2025-12-04T09:05:46.2472416Z Setting output ci-test-showlocals=False 2025-12-04T09:05:46.2472811Z Setting output ci-no-test-timeout=False 2025-12-04T09:05:46.2473181Z Setting output ci-no-td=False 2025-12-04T09:05:46.2473529Z Setting output ci-td-distributed=False 2025-12-04T09:05:46.2473884Z Setting output is-unstable=False 2025-12-04T09:05:46.2474234Z Setting output 
reenabled-issues= 2025-12-04T09:05:46.2500992Z Setting output test-matrix={"include": [{"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", 
"shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}]} 2025-12-04T09:05:46.2527207Z Setting output is-test-matrix-empty=False 2025-12-04T09:05:46.2599100Z ##[group]Run echo "Filtered matrix:" 2025-12-04T09:05:46.2599556Z echo "Filtered matrix:" 
2025-12-04T09:05:46.2625757Z echo "{"include": [{"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 5, "runner": "lf.linux.g6.4xlarge.experimental.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": 
"lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 1, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 2, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "distributed", "shard": 3, "num_shards": 3, "runner": "lf.linux.g4dn.12xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "pr_time_benchmarks", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "libtorch_agnostic_targetting", "shard": 1, "num_shards": 1, "runner": "linux.g4dn.metal.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests"}]}" 2025-12-04T09:05:46.2649991Z  2025-12-04T09:05:46.2650208Z echo 2025-12-04T09:05:46.2650493Z echo "Is the current job unstable? 
False" 2025-12-04T09:05:46.2650830Z  2025-12-04T09:05:46.2651040Z echo 2025-12-04T09:05:46.2651302Z echo "Is keep-going label set? True" 2025-12-04T09:05:46.2651621Z  2025-12-04T09:05:46.2651828Z echo 2025-12-04T09:05:46.2652072Z echo "Reenabled issues? " 2025-12-04T09:05:46.2657980Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:05:46.2658418Z env: 2025-12-04T09:05:46.2658668Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:46.2658972Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:46.2659340Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:46.2659753Z ##[endgroup] 2025-12-04T09:05:46.2683840Z Filtered matrix: 2025-12-04T09:05:46.2716274Z {include: [{config: default, shard: 1, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: default, shard: 1, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 1, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: default, shard: 1, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 2, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: default, shard: 2, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 2, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: default, shard: 2, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 3, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: default, shard: 3, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 3, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: default, shard: 3, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 4, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: default, shard: 4, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 4, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: default, shard: 4, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 5, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: default, shard: 5, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 5, num_shards: 5, runner: 
lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: default, shard: 5, num_shards: 5, runner: lf.linux.g6.4xlarge.experimental.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}, {config: distributed, shard: 1, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: distributed, shard: 1, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: distributed, shard: 1, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: distributed, shard: 1, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}, {config: distributed, shard: 2, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: distributed, shard: 2, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: distributed, shard: 2, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: distributed, shard: 2, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}, {config: distributed, shard: 3, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: distributed, shard: 3, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: distributed, shard: 3, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: distributed, shard: 3, num_shards: 3, runner: lf.linux.g4dn.12xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}, {config: pr_time_benchmarks, shard: 1, num_shards: 1, runner: linux.g4dn.metal.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: pr_time_benchmarks, shard: 1, num_shards: 1, runner: linux.g4dn.metal.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: pr_time_benchmarks, shard: 1, num_shards: 1, runner: linux.g4dn.metal.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: pr_time_benchmarks, shard: 1, num_shards: 1, runner: linux.g4dn.metal.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}, {config: libtorch_agnostic_targetting, shard: 1, num_shards: 1, runner: linux.g4dn.metal.nvidia.gpu, mem_leak_check: mem_leak_check}, {config: libtorch_agnostic_targetting, shard: 1, num_shards: 1, runner: linux.g4dn.metal.nvidia.gpu, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: libtorch_agnostic_targetting, shard: 1, num_shards: 1, runner: linux.g4dn.metal.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: libtorch_agnostic_targetting, shard: 1, num_shards: 1, runner: linux.g4dn.metal.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests}]} 2025-12-04T09:05:46.2743193Z 2025-12-04T09:05:46.2743342Z Is the current job unstable? False 2025-12-04T09:05:46.2743577Z 2025-12-04T09:05:46.2743716Z Is keep-going label set? True 2025-12-04T09:05:46.2743938Z 2025-12-04T09:05:46.2744046Z Reenabled issues? 
2025-12-04T09:05:46.2774990Z ##[group]Run echo "timeout=$((JOB_TIMEOUT-30))" >> "${GITHUB_OUTPUT}" 2025-12-04T09:05:46.2775712Z echo "timeout=$((JOB_TIMEOUT-30))" >> "${GITHUB_OUTPUT}" 2025-12-04T09:05:46.2781892Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:05:46.2782324Z env: 2025-12-04T09:05:46.2782574Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:46.2782886Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:46.2783251Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:46.2783654Z JOB_TIMEOUT: 600 2025-12-04T09:05:46.2783917Z ##[endgroup] 2025-12-04T09:05:46.2832430Z ##[group]Run env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-12-04T09:05:46.2833130Z env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-12-04T09:05:46.2833639Z env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-12-04T09:05:46.2839338Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:05:46.2839749Z env: 2025-12-04T09:05:46.2839992Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:46.2840295Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:46.2840637Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:46.2841038Z ##[endgroup] 2025-12-04T09:05:46.2937940Z ##[group]Run set -x 2025-12-04T09:05:46.2938319Z set -x 2025-12-04T09:05:46.2938581Z  2025-12-04T09:05:46.2938872Z if [[ $TEST_CONFIG == 'multigpu' ]]; then 2025-12-04T09:05:46.2939320Z  TEST_COMMAND=.ci/pytorch/multigpu-test.sh 2025-12-04T09:05:46.2939789Z elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then 2025-12-04T09:05:46.2940211Z  TEST_COMMAND=.ci/onnx/test.sh 2025-12-04T09:05:46.2940551Z else 2025-12-04T09:05:46.2940843Z  TEST_COMMAND=.ci/pytorch/test.sh 2025-12-04T09:05:46.2941202Z fi 2025-12-04T09:05:46.2941439Z  2025-12-04T09:05:46.2941743Z # Leaving 1GB for the runner and other things 2025-12-04T09:05:46.2942428Z TOTAL_AVAILABLE_MEMORY_IN_GB=$(awk '/MemTotal/ { printf "%.3f \n", $2/1024/1024 - 1 }' /proc/meminfo) 2025-12-04T09:05:46.2943460Z # https://docs.docker.com/engine/containers/resource_constraints/#--memory-swap-details, the 3GB swap 2025-12-04T09:05:46.2944286Z # comes from https://github.com/pytorch/test-infra/pull/6058 2025-12-04T09:05:46.2944913Z TOTAL_MEMORY_WITH_SWAP=$(("${TOTAL_AVAILABLE_MEMORY_IN_GB%.*}" + 3)) 2025-12-04T09:05:46.2945403Z  2025-12-04T09:05:46.2945698Z if [[ ${BUILD_ENVIRONMENT} == *"s390x"* ]]; then 2025-12-04T09:05:46.2946103Z  SHM_OPTS= 2025-12-04T09:05:46.2946390Z  JENKINS_USER= 2025-12-04T09:05:46.2946801Z  # ensure that docker container cleanly exits in 12 hours 2025-12-04T09:05:46.2947351Z  # if for some reason cleanup action doesn't stop container 2025-12-04T09:05:46.2947942Z  # when job is cancelled 2025-12-04T09:05:46.2948301Z  DOCKER_SHELL_CMD="sleep 12h" 2025-12-04T09:05:46.2948803Z  USED_IMAGE="${DOCKER_IMAGE_S390X}" 2025-12-04T09:05:46.2949131Z else 2025-12-04T09:05:46.2949391Z  SHM_OPTS="--shm-size=${SHM_SIZE}" 2025-12-04T09:05:46.2949733Z  JENKINS_USER="--user jenkins" 2025-12-04T09:05:46.2950060Z  DOCKER_SHELL_CMD= 2025-12-04T09:05:46.2950356Z  USED_IMAGE="${DOCKER_IMAGE}" 2025-12-04T09:05:46.2950662Z fi 2025-12-04T09:05:46.2950862Z  2025-12-04T09:05:46.2951214Z # detached container should get cleaned up by teardown_ec2_linux 2025-12-04T09:05:46.2951777Z # TODO: Stop building test binaries as part of the build phase 2025-12-04T09:05:46.2952404Z # Used for GPU_FLAG, SHM_OPTS, JENKINS_USER and DOCKER_SHELL_CMD since that doesn't play nice 2025-12-04T09:05:46.2952970Z # shellcheck disable=SC2086,SC2090 
2025-12-04T09:05:46.2953324Z container_name=$(docker run \ 2025-12-04T09:05:46.2953647Z  ${GPU_FLAG:-} \ 2025-12-04T09:05:46.2953952Z  ${SCCACHE_SERVER_PORT_DOCKER_FLAG:-} \ 2025-12-04T09:05:46.2954311Z  -e BUILD_ENVIRONMENT \ 2025-12-04T09:05:46.2954618Z  -e PR_NUMBER \ 2025-12-04T09:05:46.2954895Z  -e GITHUB_ACTIONS \ 2025-12-04T09:05:46.2955195Z  -e GITHUB_REPOSITORY \ 2025-12-04T09:05:46.2955507Z  -e GITHUB_WORKFLOW \ 2025-12-04T09:05:46.2955791Z  -e GITHUB_JOB \ 2025-12-04T09:05:46.2956073Z  -e GITHUB_RUN_ID \ 2025-12-04T09:05:46.2956364Z  -e GITHUB_RUN_NUMBER \ 2025-12-04T09:05:46.2956676Z  -e GITHUB_RUN_ATTEMPT \ 2025-12-04T09:05:46.2956967Z  -e JOB_ID \ 2025-12-04T09:05:46.2957231Z  -e JOB_NAME \ 2025-12-04T09:05:46.2957499Z  -e BASE_SHA \ 2025-12-04T09:05:46.2957751Z  -e BRANCH \ 2025-12-04T09:05:46.2958009Z  -e SHA1 \ 2025-12-04T09:05:46.2958272Z  -e AWS_DEFAULT_REGION \ 2025-12-04T09:05:46.2958566Z  -e IN_WHEEL_TEST \ 2025-12-04T09:05:46.2958854Z  -e SHARD_NUMBER \ 2025-12-04T09:05:46.2959138Z  -e TEST_CONFIG \ 2025-12-04T09:05:46.2959413Z  -e NUM_TEST_SHARDS \ 2025-12-04T09:05:46.2959806Z  -e REENABLED_ISSUES \ 2025-12-04T09:05:46.2960131Z  -e CONTINUE_THROUGH_ERROR \ 2025-12-04T09:05:46.2960459Z  -e VERBOSE_TEST_LOGS \ 2025-12-04T09:05:46.2960756Z  -e TEST_SHOWLOCALS \ 2025-12-04T09:05:46.2961059Z  -e NO_TEST_TIMEOUT \ 2025-12-04T09:05:46.2961352Z  -e NO_TD \ 2025-12-04T09:05:46.2961607Z  -e TD_DISTRIBUTED \ 2025-12-04T09:05:46.2961902Z  -e PR_LABELS \ 2025-12-04T09:05:46.2962209Z  -e MAX_JOBS="$(nproc --ignore=2)" \ 2025-12-04T09:05:46.2962547Z  -e SCCACHE_BUCKET \ 2025-12-04T09:05:46.2962842Z  -e SCCACHE_REGION \ 2025-12-04T09:05:46.2963133Z  -e XLA_CUDA \ 2025-12-04T09:05:46.2963421Z  -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ 2025-12-04T09:05:46.2963796Z  -e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK \ 2025-12-04T09:05:46.2964183Z  -e PYTORCH_TEST_RERUN_DISABLED_TESTS \ 2025-12-04T09:05:46.2964572Z  -e SKIP_SCCACHE_INITIALIZATION=1 \ 2025-12-04T09:05:46.2964917Z  -e HUGGING_FACE_HUB_TOKEN \ 2025-12-04T09:05:46.2965259Z  -e VLLM_TEST_HUGGING_FACE_TOKEN \ 2025-12-04T09:05:46.2965619Z  -e SCRIBE_GRAPHQL_ACCESS_TOKEN \ 2025-12-04T09:05:46.2965943Z  -e DASHBOARD_TAG \ 2025-12-04T09:05:46.2966243Z  -e ARTIFACTS_FILE_SUFFIX \ 2025-12-04T09:05:46.2966618Z  --memory="${TOTAL_AVAILABLE_MEMORY_IN_GB%.*}g" \ 2025-12-04T09:05:46.2978205Z  --memory-swap="${TOTAL_MEMORY_WITH_SWAP}g" \ 2025-12-04T09:05:46.2978798Z  --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ 2025-12-04T09:05:46.2979279Z  --security-opt seccomp=unconfined \ 2025-12-04T09:05:46.2979805Z  --cap-add=SYS_PTRACE \ 2025-12-04T09:05:46.2980150Z  --ipc=host \ 2025-12-04T09:05:46.2980451Z  ${SHM_OPTS} \ 2025-12-04T09:05:46.2980732Z  --tty \ 2025-12-04T09:05:46.2981008Z  --detach \ 2025-12-04T09:05:46.2981324Z  --name="${container_name}" \ 2025-12-04T09:05:46.2981677Z  ${JENKINS_USER} \ 2025-12-04T09:05:46.2982081Z  -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ 2025-12-04T09:05:46.2982549Z  -w /var/lib/jenkins/workspace \ 2025-12-04T09:05:46.2982921Z  "${USED_IMAGE}" \ 2025-12-04T09:05:46.2983227Z  ${DOCKER_SHELL_CMD} 2025-12-04T09:05:46.2983535Z ) 2025-12-04T09:05:46.2983925Z echo "DOCKER_CONTAINER_ID=${container_name}" >> "${GITHUB_ENV}" 2025-12-04T09:05:46.2984393Z  2025-12-04T09:05:46.2984701Z if [[ ${BUILD_ENVIRONMENT} == *"s390x"* ]]; then 2025-12-04T09:05:46.2985393Z  docker exec -t "${container_name}" sh -c "python3 -m pip install -r .ci/docker/requirements-ci.txt" 2025-12-04T09:05:46.2985992Z fi 2025-12-04T09:05:46.2986231Z  
2025-12-04T09:05:46.2986812Z docker exec -t "${container_name}" sh -c "python3 -m pip install $(echo dist/*.whl)[opt-einsum] && ${TEST_COMMAND}" 2025-12-04T09:05:46.2992612Z shell: /usr/bin/bash -e {0} 2025-12-04T09:05:46.2992876Z env: 2025-12-04T09:05:46.2993102Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:05:46.2993380Z HAS_NVIDIA_GPU: true 2025-12-04T09:05:46.2993691Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:05:46.2994139Z BUILD_ENVIRONMENT: linux-jammy-cuda12.8-py3.10-gcc11 2025-12-04T09:05:46.2994508Z PR_NUMBER: 2025-12-04T09:05:46.2994743Z GITHUB_REPOSITORY: pytorch/pytorch 2025-12-04T09:05:46.2995063Z GITHUB_WORKFLOW: trunk 2025-12-04T09:05:46.2995331Z GITHUB_JOB: test 2025-12-04T09:05:46.2995565Z GITHUB_RUN_ID: 19922768520 2025-12-04T09:05:46.2995857Z GITHUB_RUN_NUMBER: 158165 2025-12-04T09:05:46.2996139Z GITHUB_RUN_ATTEMPT: 1 2025-12-04T09:05:46.2996394Z JOB_ID: 57116084904 2025-12-04T09:05:46.2996965Z JOB_NAME: linux-jammy-cuda12.8-py3.10-gcc11 / test (distributed, 3, 3, lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check) 2025-12-04T09:05:46.2997696Z BRANCH: main 2025-12-04T09:05:46.2997972Z SHA1: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:05:46.2998361Z BASE_SHA: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:05:46.2998727Z TEST_CONFIG: distributed 2025-12-04T09:05:46.2998998Z SHARD_NUMBER: 3 2025-12-04T09:05:46.2999227Z NUM_TEST_SHARDS: 3 2025-12-04T09:05:46.2999476Z EXTRA_FLAGS: 2025-12-04T09:05:46.2999711Z OP_BENCHMARK_TESTS: 2025-12-04T09:05:46.2999969Z REENABLED_ISSUES: 2025-12-04T09:05:46.3000217Z CONTINUE_THROUGH_ERROR: True 2025-12-04T09:05:46.3000507Z VERBOSE_TEST_LOGS: False 2025-12-04T09:05:46.3000783Z TEST_SHOWLOCALS: False 2025-12-04T09:05:46.3001038Z NO_TEST_TIMEOUT: False 2025-12-04T09:05:46.3001297Z NO_TD: False 2025-12-04T09:05:46.3001536Z TD_DISTRIBUTED: False 2025-12-04T09:05:46.3001846Z SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 2025-12-04T09:05:46.3002218Z SCCACHE_REGION: us-east-1 2025-12-04T09:05:46.3002489Z SHM_SIZE: 2g 2025-12-04T09:05:46.3003295Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:05:46.3004770Z DOCKER_IMAGE_S390X: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:05:46.3005672Z XLA_CUDA: 2025-12-04T09:05:46.3006037Z XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla 2025-12-04T09:05:46.3006499Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK: 1 2025-12-04T09:05:46.3006831Z PYTORCH_TEST_RERUN_DISABLED_TESTS: 0 2025-12-04T09:05:46.3007145Z DASHBOARD_TAG: 2025-12-04T09:05:46.3007661Z VLLM_TEST_HUGGING_FACE_TOKEN: *** 2025-12-04T09:05:46.3008084Z HUGGING_FACE_HUB_TOKEN: *** 2025-12-04T09:05:46.3008501Z SCRIBE_GRAPHQL_ACCESS_TOKEN: *** 2025-12-04T09:05:46.3009043Z ARTIFACTS_FILE_SUFFIX: test-distributed-3-3-lf.linux.g4dn.12xlarge.nvidia.gpu_57116084904 2025-12-04T09:05:46.3009597Z ##[endgroup] 2025-12-04T09:05:46.3032391Z + [[ distributed == \m\u\l\t\i\g\p\u ]] 2025-12-04T09:05:46.3032950Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *onnx* ]] 2025-12-04T09:05:46.3033406Z + TEST_COMMAND=.ci/pytorch/test.sh 2025-12-04T09:05:46.3036972Z ++ awk '/MemTotal/ { printf "%.3f \n", $2/1024/1024 - 1 }' /proc/meminfo 2025-12-04T09:05:46.3056594Z + TOTAL_AVAILABLE_MEMORY_IN_GB='185.682 ' 2025-12-04T09:05:46.3057140Z + TOTAL_MEMORY_WITH_SWAP=188 
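Note: the --memory/--memory-swap values that appear in the docker run trace below come from the arithmetic in the step above. A minimal standalone sketch of that calculation (same commands as the step, shown only so the logged values 185g/188g are easy to verify):

# Leave 1 GiB of the host's MemTotal for the runner itself.
TOTAL_AVAILABLE_MEMORY_IN_GB=$(awk '/MemTotal/ { printf "%.3f \n", $2/1024/1024 - 1 }' /proc/meminfo)
# Allow 3 GiB of swap on top of the memory limit (see the test-infra PR referenced above).
TOTAL_MEMORY_WITH_SWAP=$(("${TOTAL_AVAILABLE_MEMORY_IN_GB%.*}" + 3))
# On this runner: 185.682 GiB -> --memory=185g, --memory-swap=188g
echo "--memory=${TOTAL_AVAILABLE_MEMORY_IN_GB%.*}g --memory-swap=${TOTAL_MEMORY_WITH_SWAP}g"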
2025-12-04T09:05:46.3057603Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *\s\3\9\0\x* ]] 2025-12-04T09:05:46.3058046Z + SHM_OPTS=--shm-size=2g 2025-12-04T09:05:46.3058346Z + JENKINS_USER='--user jenkins' 2025-12-04T09:05:46.3058671Z + DOCKER_SHELL_CMD= 2025-12-04T09:05:46.3059600Z + USED_IMAGE=308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:05:46.3065617Z +++ nproc --ignore=2 2025-12-04T09:05:46.3098112Z ++ docker run --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all -e BUILD_ENVIRONMENT -e PR_NUMBER -e GITHUB_ACTIONS -e GITHUB_REPOSITORY -e GITHUB_WORKFLOW -e GITHUB_JOB -e GITHUB_RUN_ID -e GITHUB_RUN_NUMBER -e GITHUB_RUN_ATTEMPT -e JOB_ID -e JOB_NAME -e BASE_SHA -e BRANCH -e SHA1 -e AWS_DEFAULT_REGION -e IN_WHEEL_TEST -e SHARD_NUMBER -e TEST_CONFIG -e NUM_TEST_SHARDS -e REENABLED_ISSUES -e CONTINUE_THROUGH_ERROR -e VERBOSE_TEST_LOGS -e TEST_SHOWLOCALS -e NO_TEST_TIMEOUT -e NO_TD -e TD_DISTRIBUTED -e PR_LABELS -e MAX_JOBS=46 -e SCCACHE_BUCKET -e SCCACHE_REGION -e XLA_CUDA -e XLA_CLANG_CACHE_S3_BUCKET_NAME -e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK -e PYTORCH_TEST_RERUN_DISABLED_TESTS -e SKIP_SCCACHE_INITIALIZATION=1 -e HUGGING_FACE_HUB_TOKEN -e VLLM_TEST_HUGGING_FACE_TOKEN -e SCRIBE_GRAPHQL_ACCESS_TOKEN -e DASHBOARD_TAG -e ARTIFACTS_FILE_SUFFIX --memory=185g --memory-swap=188g --env-file=/tmp/github_env_19922768520 --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --ipc=host --shm-size=2g --tty --detach --name= --user jenkins -v /home/ec2-user/actions-runner/_work/pytorch/pytorch:/var/lib/jenkins/workspace -w /var/lib/jenkins/workspace 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:05:59.4075306Z + container_name=9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15 2025-12-04T09:05:59.4076212Z + echo DOCKER_CONTAINER_ID=9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15 2025-12-04T09:05:59.4076891Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *\s\3\9\0\x* ]] 2025-12-04T09:05:59.4079447Z ++ echo dist/torch-2.10.0a0+gitffd9b0f-cp310-cp310-linux_x86_64.whl 2025-12-04T09:05:59.4081852Z + docker exec -t 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15 sh -c 'python3 -m pip install dist/torch-2.10.0a0+gitffd9b0f-cp310-cp310-linux_x86_64.whl[opt-einsum] && .ci/pytorch/test.sh' 2025-12-04T09:05:59.9165181Z Processing ./dist/torch-2.10.0a0+gitffd9b0f-cp310-cp310-linux_x86_64.whl (from torch==2.10.0a0+gitffd9b0f) 2025-12-04T09:06:01.0220301Z Requirement already satisfied: filelock in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (3.18.0) 2025-12-04T09:06:01.0223228Z Requirement already satisfied: typing-extensions>=4.10.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (4.12.2) 2025-12-04T09:06:01.0228214Z Requirement already satisfied: sympy>=1.13.3 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (1.13.3) 2025-12-04T09:06:01.0233449Z Requirement already satisfied: networkx>=2.5.1 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (2.8.8) 2025-12-04T09:06:01.0237527Z Requirement already satisfied: jinja2 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from 
torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (3.1.6) 2025-12-04T09:06:01.0242370Z Requirement already satisfied: fsspec>=0.8.5 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (2025.10.0) 2025-12-04T09:06:01.0259003Z Requirement already satisfied: opt-einsum>=3.3 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (3.3.0) 2025-12-04T09:06:01.0666246Z Requirement already satisfied: numpy>=1.7 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from opt-einsum>=3.3->torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (1.22.4) 2025-12-04T09:06:01.0690745Z Requirement already satisfied: mpmath<1.4,>=1.1.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from sympy>=1.13.3->torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (1.3.0) 2025-12-04T09:06:01.0755663Z Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from jinja2->torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (3.0.3) 2025-12-04T09:06:01.5124548Z Installing collected packages: torch 2025-12-04T09:06:14.8924705Z Successfully installed torch-2.10.0a0+gitffd9b0f 2025-12-04T09:06:14.9522499Z + export TERM=vt100 2025-12-04T09:06:14.9522851Z + TERM=vt100 2025-12-04T09:06:14.9523119Z ++ dirname .ci/pytorch/test.sh 2025-12-04T09:06:14.9530482Z + source .ci/pytorch/common.sh 2025-12-04T09:06:14.9533754Z +++ dirname .ci/pytorch/common.sh 2025-12-04T09:06:14.9540958Z ++ source .ci/pytorch/common_utils.sh 2025-12-04T09:06:14.9542302Z +++ declare -f -t trap_add 2025-12-04T09:06:14.9548198Z ++ set -ex -o pipefail 2025-12-04T09:06:14.9548665Z ++ [[ linux-jammy-cuda12.8-py3.10-gcc11 == *rocm* ]] 2025-12-04T09:06:14.9549054Z ++ BUILD_TEST_LIBTORCH=0 2025-12-04T09:06:14.9553339Z ++ dirname .ci/pytorch/test.sh 2025-12-04T09:06:14.9559754Z + source .ci/pytorch/common-build.sh 2025-12-04T09:06:14.9561452Z ++ [[ linux-jammy-cuda12.8-py3.10-gcc11 != *win-* ]] 2025-12-04T09:06:14.9568526Z ++++ dirname .ci/pytorch/common-build.sh 2025-12-04T09:06:14.9577140Z +++ cd .ci/pytorch 2025-12-04T09:06:14.9577482Z +++ pwd -P 2025-12-04T09:06:14.9578189Z ++ script_dir=/var/lib/jenkins/workspace/.ci/pytorch 2025-12-04T09:06:14.9578683Z ++ [[ linux-jammy-cuda12.8-py3.10-gcc11 == *-pch* ]] 2025-12-04T09:06:14.9579082Z ++ which sccache 2025-12-04T09:06:14.9593773Z ++ [[ -z ossci-compiler-cache-circleci-v2 ]] 2025-12-04T09:06:14.9594218Z ++ sccache --stop-server 2025-12-04T09:06:14.9618412Z ++ true 2025-12-04T09:06:14.9618742Z ++ rm -f /var/lib/jenkins/sccache_error.log 2025-12-04T09:06:14.9632691Z ++ trap_add sccache_epilogue EXIT 2025-12-04T09:06:14.9633110Z ++ trap_add_cmd=sccache_epilogue 2025-12-04T09:06:14.9633525Z ++ shift 2025-12-04T09:06:14.9633775Z ++ for trap_add_name in "$@" 2025-12-04T09:06:14.9636504Z ++++ trap -p EXIT 2025-12-04T09:06:14.9638739Z +++ eval 'extract_trap_cmd ' 2025-12-04T09:06:14.9639051Z ++++ extract_trap_cmd 2025-12-04T09:06:14.9639318Z ++++ printf '%s\n' '' 2025-12-04T09:06:14.9639611Z +++ printf '%s\n' sccache_epilogue 2025-12-04T09:06:14.9641036Z ++ trap -- ' 2025-12-04T09:06:14.9641278Z sccache_epilogue' EXIT 2025-12-04T09:06:14.9641554Z ++ [[ -n 1 ]] 2025-12-04T09:06:14.9641989Z ++ echo 'Skipping sccache server initialization, setting environment variables' 2025-12-04T09:06:14.9642683Z Skipping sccache server initialization, setting environment variables 2025-12-04T09:06:14.9643177Z ++ export 
SCCACHE_IDLE_TIMEOUT=0 2025-12-04T09:06:14.9643504Z ++ SCCACHE_IDLE_TIMEOUT=0 2025-12-04T09:06:14.9643901Z ++ export SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 2025-12-04T09:06:14.9644393Z ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 2025-12-04T09:06:14.9651195Z ++ export RUST_LOG=sccache::server=error 2025-12-04T09:06:14.9651807Z ++ RUST_LOG=sccache::server=error 2025-12-04T09:06:14.9652135Z ++ sccache --zero-stats 2025-12-04T09:06:15.0857651Z Statistics zeroed. 2025-12-04T09:06:15.0859769Z ++ which ccache 2025-12-04T09:06:15.0872662Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 != *rocm* ]] 2025-12-04T09:06:15.0873179Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 != *s390x* ]] 2025-12-04T09:06:15.0873601Z + [[ -d /var/lib/jenkins/workspace ]] 2025-12-04T09:06:15.0874234Z ++ stat -c %u /var/lib/jenkins/workspace 2025-12-04T09:06:15.0890188Z + WORKSPACE_ORIGINAL_OWNER_ID=1000 2025-12-04T09:06:15.0890591Z + trap_add cleanup_workspace EXIT 2025-12-04T09:06:15.0890953Z + trap_add_cmd=cleanup_workspace 2025-12-04T09:06:15.0891255Z + shift 2025-12-04T09:06:15.0891502Z + for trap_add_name in "$@" 2025-12-04T09:06:15.0897665Z +++ trap -p EXIT 2025-12-04T09:06:15.0899965Z ++ eval 'extract_trap_cmd trap -- '\'' 2025-12-04T09:06:15.0900327Z sccache_epilogue'\'' EXIT' 2025-12-04T09:06:15.0900661Z +++ extract_trap_cmd trap -- ' 2025-12-04T09:06:15.0901005Z sccache_epilogue' EXIT 2025-12-04T09:06:15.0901277Z +++ printf '%s\n' ' 2025-12-04T09:06:15.0901548Z sccache_epilogue' 2025-12-04T09:06:15.0901837Z ++ printf '%s\n' cleanup_workspace 2025-12-04T09:06:15.0902528Z + trap -- ' 2025-12-04T09:06:15.0902771Z sccache_epilogue 2025-12-04T09:06:15.0903044Z cleanup_workspace' EXIT 2025-12-04T09:06:15.0903390Z + sudo chown -R jenkins /var/lib/jenkins/workspace 2025-12-04T09:06:15.7399571Z + git config --global --add safe.directory /var/lib/jenkins/workspace 2025-12-04T09:06:15.7418120Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *cuda* ]] 2025-12-04T09:06:15.7419382Z ++ python -c 'import os;import numba.cuda; print(os.path.dirname(numba.cuda.__file__))' 2025-12-04T09:06:16.1808037Z + NUMBA_CUDA_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda 2025-12-04T09:06:16.1809308Z + '[' -n /opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda ']' 2025-12-04T09:06:16.1810188Z +++ realpath .ci/pytorch/test.sh 2025-12-04T09:06:16.1818299Z ++ dirname /var/lib/jenkins/workspace/.ci/pytorch/test.sh 2025-12-04T09:06:16.1826378Z + NUMBA_PATCH=/var/lib/jenkins/workspace/.ci/pytorch/numba-cuda-13.patch 2025-12-04T09:06:16.1827574Z + pushd /opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda 2025-12-04T09:06:16.1829124Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda ~/workspace 2025-12-04T09:06:16.1829744Z + patch -p4 2025-12-04T09:06:16.1843200Z patching file cudadrv/driver.py 2025-12-04T09:06:16.1843667Z Hunk #1 succeeded at 357 (offset -8 lines). 
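Note: the trap manipulation traced above registers sccache_epilogue and then cleanup_workspace on EXIT without the second handler overwriting the first. A minimal sketch of that pattern, simplified from what the trace shows (assumption: the real trap_add in .ci/pytorch/common_utils.sh also accepts arbitrary signal names and splices into the existing trap string; the stub handlers here are hypothetical stand-ins):

# Hypothetical stand-ins for the real handlers defined by the CI scripts.
sccache_epilogue()  { echo "sccache_epilogue"; }
cleanup_workspace() { echo "cleanup_workspace"; }

# Keep a list of handlers and run them all from a single EXIT trap,
# so registering cleanup_workspace does not clobber sccache_epilogue.
declare -a _exit_cmds=()
_run_exit_cmds() { local c; for c in "${_exit_cmds[@]}"; do eval "$c"; done; }
trap_add() { _exit_cmds+=("$1"); trap _run_exit_cmds EXIT; }

trap_add sccache_epilogue
trap_add cleanup_workspace   # on exit both run, in registration order, matching the accumulated trap in the trace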
2025-12-04T09:06:16.1852585Z + popd 2025-12-04T09:06:16.1853071Z ~/workspace 2025-12-04T09:06:16.1853488Z + echo 'Environment variables:' 2025-12-04T09:06:16.1854106Z Environment variables: 2025-12-04T09:06:16.1854520Z + env 2025-12-04T09:06:16.1862992Z GITHUB_WORKSPACE=/home/ec2-user/actions-runner/_work/pytorch/pytorch 2025-12-04T09:06:16.1863545Z CONTINUE_THROUGH_ERROR=True 2025-12-04T09:06:16.1863920Z BUILD_ENVIRONMENT=linux-jammy-cuda12.8-py3.10-gcc11 2025-12-04T09:06:16.1864653Z VLLM_TEST_HUGGING_FACE_TOKEN=*** 2025-12-04T09:06:16.1865043Z HOSTNAME=9f53f9c599eb 2025-12-04T09:06:16.1866231Z GITHUB_PATH=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/add_path_2fffbb7e-70cb-4aa2-8ece-efa2b00d2d4e 2025-12-04T09:06:16.1867428Z GITHUB_ACTION=__run_3 2025-12-04T09:06:16.1867758Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 2025-12-04T09:06:16.1868114Z GITHUB_RUN_NUMBER=158165 2025-12-04T09:06:16.1868530Z TEST_CONFIG=distributed 2025-12-04T09:06:16.1868837Z GITHUB_REPOSITORY_OWNER_ID=21003710 2025-12-04T09:06:16.1869209Z TORCH_NVCC_FLAGS=-Xfatbin -compress-all 2025-12-04T09:06:16.1869571Z SCCACHE_IDLE_TIMEOUT=0 2025-12-04T09:06:16.1870012Z SCRIBE_GRAPHQL_ACCESS_TOKEN=*** 2025-12-04T09:06:16.1870347Z GITHUB_TRIGGERING_ACTOR=huydhn 2025-12-04T09:06:16.1870665Z GITHUB_REF_TYPE=branch 2025-12-04T09:06:16.1870983Z BASE_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:06:16.1871361Z XLA_CUDA= 2025-12-04T09:06:16.1871615Z NCCL_LIB_DIR=/usr/local/cuda/lib64/ 2025-12-04T09:06:16.1872300Z HUGGING_FACE_HUB_TOKEN=*** 2025-12-04T09:06:16.1872715Z *** 2025-12-04T09:06:16.1872959Z GITHUB_REPOSITORY_ID=65600975 2025-12-04T09:06:16.1873271Z GITHUB_ACTIONS=true 2025-12-04T09:06:16.1873539Z NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:06:16.1873921Z SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 2025-12-04T09:06:16.1874370Z SHA1=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:06:16.1874787Z GITHUB_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:06:16.1875373Z GITHUB_WORKFLOW_REF=pytorch/pytorch/.github/workflows/trunk.yml@refs/heads/main 2025-12-04T09:06:16.1875904Z UCC_HOME=/usr 2025-12-04T09:06:16.1876157Z VERBOSE_TEST_LOGS=False 2025-12-04T09:06:16.1876437Z GITHUB_REF=refs/heads/main 2025-12-04T09:06:16.1876726Z SHARD_NUMBER=3 2025-12-04T09:06:16.1876991Z GITHUB_REF_PROTECTED=true 2025-12-04T09:06:16.1877271Z HOME=/var/lib/jenkins 2025-12-04T09:06:16.1877580Z GITHUB_API_URL=https://api.github.com 2025-12-04T09:06:16.1877950Z PYTORCH_TEST_RERUN_DISABLED_TESTS=0 2025-12-04T09:06:16.1878330Z UCX_COMMIT=7836b165abdbe468a2f607e7254011c07d788152 2025-12-04T09:06:16.1878713Z USE_SYSTEM_NCCL=1 2025-12-04T09:06:16.1878968Z NUM_TEST_SHARDS=3 2025-12-04T09:06:16.1879204Z UCX_HOME=/usr 2025-12-04T09:06:16.1879856Z GITHUB_STATE=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/save_state_2fffbb7e-70cb-4aa2-8ece-efa2b00d2d4e 2025-12-04T09:06:16.1880957Z JOB_NAME=linux-jammy-cuda12.8-py3.10-gcc11 / test (distributed, 3, 3, lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check) 2025-12-04T09:06:16.1882018Z GITHUB_ENV=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_env_2fffbb7e-70cb-4aa2-8ece-efa2b00d2d4e 2025-12-04T09:06:16.1882936Z GITHUB_EVENT_PATH=/home/ec2-user/actions-runner/_work/_temp/_github_workflow/event.json 2025-12-04T09:06:16.1883508Z GITHUB_EVENT_NAME=schedule 2025-12-04T09:06:16.1883805Z DASHBOARD_TAG= 2025-12-04T09:06:16.1884094Z GITHUB_RUN_ID=19922768520 2025-12-04T09:06:16.1884372Z INSTALLED_OPENBLAS= 2025-12-04T09:06:16.1885085Z 
GITHUB_STEP_SUMMARY=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/step_summary_2fffbb7e-70cb-4aa2-8ece-efa2b00d2d4e 2025-12-04T09:06:16.1885869Z GITHUB_ACTOR=huydhn 2025-12-04T09:06:16.1886117Z PR_NUMBER= 2025-12-04T09:06:16.1886351Z DESIRED_CUDA=12.8.1 2025-12-04T09:06:16.1886612Z GITHUB_RUN_ATTEMPT=1 2025-12-04T09:06:16.1886989Z ANACONDA_PYTHON_VERSION=3.10 2025-12-04T09:06:16.1887377Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql 2025-12-04T09:06:16.1887770Z TERM=vt100 2025-12-04T09:06:16.1887997Z INSTALLED_VISION=yes 2025-12-04T09:06:16.1888266Z BRANCH=main 2025-12-04T09:06:16.1888516Z SCCACHE_REGION=us-east-1 2025-12-04T09:06:16.1888810Z OPENSSL_ROOT_DIR=/opt/openssl 2025-12-04T09:06:16.1889128Z BUILD_AOT_INDUCTOR_TEST= 2025-12-04T09:06:16.1889423Z CUDA_PATH=/usr/local/cuda 2025-12-04T09:06:16.1890003Z GITHUB_ACTION_PATH=/home/ec2-user/actions-runner/_work/pytorch/pytorch/./.github/actions/setup-linux 2025-12-04T09:06:16.1890671Z GITHUB_SERVER_URL=https://github.com 2025-12-04T09:06:16.1891108Z UCC_COMMIT=430e241bf5d38cbc73fc7a6b89155397232e3f96 2025-12-04T09:06:16.1891499Z REENABLED_ISSUES= 2025-12-04T09:06:16.1891735Z DOCS= 2025-12-04T09:06:16.1891952Z SHLVL=1 2025-12-04T09:06:16.1892171Z MAX_JOBS=46 2025-12-04T09:06:16.1892397Z GITHUB_ACTOR_ID=475357 2025-12-04T09:06:16.1892781Z GITHUB_WORKFLOW_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:06:16.1893217Z GITHUB_REF_NAME=main 2025-12-04T09:06:16.1893628Z XLA_CLANG_CACHE_S3_BUCKET_NAME=ossci-compiler-clang-cache-circleci-xla 2025-12-04T09:06:16.1894110Z GITHUB_JOB=test 2025-12-04T09:06:16.1894364Z NO_TEST_TIMEOUT=False 2025-12-04T09:06:16.1894629Z TD_DISTRIBUTED=False 2025-12-04T09:06:16.1894922Z GITHUB_REPOSITORY=pytorch/pytorch 2025-12-04T09:06:16.1895260Z GITHUB_RETENTION_DAYS=90 2025-12-04T09:06:16.1895547Z OPENSSL_DIR=/opt/openssl 2025-12-04T09:06:16.1895848Z GITHUB_ACTION_REPOSITORY= 2025-12-04T09:06:16.1897025Z PATH=/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-12-04T09:06:16.1898078Z GITHUB_BASE_REF= 2025-12-04T09:06:16.1898326Z INSTALLED_ACL= 2025-12-04T09:06:16.1898851Z ARTIFACTS_FILE_SUFFIX=test-distributed-3-3-lf.linux.g4dn.12xlarge.nvidia.gpu_57116084904 2025-12-04T09:06:16.1899453Z CI=true 2025-12-04T09:06:16.1899699Z GITHUB_REPOSITORY_OWNER=pytorch 2025-12-04T09:06:16.1900071Z RUST_LOG=sccache::server=error 2025-12-04T09:06:16.1900384Z JOB_ID=57116084904 2025-12-04T09:06:16.1900632Z GITHUB_HEAD_REF= 2025-12-04T09:06:16.1900891Z GITHUB_ACTION_REF= 2025-12-04T09:06:16.1901222Z SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2 2025-12-04T09:06:16.1901630Z TEST_SHOWLOCALS=False 2025-12-04T09:06:16.1901918Z GITHUB_WORKFLOW=trunk 2025-12-04T09:06:16.1902218Z DEBIAN_FRONTEND=noninteractive 2025-12-04T09:06:16.1902966Z GITHUB_OUTPUT=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_output_2fffbb7e-70cb-4aa2-8ece-efa2b00d2d4e 2025-12-04T09:06:16.1903717Z NO_TD=False 2025-12-04T09:06:16.1903982Z SKIP_SCCACHE_INITIALIZATION=1 2025-12-04T09:06:16.1904344Z NCCL_INCLUDE_DIR=/usr/local/cuda/include/ 2025-12-04T09:06:16.1904692Z _=/usr/bin/env 2025-12-04T09:06:16.1905107Z OLDPWD=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda 2025-12-04T09:06:16.1905720Z ++ python -c 'import site; print(site.getsitepackages()[0])' 2025-12-04T09:06:16.2004428Z + TORCH_INSTALL_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch 2025-12-04T09:06:16.2005287Z + 
TORCH_BIN_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/bin 2025-12-04T09:06:16.2005968Z + TORCH_LIB_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib 2025-12-04T09:06:16.2006650Z + TORCH_TEST_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/test 2025-12-04T09:06:16.2007159Z + BUILD_DIR=build 2025-12-04T09:06:16.2007443Z + BUILD_RENAMED_DIR=build_renamed 2025-12-04T09:06:16.2007783Z + BUILD_BIN_DIR=build/bin 2025-12-04T09:06:16.2008063Z + SHARD_NUMBER=3 2025-12-04T09:06:16.2008325Z + NUM_TEST_SHARDS=3 2025-12-04T09:06:16.2008621Z + export TORCH_SERIALIZATION_DEBUG=1 2025-12-04T09:06:16.2008975Z + TORCH_SERIALIZATION_DEBUG=1 2025-12-04T09:06:16.2009271Z + export VALGRIND=ON 2025-12-04T09:06:16.2009535Z + VALGRIND=ON 2025-12-04T09:06:16.2009846Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *clang9* ]] 2025-12-04T09:06:16.2010455Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *xpu* ]] 2025-12-04T09:06:16.2010846Z + detect_cuda_arch 2025-12-04T09:06:16.2011161Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *cuda* ]] 2025-12-04T09:06:16.2011543Z + command -v nvidia-smi 2025-12-04T09:06:16.2011827Z /usr/bin/nvidia-smi 2025-12-04T09:06:16.2012159Z ++ nvidia-smi --query-gpu=compute_cap --format=csv 2025-12-04T09:06:16.2012540Z ++ tail -n 1 2025-12-04T09:06:16.2502568Z + TORCH_CUDA_ARCH_LIST=7.5 2025-12-04T09:06:16.2503160Z + export TORCH_CUDA_ARCH_LIST 2025-12-04T09:06:16.2503788Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *s390x* ]] 2025-12-04T09:06:16.2504366Z + [[ 0 == \1 ]] 2025-12-04T09:06:16.2504623Z + [[ True == \1 ]] 2025-12-04T09:06:16.2504970Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 != *bazel* ]] 2025-12-04T09:06:16.2506785Z ++ realpath build/custom_test_artifacts 2025-12-04T09:06:16.2514603Z + CUSTOM_TEST_ARTIFACT_BUILD_DIR=/var/lib/jenkins/workspace/build/custom_test_artifacts 2025-12-04T09:06:16.2515594Z + [[ -n '' ]] 2025-12-04T09:06:16.2516003Z + echo 'Environment variables' 2025-12-04T09:06:16.2516311Z Environment variables 2025-12-04T09:06:16.2516577Z + env 2025-12-04T09:06:16.2523281Z GITHUB_WORKSPACE=/home/ec2-user/actions-runner/_work/pytorch/pytorch 2025-12-04T09:06:16.2523786Z CONTINUE_THROUGH_ERROR=True 2025-12-04T09:06:16.2524175Z BUILD_ENVIRONMENT=linux-jammy-cuda12.8-py3.10-gcc11 2025-12-04T09:06:16.2524873Z VLLM_TEST_HUGGING_FACE_TOKEN=*** 2025-12-04T09:06:16.2525270Z HOSTNAME=9f53f9c599eb 2025-12-04T09:06:16.2526460Z GITHUB_PATH=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/add_path_2fffbb7e-70cb-4aa2-8ece-efa2b00d2d4e 2025-12-04T09:06:16.2527839Z GITHUB_ACTION=__run_3 2025-12-04T09:06:16.2528670Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 2025-12-04T09:06:16.2529306Z GITHUB_RUN_NUMBER=158165 2025-12-04T09:06:16.2529852Z TEST_CONFIG=distributed 2025-12-04T09:06:16.2530405Z GITHUB_REPOSITORY_OWNER_ID=21003710 2025-12-04T09:06:16.2531064Z TORCH_NVCC_FLAGS=-Xfatbin -compress-all 2025-12-04T09:06:16.2531745Z SCCACHE_IDLE_TIMEOUT=0 2025-12-04T09:06:16.2532627Z SCRIBE_GRAPHQL_ACCESS_TOKEN=*** 2025-12-04T09:06:16.2533386Z GITHUB_TRIGGERING_ACTOR=huydhn 2025-12-04T09:06:16.2533974Z GITHUB_REF_TYPE=branch 2025-12-04T09:06:16.2534449Z TORCH_CUDA_ARCH_LIST=7.5 2025-12-04T09:06:16.2535092Z BASE_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:06:16.2535779Z XLA_CUDA= 2025-12-04T09:06:16.2536281Z NCCL_LIB_DIR=/usr/local/cuda/lib64/ 2025-12-04T09:06:16.2537579Z HUGGING_FACE_HUB_TOKEN=*** 2025-12-04T09:06:16.2538213Z *** 2025-12-04T09:06:16.2538466Z GITHUB_REPOSITORY_ID=65600975 2025-12-04T09:06:16.2538779Z GITHUB_ACTIONS=true 
2025-12-04T09:06:16.2539071Z NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T09:06:16.2539480Z SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 2025-12-04T09:06:16.2539930Z SHA1=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:06:16.2540372Z GITHUB_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:06:16.2540977Z GITHUB_WORKFLOW_REF=pytorch/pytorch/.github/workflows/trunk.yml@refs/heads/main 2025-12-04T09:06:16.2541522Z UCC_HOME=/usr 2025-12-04T09:06:16.2541790Z TORCH_SERIALIZATION_DEBUG=1 2025-12-04T09:06:16.2542106Z VERBOSE_TEST_LOGS=False 2025-12-04T09:06:16.2542391Z GITHUB_REF=refs/heads/main 2025-12-04T09:06:16.2542688Z SHARD_NUMBER=3 2025-12-04T09:06:16.2542954Z GITHUB_REF_PROTECTED=true 2025-12-04T09:06:16.2543250Z HOME=/var/lib/jenkins 2025-12-04T09:06:16.2543572Z GITHUB_API_URL=https://api.github.com 2025-12-04T09:06:16.2543955Z PYTORCH_TEST_RERUN_DISABLED_TESTS=0 2025-12-04T09:06:16.2544353Z UCX_COMMIT=7836b165abdbe468a2f607e7254011c07d788152 2025-12-04T09:06:16.2544734Z USE_SYSTEM_NCCL=1 2025-12-04T09:06:16.2544998Z NUM_TEST_SHARDS=3 2025-12-04T09:06:16.2545261Z UCX_HOME=/usr 2025-12-04T09:06:16.2545925Z GITHUB_STATE=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/save_state_2fffbb7e-70cb-4aa2-8ece-efa2b00d2d4e 2025-12-04T09:06:16.2547060Z JOB_NAME=linux-jammy-cuda12.8-py3.10-gcc11 / test (distributed, 3, 3, lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check) 2025-12-04T09:06:16.2548446Z GITHUB_ENV=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_env_2fffbb7e-70cb-4aa2-8ece-efa2b00d2d4e 2025-12-04T09:06:16.2549369Z GITHUB_EVENT_PATH=/home/ec2-user/actions-runner/_work/_temp/_github_workflow/event.json 2025-12-04T09:06:16.2549948Z GITHUB_EVENT_NAME=schedule 2025-12-04T09:06:16.2550246Z DASHBOARD_TAG= 2025-12-04T09:06:16.2550506Z GITHUB_RUN_ID=19922768520 2025-12-04T09:06:16.2550791Z INSTALLED_OPENBLAS= 2025-12-04T09:06:16.2551505Z GITHUB_STEP_SUMMARY=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/step_summary_2fffbb7e-70cb-4aa2-8ece-efa2b00d2d4e 2025-12-04T09:06:16.2552292Z GITHUB_ACTOR=huydhn 2025-12-04T09:06:16.2552543Z PR_NUMBER= 2025-12-04T09:06:16.2552783Z DESIRED_CUDA=12.8.1 2025-12-04T09:06:16.2553049Z GITHUB_RUN_ATTEMPT=1 2025-12-04T09:06:16.2553304Z VALGRIND=ON 2025-12-04T09:06:16.2553562Z ANACONDA_PYTHON_VERSION=3.10 2025-12-04T09:06:16.2553944Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql 2025-12-04T09:06:16.2554331Z TERM=vt100 2025-12-04T09:06:16.2554574Z INSTALLED_VISION=yes 2025-12-04T09:06:16.2554845Z BRANCH=main 2025-12-04T09:06:16.2555081Z SCCACHE_REGION=us-east-1 2025-12-04T09:06:16.2555391Z OPENSSL_ROOT_DIR=/opt/openssl 2025-12-04T09:06:16.2555712Z BUILD_AOT_INDUCTOR_TEST= 2025-12-04T09:06:16.2556010Z CUDA_PATH=/usr/local/cuda 2025-12-04T09:06:16.2556586Z GITHUB_ACTION_PATH=/home/ec2-user/actions-runner/_work/pytorch/pytorch/./.github/actions/setup-linux 2025-12-04T09:06:16.2557250Z GITHUB_SERVER_URL=https://github.com 2025-12-04T09:06:16.2557652Z UCC_COMMIT=430e241bf5d38cbc73fc7a6b89155397232e3f96 2025-12-04T09:06:16.2558032Z REENABLED_ISSUES= 2025-12-04T09:06:16.2558283Z DOCS= 2025-12-04T09:06:16.2558605Z SHLVL=1 2025-12-04T09:06:16.2558815Z MAX_JOBS=46 2025-12-04T09:06:16.2559057Z GITHUB_ACTOR_ID=475357 2025-12-04T09:06:16.2559440Z GITHUB_WORKFLOW_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:06:16.2559863Z GITHUB_REF_NAME=main 2025-12-04T09:06:16.2560295Z XLA_CLANG_CACHE_S3_BUCKET_NAME=ossci-compiler-clang-cache-circleci-xla 2025-12-04T09:06:16.2560780Z GITHUB_JOB=test 
2025-12-04T09:06:16.2561024Z NO_TEST_TIMEOUT=False 2025-12-04T09:06:16.2561297Z TD_DISTRIBUTED=False 2025-12-04T09:06:16.2561591Z GITHUB_REPOSITORY=pytorch/pytorch 2025-12-04T09:06:16.2561917Z GITHUB_RETENTION_DAYS=90 2025-12-04T09:06:16.2562216Z OPENSSL_DIR=/opt/openssl 2025-12-04T09:06:16.2562521Z GITHUB_ACTION_REPOSITORY= 2025-12-04T09:06:16.2563412Z PATH=/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-12-04T09:06:16.2564328Z GITHUB_BASE_REF= 2025-12-04T09:06:16.2564582Z INSTALLED_ACL= 2025-12-04T09:06:16.2565116Z ARTIFACTS_FILE_SUFFIX=test-distributed-3-3-lf.linux.g4dn.12xlarge.nvidia.gpu_57116084904 2025-12-04T09:06:16.2565708Z CI=true 2025-12-04T09:06:16.2565942Z GITHUB_REPOSITORY_OWNER=pytorch 2025-12-04T09:06:16.2566296Z RUST_LOG=sccache::server=error 2025-12-04T09:06:16.2566597Z JOB_ID=57116084904 2025-12-04T09:06:16.2566843Z GITHUB_HEAD_REF= 2025-12-04T09:06:16.2567097Z GITHUB_ACTION_REF= 2025-12-04T09:06:16.2567421Z SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2 2025-12-04T09:06:16.2567806Z TEST_SHOWLOCALS=False 2025-12-04T09:06:16.2568086Z GITHUB_WORKFLOW=trunk 2025-12-04T09:06:16.2568377Z DEBIAN_FRONTEND=noninteractive 2025-12-04T09:06:16.2569085Z GITHUB_OUTPUT=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_output_2fffbb7e-70cb-4aa2-8ece-efa2b00d2d4e 2025-12-04T09:06:16.2569824Z NO_TD=False 2025-12-04T09:06:16.2570080Z SKIP_SCCACHE_INITIALIZATION=1 2025-12-04T09:06:16.2570410Z NCCL_INCLUDE_DIR=/usr/local/cuda/include/ 2025-12-04T09:06:16.2570917Z OLDPWD=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda 2025-12-04T09:06:16.2571398Z _=/usr/bin/env 2025-12-04T09:06:16.2571652Z + echo 'Testing pytorch' 2025-12-04T09:06:16.2571926Z Testing pytorch 2025-12-04T09:06:16.2572192Z + export LANG=C.UTF-8 2025-12-04T09:06:16.2572458Z + LANG=C.UTF-8 2025-12-04T09:06:16.2572685Z + PR_NUMBER= 2025-12-04T09:06:16.2573022Z + [[ distributed == \d\e\f\a\u\l\t ]] 2025-12-04T09:06:16.2573392Z + [[ distributed == \d\i\s\t\r\i\b\u\t\e\d ]] 2025-12-04T09:06:16.2573789Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *rocm* ]] 2025-12-04T09:06:16.2574190Z + [[ distributed == \s\l\o\w ]] 2025-12-04T09:06:16.2574595Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *slow-gradcheck* ]] 2025-12-04T09:06:16.2575067Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *cuda* ]] 2025-12-04T09:06:16.2575494Z + export PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda 2025-12-04T09:06:16.2575888Z + PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda 2025-12-04T09:06:16.2576362Z + [[ distributed == *crossref* ]] 2025-12-04T09:06:16.2576906Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *rocm* ]] 2025-12-04T09:06:16.2577364Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *xpu* ]] 2025-12-04T09:06:16.2577826Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 != *-bazel-* ]] 2025-12-04T09:06:16.2578228Z + pip_install ninja==1.10.2 2025-12-04T09:06:16.2578654Z + pip_install_pkg='python3 -m pip install --progress-bar off' 2025-12-04T09:06:16.2579192Z + python3 -m pip install --progress-bar off ninja==1.10.2 2025-12-04T09:06:16.6466237Z Collecting ninja==1.10.2 2025-12-04T09:06:16.6732712Z Downloading ninja-1.10.2-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl.metadata (5.0 kB) 2025-12-04T09:06:16.6843716Z Downloading ninja-1.10.2-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl (108 kB) 2025-12-04T09:06:17.0797498Z Installing collected packages: ninja 2025-12-04T09:06:17.0797921Z Attempting uninstall: ninja 
2025-12-04T09:06:17.0802096Z Found existing installation: ninja 1.11.1.4 2025-12-04T09:06:17.0825290Z Uninstalling ninja-1.11.1.4: 2025-12-04T09:06:17.0894313Z Successfully uninstalled ninja-1.11.1.4 2025-12-04T09:06:17.1220355Z Successfully installed ninja-1.10.2 2025-12-04T09:06:17.1775347Z + export PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-12-04T09:06:17.1777586Z + PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-12-04T09:06:17.1778752Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *aarch64* ]] 2025-12-04T09:06:17.1779233Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *asan* ]] 2025-12-04T09:06:17.1779706Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *-debug* ]] 2025-12-04T09:06:17.1780190Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 != *-bazel-* ]] 2025-12-04T09:06:17.1780854Z + echo 'We are not in debug mode: linux-jammy-cuda12.8-py3.10-gcc11. Expect the assertion to pass' 2025-12-04T09:06:17.1781688Z We are not in debug mode: linux-jammy-cuda12.8-py3.10-gcc11. Expect the assertion to pass 2025-12-04T09:06:17.1782242Z + cd test 2025-12-04T09:06:17.1782642Z + python -c 'import torch; torch._C._crash_if_debug_asserts_fail(424242)' 2025-12-04T09:06:18.8854360Z + [[ distributed == \n\o\g\p\u\_\N\O\_\A\V\X\2 ]] 2025-12-04T09:06:18.8854869Z + [[ distributed == \n\o\g\p\u\_\A\V\X\5\1\2 ]] 2025-12-04T09:06:18.8855335Z + [[ distributed == \l\e\g\a\c\y\_\n\v\i\d\i\a\_\d\r\i\v\e\r ]] 2025-12-04T09:06:18.8855774Z + DYNAMO_BENCHMARK_FLAGS=() 2025-12-04T09:06:18.8856990Z + [[ distributed == *pr_time_benchmarks* ]] 2025-12-04T09:06:18.8857398Z + [[ distributed == *dynamo_eager* ]] 2025-12-04T09:06:18.8857754Z + [[ distributed == *aot_eager* ]] 2025-12-04T09:06:18.8858113Z + [[ distributed == *aot_inductor* ]] 2025-12-04T09:06:18.8858501Z + [[ distributed == *max_autotune_inductor* ]] 2025-12-04T09:06:18.8858883Z + [[ distributed == *inductor* ]] 2025-12-04T09:06:18.8859248Z + [[ distributed == *dynamic* ]] 2025-12-04T09:06:18.8859590Z + [[ distributed == *cpu* ]] 2025-12-04T09:06:18.8859913Z + [[ distributed == *xpu* ]] 2025-12-04T09:06:18.8860258Z + DYNAMO_BENCHMARK_FLAGS+=(--device cuda) 2025-12-04T09:06:18.8887856Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *libtorch* ]] 2025-12-04T09:06:18.8888690Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *-bazel-* ]] 2025-12-04T09:06:18.8889983Z + cd test 2025-12-04T09:06:18.8890934Z + python -c 'import torch; print(torch.__config__.show())' 2025-12-04T09:06:21.1148249Z PyTorch built with: 2025-12-04T09:06:21.1148644Z - GCC 11.4 2025-12-04T09:06:21.1149005Z - C++ Version: 201703 2025-12-04T09:06:21.1149653Z - Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications 2025-12-04T09:06:21.1150484Z - Intel(R) MKL-DNN v3.7.1 (Git Hash 8d263e693366ef8db40acc569cc7d8edf644556d) 2025-12-04T09:06:21.1150997Z - OpenMP 201511 (a.k.a. 
OpenMP 4.5) 2025-12-04T09:06:21.1151373Z - LAPACK is enabled (usually provided by MKL) 2025-12-04T09:06:21.1151785Z - NNPACK is enabled 2025-12-04T09:06:21.1152089Z - CPU capability usage: AVX512 2025-12-04T09:06:21.1152406Z - CUDA Runtime 12.8 2025-12-04T09:06:21.1152967Z - NVCC architecture flags: -gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_89,code=sm_89 2025-12-04T09:06:21.1153611Z - CuDNN 91.0.2 (built against CUDA 12.9) 2025-12-04T09:06:21.1159207Z - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, COMMIT_SHA=35b7a9a26c5923d98aebaa41a031dae21788a9ee, CUDA_VERSION=12.8, CUDNN_VERSION=9.10.2, CXX_COMPILER=/opt/cache/bin/c++, CXX_FLAGS= -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -DC10_NODEPRECATED -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -faligned-new -Werror -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, FORCE_FALLBACK_CUDA_MPI=1, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, TORCH_VERSION=2.10.0, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=ON, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=ON, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, USE_XCCL=OFF, USE_XPU=OFF, 2025-12-04T09:06:21.1166026Z 2025-12-04T09:06:21.5847252Z + cd test 2025-12-04T09:06:21.5847689Z + python -c 'import torch; print(torch.__config__.parallel_info())' 2025-12-04T09:06:23.0129805Z ATen/Parallel: 2025-12-04T09:06:23.0130195Z at::get_num_threads() : 24 2025-12-04T09:06:23.0130543Z at::get_num_interop_threads() : 24 2025-12-04T09:06:23.0130908Z OpenMP 201511 (a.k.a. 
OpenMP 4.5) 2025-12-04T09:06:23.0131290Z omp_get_max_threads() : 24 2025-12-04T09:06:23.0131949Z Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications 2025-12-04T09:06:23.0132659Z mkl_get_max_threads() : 24 2025-12-04T09:06:23.0133123Z Intel(R) MKL-DNN v3.7.1 (Git Hash 8d263e693366ef8db40acc569cc7d8edf644556d) 2025-12-04T09:06:23.0133754Z std::thread::hardware_concurrency() : 48 2025-12-04T09:06:23.0134103Z Environment variables: 2025-12-04T09:06:23.0134395Z OMP_NUM_THREADS : [not set] 2025-12-04T09:06:23.0134706Z MKL_NUM_THREADS : [not set] 2025-12-04T09:06:23.0135009Z ATen parallel backend: OpenMP 2025-12-04T09:06:23.0135230Z 2025-12-04T09:06:23.2876977Z + [[ distributed == *numpy_2* ]] 2025-12-04T09:06:23.2877495Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *aarch64* ]] 2025-12-04T09:06:23.2877930Z + [[ distributed == *backward* ]] 2025-12-04T09:06:23.2878325Z + [[ distributed == *libtorch_agnostic_targetting* ]] 2025-12-04T09:06:23.2878723Z + [[ distributed == *xla* ]] 2025-12-04T09:06:23.2879068Z + [[ distributed == *vllm* ]] 2025-12-04T09:06:23.2879392Z + [[ distributed == *executorch* ]] 2025-12-04T09:06:23.2879738Z + [[ distributed == \j\i\t\_\l\e\g\a\c\y ]] 2025-12-04T09:06:23.2880123Z + [[ distributed == \q\u\a\n\t\i\z\a\t\i\o\n ]] 2025-12-04T09:06:23.2880901Z + [[ linux-jammy-cuda12.8-py3.10-gcc11 == *libtorch* ]] 2025-12-04T09:06:23.2881395Z + [[ distributed == distributed ]] 2025-12-04T09:06:23.2881710Z + test_distributed 2025-12-04T09:06:23.2882007Z + echo 'Testing distributed python tests' 2025-12-04T09:06:23.2882387Z Testing distributed python tests 2025-12-04T09:06:23.2882841Z + python test/run_test.py --distributed-tests --shard 3 3 --verbose 2025-12-04T09:06:28.8456680Z Downloading https://ossci-metrics.s3.amazonaws.com/disabled-tests-condensed.json to /var/lib/jenkins/workspace/test/.pytorch-disabled-tests.json 2025-12-04T09:06:28.8978084Z Ignoring disabled issues: [''] 2025-12-04T09:06:28.9084342Z Found test times from artifacts 2025-12-04T09:06:28.9498487Z Found test times from artifacts 2025-12-04T09:06:28.9515237Z Running all tests 2025-12-04T09:06:28.9685408Z Running parallel tests on 1 processes 2025-12-04T09:06:28.9686667Z Name: tests to run (est. 
time: 140.02min) 2025-12-04T09:06:28.9687030Z Serial tests (83): 2025-12-04T09:06:28.9687362Z distributed/test_c10d_functional_native 1/1 2025-12-04T09:06:28.9687777Z distributed/fsdp/test_fsdp_overlap 1/1 2025-12-04T09:06:28.9688201Z distributed/fsdp/test_fsdp_pure_fp16 1/1 2025-12-04T09:06:28.9688606Z distributed/tensor/debug/test_debug_mode 1/1 2025-12-04T09:06:28.9689007Z distributed/fsdp/test_fsdp_exec_order 1/1 2025-12-04T09:06:28.9689437Z distributed/fsdp/test_hsdp_dtensor_state_dict 1/1 2025-12-04T09:06:28.9689885Z distributed/fsdp/test_fsdp_clip_grad_norm 1/1 2025-12-04T09:06:28.9690289Z distributed/fsdp/test_fsdp_core 2/2 2025-12-04T09:06:28.9690647Z distributed/algorithms/test_join 1/1 2025-12-04T09:06:28.9691074Z distributed/pipelining/test_schedule_multiproc 1/1 2025-12-04T09:06:28.9691803Z distributed/test_compute_comm_reordering 1/1 2025-12-04T09:06:28.9692190Z distributed/test_cupy_as_tensor 1/1 2025-12-04T09:06:28.9692557Z distributed/fsdp/test_fsdp_fx 1/1 2025-12-04T09:06:28.9692914Z distributed/_tools/test_sac_ilp 1/1 2025-12-04T09:06:28.9693294Z distributed/checkpoint/test_hf_storage 1/1 2025-12-04T09:06:28.9693701Z distributed/pipelining/test_microbatch 1/1 2025-12-04T09:06:28.9694111Z distributed/tensor/test_placement_types 1/1 2025-12-04T09:06:28.9694573Z distributed/tensor/test_dtensor_dispatch_overhead 1/1 2025-12-04T09:06:28.9695120Z distributed/checkpoint/_experimental/test_checkpoint_reader 1/1 2025-12-04T09:06:28.9695643Z distributed/checkpoint/test_format_utils 1/1 2025-12-04T09:06:28.9696204Z distributed/test_aten_comm_compute_reordering 1/2 2025-12-04T09:06:28.9696805Z distributed/tensor/test_redistribute 2/2 2025-12-04T09:06:28.9697233Z distributed/tensor/parallel/test_tp_style 1/1 2025-12-04T09:06:28.9697660Z distributed/tensor/test_api 1/1 2025-12-04T09:06:28.9698020Z distributed/checkpoint/test_fsspec 1/1 2025-12-04T09:06:28.9698480Z distributed/tensor/experimental/test_tp_transform 1/1 2025-12-04T09:06:28.9698954Z distributed/checkpoint/test_traverse 1/1 2025-12-04T09:06:28.9699358Z distributed/tensor/test_random_ops 1/1 2025-12-04T09:06:28.9699812Z distributed/_composable/fsdp/test_fully_shard_logging 1/1 2025-12-04T09:06:28.9700283Z distributed/launcher/test_api 1/1 2025-12-04T09:06:28.9700705Z distributed/elastic/multiprocessing/test_api 1/1 2025-12-04T09:06:28.9701130Z distributed/fsdp/test_shard_utils 1/1 2025-12-04T09:06:28.9701560Z distributed/checkpoint/test_fsdp_optim_state 1/1 2025-12-04T09:06:28.9702054Z distributed/checkpoint/e2e/test_e2e_save_and_load 1/1 2025-12-04T09:06:28.9702543Z distributed/checkpoint/test_dtensor_resharding 1/1 2025-12-04T09:06:28.9702982Z distributed/fsdp/test_fsdp_memory 1/1 2025-12-04T09:06:28.9703397Z distributed/tensor/test_pointwise_ops 1/1 2025-12-04T09:06:28.9703831Z distributed/checkpoint/test_compatibility 1/1 2025-12-04T09:06:28.9704246Z distributed/_tools/test_mem_tracker 1/1 2025-12-04T09:06:28.9704649Z distributed/elastic/test_control_plane 1/1 2025-12-04T09:06:28.9705174Z distributed/test_fake_pg 1/1 2025-12-04T09:06:28.9705566Z distributed/checkpoint/test_fsdp_model_state 1/1 2025-12-04T09:06:28.9706000Z distributed/test_functional_api 1/1 2025-12-04T09:06:28.9706504Z distributed/_composable/fsdp/test_fully_shard_clip_grad_norm_ 1/1 2025-12-04T09:06:28.9707023Z distributed/tensor/debug/test_comm_mode 1/1 2025-12-04T09:06:28.9707419Z distributed/test_dist2 1/1 2025-12-04T09:06:28.9707866Z distributed/_composable/fsdp/test_fully_shard_grad_scaler 1/1 2025-12-04T09:06:28.9708342Z 
distributed/launcher/test_run 1/1 2025-12-04T09:06:28.9708859Z distributed/fsdp/test_fsdp_backward_prefetch 1/1 2025-12-04T09:06:28.9709298Z distributed/checkpoint/test_checkpoint 1/1 2025-12-04T09:06:28.9709703Z distributed/_pycute/test_coalesce 1/1 2025-12-04T09:06:28.9710075Z distributed/_pycute/test_complement 1/1 2025-12-04T09:06:28.9710468Z distributed/_pycute/test_composition 1/1 2025-12-04T09:06:28.9710856Z distributed/_pycute/test_int_tuple 1/1 2025-12-04T09:06:28.9711234Z distributed/_pycute/test_left_inverse 1/1 2025-12-04T09:06:28.9711635Z distributed/_pycute/test_right_inverse 1/1 2025-12-04T09:06:28.9712040Z distributed/_composable/test_replicate 1/1 2025-12-04T09:06:28.9712453Z distributed/checkpoint/test_hsdp_checkpoint 1/1 2025-12-04T09:06:28.9712928Z distributed/tensor/parallel/test_parallelize_api 1/1 2025-12-04T09:06:28.9713372Z distributed/fsdp/test_fsdp_state_dict 1/2 2025-12-04T09:06:28.9713757Z distributed/_pycute/test_typing 1/1 2025-12-04T09:06:28.9714119Z distributed/test_distributed_spawn 1/9 2025-12-04T09:06:28.9714500Z distributed/test_distributed_spawn 4/9 2025-12-04T09:06:28.9714950Z distributed/test_distributed_spawn 7/9 2025-12-04T09:06:28.9715318Z distributed/test_serialization 1/1 2025-12-04T09:06:28.9715719Z distributed/fsdp/test_fsdp_ignored_modules 1/1 2025-12-04T09:06:28.9716192Z distributed/_composable/fsdp/test_fully_shard_comm 1/1 2025-12-04T09:06:28.9716670Z distributed/fsdp/test_fsdp_sharded_grad_scaler 1/1 2025-12-04T09:06:28.9717162Z distributed/_shard/sharding_plan/test_sharding_plan 1/1 2025-12-04T09:06:28.9717675Z distributed/_shard/sharded_optim/test_sharded_optim 1/1 2025-12-04T09:06:28.9718209Z distributed/_composable/fsdp/test_fully_shard_state_dict 1/1 2025-12-04T09:06:28.9718669Z distributed/tensor/test_utils 1/1 2025-12-04T09:06:28.9719106Z distributed/_composable/fsdp/test_fully_shard_memory 1/1 2025-12-04T09:06:28.9719577Z distributed/checkpoint/test_state_dict 1/1 2025-12-04T09:06:28.9719994Z distributed/checkpoint/test_state_dict_utils 1/1 2025-12-04T09:06:28.9720413Z distributed/rpc/test_faulty_agent 1/1 2025-12-04T09:06:28.9721079Z distributed/_shard/sharded_tensor/ops/test_embedding 1/1 2025-12-04T09:06:28.9721813Z distributed/_shard/sharded_tensor/test_sharded_tensor_reshard 1/1 2025-12-04T09:06:28.9722324Z distributed/test_c10d_spawn_nccl 1/1 2025-12-04T09:06:28.9722703Z distributed/test_c10d_spawn_ucc 1/1 2025-12-04T09:06:28.9723080Z distributed/test_c10d_gloo 1/2 2025-12-04T09:06:28.9723505Z distributed/_shard/sharded_tensor/test_sharded_tensor 1/1 2025-12-04T09:06:28.9723968Z distributed/test_c10d_nccl 3/3 2025-12-04T09:06:28.9724309Z Parallel tests (0): 2025-12-04T09:06:28.9724598Z Name: excluded (est. time: 0.0min) 2025-12-04T09:06:28.9724930Z Serial tests (0): 2025-12-04T09:06:28.9725205Z Parallel tests (0): 2025-12-04T09:06:28.9725750Z Running distributed/test_c10d_functional_native 1/1 ... [2025-12-04 09:06:28.969400][820.577319465] 2025-12-04T09:06:28.9726388Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T09:06:28.9727720Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_c10d_functional_native.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 09:06:28.969799] 2025-12-04T09:10:58.2980715Z 2025-12-04T09:10:58.2982314Z distributed/test_c10d_functional_native 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_c10d_functional_native_1.1_5ceb4f282067967e_.log 2025-12-04T09:10:58.2999052Z Running 33 items in this shard: test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_all_gather_into_tensor_coalesced, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_all_gather_into_tensor_single, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_all_reduce_coalesced, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_all_reduce_coalesced_, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_all_reduce_single, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_all_reduce_single_, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_all_to_all_single, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_broadcast, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_fixed_striding, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_functional_collectives_inference_mode, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_inductor_dtypeview_memory_leak, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_reduce_scatter_tensor_coalesced, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_reduce_scatter_tensor_out, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_reduce_scatter_tensor_single, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_threading, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_unwaited, test/distributed/test_c10d_functional_native.py::TestWithNCCL::test_wait_tensor, test/distributed/test_c10d_functional_native.py::PyWorkTest::test_collectives, test/distributed/test_c10d_functional_native.py::PyWorkTest::test_wait_tensor, test/distributed/test_c10d_functional_native.py::CompileTestCPU::test_inductor_all_reduce_cpu, test/distributed/test_c10d_functional_native.py::CompileTest::test_inductor_all_gather_into_tensor_coalesced, test/distributed/test_c10d_functional_native.py::CompileTest::test_inductor_all_gather_into_tensor_single, test/distributed/test_c10d_functional_native.py::CompileTest::test_inductor_all_reduce_coalesced, test/distributed/test_c10d_functional_native.py::CompileTest::test_inductor_all_reduce_non_contig_input, test/distributed/test_c10d_functional_native.py::CompileTest::test_inductor_all_reduce_single, test/distributed/test_c10d_functional_native.py::CompileTest::test_inductor_all_to_all_single, test/distributed/test_c10d_functional_native.py::CompileTest::test_inductor_broadcast, test/distributed/test_c10d_functional_native.py::CompileTest::test_inductor_inplace_op_on_view, test/distributed/test_c10d_functional_native.py::CompileTest::test_inductor_reduce_scatter_tensor_coalesced, test/distributed/test_c10d_functional_native.py::CompileTest::test_inductor_reduce_scatter_tensor_single, test/distributed/test_c10d_functional_native.py::CompileTest::test_inductor_reuse_buffer_after_inplace_collective, test/distributed/test_c10d_functional_native.py::CompileTest::test_ranks_and_tag, test/distributed/test_c10d_functional_native.py::CompileTest::test_wait_tensor 2025-12-04T09:10:58.3015624Z 2025-12-04T09:10:58.3016059Z Finished distributed/test_c10d_functional_native 1/1 ... 
[2025-12-04 09:10:58.297414][1089.905329516], took 4.49min 2025-12-04T09:10:58.3017880Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_functional_native/distributed.test_c10d_functional_native-369cc3de9e188dd1.xml 2025-12-04T09:10:58.7216419Z Uploading artifacts took 0.13 seconds 2025-12-04T09:10:58.7217812Z Running distributed/fsdp/test_fsdp_overlap 1/1 ... [2025-12-04 09:10:58.721573][1090.329489371] 2025-12-04T09:10:58.7218480Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T09:10:58.7222844Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_fsdp_overlap.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 09:10:58.721945] 2025-12-04T09:11:56.7673575Z 2025-12-04T09:11:56.7674530Z PRINTING LOG FILE of distributed/fsdp/test_fsdp_overlap 1/1 (test/test-reports/distributed.fsdp.test_fsdp_overlap_1.1_6a5a97322901a03e_.log) 2025-12-04T09:11:56.7675948Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-39c8c10a0ef1a34e.xml 2025-12-04T09:11:56.7676907Z ============================= test session starts ============================== 2025-12-04T09:11:56.7677596Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:11:56.7678176Z cachedir: .pytest_cache 2025-12-04T09:11:56.7678873Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:11:56.7679654Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:11:56.7692068Z configfile: pytest.ini 2025-12-04T09:11:56.7692992Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:11:56.7693784Z collecting ... collected 1 item 2025-12-04T09:11:56.7694188Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T09:11:56.7695115Z Running 1 items in this shard: test/distributed/fsdp/test_fsdp_overlap.py::TestForwardOverlapWorldSizeOneCUDA::test_forward_overlap_cuda 2025-12-04T09:11:56.7695823Z 2025-12-04T09:11:56.7697126Z distributed/fsdp/test_fsdp_overlap.py::TestForwardOverlapWorldSizeOneCUDA::test_forward_overlap_cuda I1204 09:11:02.144000 14371 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 14423 2025-12-04T09:11:56.7699658Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:11:56.7700938Z _init_core_state( 2025-12-04T09:11:56.7703233Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:11:56.7705446Z _warn_cpu_init() 2025-12-04T09:11:56.7706052Z [rank0]:E1204 09:11:13.146000 14423 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:11:56.7707165Z [rank0]:E1204 09:11:13.146000 14423 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:11:56.7708812Z [rank0]:E1204 09:11:13.146000 14423 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:11:56.7710516Z [rank0]:E1204 09:11:13.146000 14423 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:11:56.7712075Z [rank0]:E1204 09:11:13.146000 14423 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:11:56.7713519Z [rank0]:E1204 09:11:13.146000 14423 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:11:56.7715254Z [rank0]:E1204 09:11:13.146000 14423 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:11:56.7716821Z [rank0]:E1204 09:11:13.146000 14423 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:11:56.7718384Z [rank0]:E1204 09:11:13.146000 14423 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:11:56.7719945Z [rank0]:E1204 09:11:13.146000 14423 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:11:56.7722076Z [rank0]:E1204 09:11:13.146000 14423 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:11:56.7723668Z [rank0]:E1204 09:11:13.146000 14423 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:11:56.7725335Z [rank0]:E1204 09:11:13.146000 14423 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:11:56.7726944Z [rank0]:E1204 09:11:13.146000 14423 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:11:56.7729280Z [rank0]:E1204 09:11:13.146000 14423 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 713949184 and is now 716046336. 
2025-12-04T09:11:56.7731489Z [rank0]:E1204 09:11:13.146000 14423 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:11:56.7732638Z [rank0]:E1204 09:11:13.146000 14423 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:11:56.7734508Z [rank0]:E1204 09:11:13.146000 14423 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_overlap.py TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda 2025-12-04T09:11:56.7736084Z [rank0]:E1204 09:11:13.146000 14423 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:11:56.7737541Z [rank0]:E1204 09:11:13.146000 14423 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:11:56.7738948Z [rank0]:E1204 09:11:13.146000 14423 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:11:56.7739747Z dist init r=0, world=1 2025-12-04T09:11:56.7739926Z 2025-12-04T09:11:56.7740033Z rank0: 2025-12-04T09:11:56.7740668Z e1: {'cpu_iter': 0.0018478633000004407, 'cpu_wait': 2.8651499999376995e-05, 'gpu_compute': 0.010128000122494995, 'gpu_total': 0.8250816106796265} 2025-12-04T09:11:56.7741787Z e2: {'cpu_iter': 0.004702577100000127, 'cpu_wait': 3.239280000002509e-05, 'gpu_compute': 0.13931520022451876, 'gpu_total': 2.1374751806259153} 2025-12-04T09:11:56.7742861Z e3: {'cpu_iter': 0.0019113367999999298, 'cpu_wait': 0.15016560569999998, 'gpu_compute': 152.52830657958984, 'gpu_total': 152.9109375} 2025-12-04T09:11:56.7743908Z e4: {'cpu_iter': 0.004734771799999926, 'cpu_wait': 0.1485989104000005, 'gpu_compute': 152.59314613342286, 'gpu_total': 153.189501953125} 2025-12-04T09:11:56.7745693Z [rank0]:[W1204 09:11:13.102715704 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:11:56.7747189Z FAILED [13.2049s] [100%] 2025-12-04T09:11:56.7747393Z 2025-12-04T09:11:56.7747547Z =================================== FAILURES =================================== 2025-12-04T09:11:56.7748172Z _________ TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda _________ 2025-12-04T09:11:56.7748859Z Traceback (most recent call last): 2025-12-04T09:11:56.7749684Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:11:56.7750401Z self._join_processes(fn) 2025-12-04T09:11:56.7751124Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:11:56.7751899Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:11:56.7752693Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:11:56.7753472Z raise RuntimeError(error) 2025-12-04T09:11:56.7753886Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:11:56.7754320Z Traceback (most recent call last): 2025-12-04T09:11:56.7755020Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:11:56.7755737Z getattr(self, test_name)() 2025-12-04T09:11:56.7756397Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:11:56.7757087Z fn() 2025-12-04T09:11:56.7757668Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:11:56.7758433Z method(*args, **kwargs) 2025-12-04T09:11:56.7759065Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:11:56.7759752Z method(*args, **kwargs) 2025-12-04T09:11:56.7760397Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:11:56.7761057Z with policy(): 2025-12-04T09:11:56.7761867Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:11:56.7762596Z raise RuntimeError(msg) 2025-12-04T09:11:56.7763918Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 713949184 and is now 716046336. 2025-12-04T09:11:56.7765160Z 2025-12-04T09:11:56.7765366Z To execute this test, run the following from the base repo dir: 2025-12-04T09:11:56.7766321Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_overlap.py TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda 2025-12-04T09:11:56.7767080Z 2025-12-04T09:11:56.7767339Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:11:56.7767720Z 2025-12-04T09:11:56.7767725Z 2025-12-04T09:11:56.7767950Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:11:56.7768551Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:11:56.7769712Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-39c8c10a0ef1a34e.xml - 2025-12-04T09:11:56.7770794Z =========================== short test summary info ============================ 2025-12-04T09:11:56.7771902Z FAILED [13.2049s] distributed/fsdp/test_fsdp_overlap.py::TestForwardOverlapWorldSizeOneCUDA::test_forward_overlap_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:11:56.7772942Z Traceback (most recent call last): 2025-12-04T09:11:56.7773845Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:11:56.7774573Z getattr(self, test_name)() 2025-12-04T09:11:56.7775255Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:11:56.7775934Z fn() 2025-12-04T09:11:56.7776584Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:11:56.7777495Z method(*args, **kwargs) 2025-12-04T09:11:56.7778201Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:11:56.7778968Z method(*args, **kwargs) 2025-12-04T09:11:56.7779671Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:11:56.7780412Z with policy(): 2025-12-04T09:11:56.7781100Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:11:56.7781875Z raise RuntimeError(msg) 2025-12-04T09:11:56.7783262Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 713949184 and is now 716046336. 2025-12-04T09:11:56.7784592Z 2025-12-04T09:11:56.7784809Z To execute this test, run the following from the base repo dir: 2025-12-04T09:11:56.7785822Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_overlap.py TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda 2025-12-04T09:11:56.7786697Z 2025-12-04T09:11:56.7786964Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:11:56.7787556Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:11:56.7788029Z ============================== 1 failed in 13.42s ============================== 2025-12-04T09:11:56.7788430Z Got exit code 1 2025-12-04T09:11:56.7788808Z Retrying single test... 
2025-12-04T09:11:56.7789725Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-bb36a88bac557029.xml 2025-12-04T09:11:56.7790639Z ============================= test session starts ============================== 2025-12-04T09:11:56.7791266Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:11:56.7791829Z cachedir: .pytest_cache 2025-12-04T09:11:56.7792489Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:11:56.7793225Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:11:56.7793561Z configfile: pytest.ini 2025-12-04T09:11:56.7794237Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:11:56.7795171Z collecting ... collected 1 item 2025-12-04T09:11:56.7796107Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_fsdp_overlap.py::TestForwardOverlapWorldSizeOneCUDA::test_forward_overlap_cuda 2025-12-04T09:11:56.7797061Z Running 1 items in this shard 2025-12-04T09:11:56.7797263Z 2025-12-04T09:11:56.7798281Z distributed/fsdp/test_fsdp_overlap.py::TestForwardOverlapWorldSizeOneCUDA::test_forward_overlap_cuda I1204 09:11:19.414000 14494 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 14546 2025-12-04T09:11:56.7800427Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:11:56.7801655Z _init_core_state( 2025-12-04T09:11:56.7803847Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:11:56.7806129Z _warn_cpu_init() 2025-12-04T09:11:56.7806794Z [rank0]:E1204 09:11:30.170000 14546 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:11:56.7807816Z [rank0]:E1204 09:11:30.170000 14546 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:11:56.7809319Z [rank0]:E1204 09:11:30.170000 14546 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:11:56.7810795Z [rank0]:E1204 09:11:30.170000 14546 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:11:56.7812259Z [rank0]:E1204 09:11:30.170000 14546 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:11:56.7813783Z [rank0]:E1204 09:11:30.170000 14546 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:11:56.7815203Z [rank0]:E1204 09:11:30.170000 14546 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:11:56.7817017Z [rank0]:E1204 09:11:30.170000 14546 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:11:56.7818622Z [rank0]:E1204 09:11:30.170000 14546 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:11:56.7820219Z [rank0]:E1204 09:11:30.170000 14546 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:11:56.7822012Z [rank0]:E1204 09:11:30.170000 14546 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:11:56.7823574Z [rank0]:E1204 09:11:30.170000 14546 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:11:56.7825141Z [rank0]:E1204 09:11:30.170000 14546 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:11:56.7826754Z [rank0]:E1204 09:11:30.170000 14546 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:11:56.7829024Z [rank0]:E1204 09:11:30.170000 14546 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 713949184 and is now 716046336. 
2025-12-04T09:11:56.7831181Z [rank0]:E1204 09:11:30.170000 14546 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:11:56.7832363Z [rank0]:E1204 09:11:30.170000 14546 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:11:56.7834456Z [rank0]:E1204 09:11:30.170000 14546 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_overlap.py TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda 2025-12-04T09:11:56.7836022Z [rank0]:E1204 09:11:30.170000 14546 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:11:56.7837203Z [rank0]:E1204 09:11:30.170000 14546 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:11:56.7838574Z [rank0]:E1204 09:11:30.170000 14546 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:11:56.7839348Z dist init r=0, world=1 2025-12-04T09:11:56.7839520Z 2025-12-04T09:11:56.7839621Z rank0: 2025-12-04T09:11:56.7840220Z e1: {'cpu_iter': 0.0021133577999997042, 'cpu_wait': 3.100590000038039e-05, 'gpu_compute': 0.009472000156529247, 'gpu_total': 0.8031007945537567} 2025-12-04T09:11:56.7841305Z e2: {'cpu_iter': 0.005042445299999976, 'cpu_wait': 3.310709999997386e-05, 'gpu_compute': 0.13894400056451559, 'gpu_total': 2.1602207899093626} 2025-12-04T09:11:56.7842367Z e3: {'cpu_iter': 0.0021608666999997084, 'cpu_wait': 0.14332368769999987, 'gpu_compute': 146.0892475128174, 'gpu_total': 146.47476654052736} 2025-12-04T09:11:56.7843409Z e4: {'cpu_iter': 0.005032401400000275, 'cpu_wait': 0.14177950029999967, 'gpu_compute': 146.0942470550537, 'gpu_total': 146.6829818725586} 2025-12-04T09:11:56.7845126Z [rank0]:[W1204 09:11:30.083323253 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:11:56.7846548Z FAILED [12.7267s] [100%] 2025-12-04T09:11:56.7846739Z 2025-12-04T09:11:56.7846886Z =================================== FAILURES =================================== 2025-12-04T09:11:56.7847489Z _________ TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda _________ 2025-12-04T09:11:56.7848044Z Traceback (most recent call last): 2025-12-04T09:11:56.7848815Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:11:56.7849592Z self._join_processes(fn) 2025-12-04T09:11:56.7850368Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:11:56.7851205Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:11:56.7852062Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:11:56.7852907Z raise RuntimeError(error) 2025-12-04T09:11:56.7853345Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:11:56.7853815Z Traceback (most recent call last): 2025-12-04T09:11:56.7854583Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:11:56.7855357Z getattr(self, test_name)() 2025-12-04T09:11:56.7856078Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:11:56.7857079Z fn() 2025-12-04T09:11:56.7857739Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:11:56.7858505Z method(*args, **kwargs) 2025-12-04T09:11:56.7859210Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:11:56.7859985Z method(*args, **kwargs) 2025-12-04T09:11:56.7860696Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:11:56.7861434Z with policy(): 2025-12-04T09:11:56.7862185Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:11:56.7862964Z raise RuntimeError(msg) 2025-12-04T09:11:56.7864357Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 713949184 and is now 716046336. 2025-12-04T09:11:56.7865675Z 2025-12-04T09:11:56.7865893Z To execute this test, run the following from the base repo dir: 2025-12-04T09:11:56.7866912Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_overlap.py TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda 2025-12-04T09:11:56.7867720Z 2025-12-04T09:11:56.7868106Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:11:56.7868497Z 2025-12-04T09:11:56.7868501Z 2025-12-04T09:11:56.7868726Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:11:56.7869420Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:11:56.7870534Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-bb36a88bac557029.xml - 2025-12-04T09:11:56.7871562Z =========================== short test summary info ============================ 2025-12-04T09:11:56.7872594Z FAILED [12.7267s] distributed/fsdp/test_fsdp_overlap.py::TestForwardOverlapWorldSizeOneCUDA::test_forward_overlap_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:11:56.7873654Z Traceback (most recent call last): 2025-12-04T09:11:56.7874367Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:11:56.7875088Z getattr(self, test_name)() 2025-12-04T09:11:56.7875764Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:11:56.7876461Z fn() 2025-12-04T09:11:56.7877049Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:11:56.7877733Z method(*args, **kwargs) 2025-12-04T09:11:56.7878363Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:11:56.7879049Z method(*args, **kwargs) 2025-12-04T09:11:56.7879692Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:11:56.7880360Z with policy(): 2025-12-04T09:11:56.7880973Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:11:56.7881661Z raise RuntimeError(msg) 2025-12-04T09:11:56.7882909Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 713949184 and is now 716046336. 2025-12-04T09:11:56.7884072Z 2025-12-04T09:11:56.7884268Z To execute this test, run the following from the base repo dir: 2025-12-04T09:11:56.7885175Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_overlap.py TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda 2025-12-04T09:11:56.7885887Z 2025-12-04T09:11:56.7886122Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:11:56.7886653Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:11:56.7887067Z ============================== 1 failed in 12.94s ============================== 2025-12-04T09:11:56.7887421Z Got exit code 1 2025-12-04T09:11:56.7887715Z Retrying single test... 
2025-12-04T09:11:56.7888464Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-9b6f6e417d9b4600.xml 2025-12-04T09:11:56.7889315Z ============================= test session starts ============================== 2025-12-04T09:11:56.7889903Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:11:56.7890438Z cachedir: .pytest_cache 2025-12-04T09:11:56.7891057Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:11:56.7891751Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:11:56.7892073Z configfile: pytest.ini 2025-12-04T09:11:56.7892708Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:11:56.7893420Z collecting ... collected 1 item 2025-12-04T09:11:56.7894284Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_fsdp_overlap.py::TestForwardOverlapWorldSizeOneCUDA::test_forward_overlap_cuda 2025-12-04T09:11:56.7895346Z Running 1 items in this shard 2025-12-04T09:11:56.7895543Z 2025-12-04T09:11:56.7896605Z distributed/fsdp/test_fsdp_overlap.py::TestForwardOverlapWorldSizeOneCUDA::test_forward_overlap_cuda I1204 09:11:36.214000 14617 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 14669 2025-12-04T09:11:56.7898996Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:11:56.7900335Z _init_core_state( 2025-12-04T09:11:56.7902536Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:11:56.7904789Z _warn_cpu_init() 2025-12-04T09:11:56.7905412Z [rank0]:E1204 09:11:50.437000 14669 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:11:56.7906554Z [rank0]:E1204 09:11:50.437000 14669 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:11:56.7908249Z [rank0]:E1204 09:11:50.437000 14669 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:11:56.7909966Z [rank0]:E1204 09:11:50.437000 14669 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:11:56.7911598Z [rank0]:E1204 09:11:50.437000 14669 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:11:56.7912948Z [rank0]:E1204 09:11:50.437000 14669 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:11:56.7914296Z [rank0]:E1204 09:11:50.437000 14669 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:11:56.7915719Z [rank0]:E1204 09:11:50.437000 14669 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:11:56.7917196Z [rank0]:E1204 09:11:50.437000 14669 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:11:56.7918623Z [rank0]:E1204 09:11:50.437000 14669 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:11:56.7920040Z [rank0]:E1204 09:11:50.437000 14669 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:11:56.7921778Z [rank0]:E1204 09:11:50.437000 14669 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:11:56.7923346Z [rank0]:E1204 09:11:50.437000 14669 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:11:56.7924968Z [rank0]:E1204 09:11:50.437000 14669 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:11:56.7927253Z [rank0]:E1204 09:11:50.437000 14669 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 713949184 and is now 716046336. 
2025-12-04T09:11:56.7929384Z [rank0]:E1204 09:11:50.437000 14669 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:11:56.7930565Z [rank0]:E1204 09:11:50.437000 14669 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:11:56.7932609Z [rank0]:E1204 09:11:50.437000 14669 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_overlap.py TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda 2025-12-04T09:11:56.7934362Z [rank0]:E1204 09:11:50.437000 14669 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:11:56.7935462Z [rank0]:E1204 09:11:50.437000 14669 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:11:56.7936941Z [rank0]:E1204 09:11:50.437000 14669 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:11:56.7937741Z dist init r=0, world=1 2025-12-04T09:11:56.7937921Z 2025-12-04T09:11:56.7938021Z rank0: 2025-12-04T09:11:56.7938649Z e1: {'cpu_iter': 0.0018660827999998019, 'cpu_wait': 2.8354200000002548e-05, 'gpu_compute': 0.009180800034664571, 'gpu_total': 0.8075520098209381} 2025-12-04T09:11:56.7939771Z e2: {'cpu_iter': 0.004684944300000282, 'cpu_wait': 3.31210999995335e-05, 'gpu_compute': 0.14296319913119077, 'gpu_total': 2.1671871900558473} 2025-12-04T09:11:56.7940860Z e3: {'cpu_iter': 0.0018834125999999784, 'cpu_wait': 0.22940703660000014, 'gpu_compute': 231.5696128845215, 'gpu_total': 231.934912109375} 2025-12-04T09:11:56.7941930Z e4: {'cpu_iter': 0.004714327500000693, 'cpu_wait': 0.2277740248999999, 'gpu_compute': 231.66344566345214, 'gpu_total': 232.25556335449218} 2025-12-04T09:11:56.7943710Z [rank0]:[W1204 09:11:50.346849138 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:11:56.7945092Z FAILED [16.2794s] [100%] 2025-12-04T09:11:56.7945292Z 2025-12-04T09:11:56.7945442Z =================================== FAILURES =================================== 2025-12-04T09:11:56.7946056Z _________ TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda _________ 2025-12-04T09:11:56.7946635Z Traceback (most recent call last): 2025-12-04T09:11:56.7947500Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:11:56.7948296Z self._join_processes(fn) 2025-12-04T09:11:56.7949196Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:11:56.7950072Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:11:56.7950859Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:11:56.7951628Z raise RuntimeError(error) 2025-12-04T09:11:56.7952031Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:11:56.7952466Z Traceback (most recent call last): 2025-12-04T09:11:56.7953167Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:11:56.7953878Z getattr(self, test_name)() 2025-12-04T09:11:56.7954544Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:11:56.7955233Z fn() 2025-12-04T09:11:56.7955816Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:11:56.7956491Z method(*args, **kwargs) 2025-12-04T09:11:56.7957117Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:11:56.7957792Z method(*args, **kwargs) 2025-12-04T09:11:56.7958427Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:11:56.7959150Z with policy(): 2025-12-04T09:11:56.7959759Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:11:56.7960445Z raise RuntimeError(msg) 2025-12-04T09:11:56.7961671Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 713949184 and is now 716046336. 2025-12-04T09:11:56.7962838Z 2025-12-04T09:11:56.7963034Z To execute this test, run the following from the base repo dir: 2025-12-04T09:11:56.7963913Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_overlap.py TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda 2025-12-04T09:11:56.7964612Z 2025-12-04T09:11:56.7964845Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:11:56.7965199Z 2025-12-04T09:11:56.7965203Z 2025-12-04T09:11:56.7965402Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:11:56.7965939Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:11:56.7967026Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-9b6f6e417d9b4600.xml - 2025-12-04T09:11:56.7968036Z =========================== short test summary info ============================ 2025-12-04T09:11:56.7969055Z FAILED [16.2794s] distributed/fsdp/test_fsdp_overlap.py::TestForwardOverlapWorldSizeOneCUDA::test_forward_overlap_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:11:56.7970018Z Traceback (most recent call last): 2025-12-04T09:11:56.7970706Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:11:56.7971411Z getattr(self, test_name)() 2025-12-04T09:11:56.7972080Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:11:56.7972763Z fn() 2025-12-04T09:11:56.7973414Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:11:56.7974096Z method(*args, **kwargs) 2025-12-04T09:11:56.7974733Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:11:56.7975393Z method(*args, **kwargs) 2025-12-04T09:11:56.7976030Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:11:56.7976947Z with policy(): 2025-12-04T09:11:56.7977668Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:11:56.7978452Z raise RuntimeError(msg) 2025-12-04T09:11:56.7979857Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 713949184 and is now 716046336. 2025-12-04T09:11:56.7981179Z 2025-12-04T09:11:56.7981408Z To execute this test, run the following from the base repo dir: 2025-12-04T09:11:56.7982416Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_overlap.py TestForwardOverlapWorldSizeOneCUDA.test_forward_overlap_cuda 2025-12-04T09:11:56.7983210Z 2025-12-04T09:11:56.7983475Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:11:56.7984063Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T09:11:56.7984545Z ============================== 1 failed in 16.49s ============================== 2025-12-04T09:11:56.7984997Z Got exit code 1 2025-12-04T09:11:56.7985738Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_overlap.py::TestForwardOverlapWorldSizeOneCUDA::test_forward_overlap_cuda 2025-12-04T09:11:56.7986864Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:11:56.7988068Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-83c25fe932c36613.xml 2025-12-04T09:11:56.7989128Z ============================= test session starts ============================== 2025-12-04T09:11:56.7989719Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:11:56.7990254Z cachedir: .pytest_cache 2025-12-04T09:11:56.7990887Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:11:56.7991578Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:11:56.7991894Z configfile: pytest.ini 2025-12-04T09:11:56.7992548Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:11:56.7993323Z collecting ... collected 1 item / 1 deselected / 0 selected 2025-12-04T09:11:56.7993753Z stepcurrent: skipping 1 already run items. 2025-12-04T09:11:56.7994097Z Running 0 items in this shard 2025-12-04T09:11:56.7994283Z 2025-12-04T09:11:56.7995042Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-83c25fe932c36613.xml - 2025-12-04T09:11:56.7996040Z ============================ 1 deselected in 0.01s ============================= 2025-12-04T09:11:56.7996929Z The following tests failed consistently: ['test/distributed/fsdp/test_fsdp_overlap.py::TestForwardOverlapWorldSizeOneCUDA::test_forward_overlap_cuda'] 2025-12-04T09:11:56.7997667Z 2025-12-04T09:11:56.7998234Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_overlap 1/1 (test/test-reports/distributed.fsdp.test_fsdp_overlap_1.1_6a5a97322901a03e_.log) 2025-12-04T09:11:56.7998922Z 2025-12-04T09:11:56.7999342Z Finished distributed/fsdp/test_fsdp_overlap 1/1 ... [2025-12-04 09:11:56.767099][1148.375015758], took 0.97min 2025-12-04T09:11:56.8000605Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-39c8c10a0ef1a34e.xml 2025-12-04T09:11:56.8634503Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-bb36a88bac557029.xml 2025-12-04T09:11:56.8953703Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-9b6f6e417d9b4600.xml 2025-12-04T09:11:56.9373730Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-83c25fe932c36613.xml 2025-12-04T09:11:57.1134303Z Uploading logs for 57116084904 to S3 2025-12-04T09:11:57.1510844Z Uploading artifacts took 0.18 seconds 2025-12-04T09:11:57.1511334Z distributed/fsdp/test_fsdp_overlap 1/1 failed! 
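All three retries above fail the CUDA memory-leak check that PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 enables: the policy context manager records the caching-allocator and driver allocation counters before the test and raises in __exit__ when they have grown afterwards (here 512 -> 4608 bytes cached and 713949184 -> 716046336 bytes driver-allocated on device 0). The log also carries two recommendations emitted by PyTorch itself: the NCCL warning asks for destroy_process_group() to be called before the process exits, and the FSDP init warning suggests passing device_id so a CPU-resident module is moved to GPU for sharding initialization. The sketch below only illustrates those two recommendations under assumed names; it is not the code of test_fsdp_overlap.py, run()/rank/world_size are placeholders, and it assumes MASTER_ADDR and MASTER_PORT are already set in the environment, as they are inside the test harness.

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def run(rank: int, world_size: int) -> None:
    # Hedged sketch, not taken from the failing test. Assumes MASTER_ADDR and
    # MASTER_PORT are set so init_process_group can rendezvous.
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    # Wrap with an explicit device index so FSDP performs sharding
    # initialization on GPU instead of warning about a CPU module.
    model = FSDP(torch.nn.Linear(8, 8), device_id=torch.cuda.current_device())

    out = model(torch.randn(4, 8, device="cuda"))
    out.sum().backward()

    # Release NCCL communicators before exit, addressing the
    # "destroy_process_group() was not called" warning above.
    dist.destroy_process_group()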
2025-12-04T09:11:57.1515777Z Running distributed/fsdp/test_fsdp_pure_fp16 1/1 ... [2025-12-04 09:11:57.151133][1148.759050119] 2025-12-04T09:11:57.1516371Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T09:11:57.1517642Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_fsdp_pure_fp16.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 09:11:57.151443] 2025-12-04T09:13:23.6624039Z 2025-12-04T09:13:23.6626569Z PRINTING LOG FILE of distributed/fsdp/test_fsdp_pure_fp16 1/1 (test/test-reports/distributed.fsdp.test_fsdp_pure_fp16_1.1_2de43ef0fea2c555_.log) 2025-12-04T09:13:23.6630252Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-e1278d34de852f2a.xml 2025-12-04T09:13:23.6631266Z ============================= test session starts ============================== 2025-12-04T09:13:23.6631949Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:13:23.6632547Z cachedir: .pytest_cache 2025-12-04T09:13:23.6633274Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:13:23.6634068Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:13:23.6634411Z configfile: pytest.ini 2025-12-04T09:13:23.6635133Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:13:23.6635943Z collecting ... collected 2 items 2025-12-04T09:13:23.6636366Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T09:13:23.6637632Z Running 2 items in this shard: test/distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_fp16_dtypes_cuda, test/distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_pure_fp16_training_cuda 2025-12-04T09:13:23.6638709Z 2025-12-04T09:13:23.6639607Z distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_fp16_dtypes_cuda I1204 09:12:00.594000 14796 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 14848 2025-12-04T09:13:23.6641136Z I1204 09:12:00.595000 14796 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 14849 2025-12-04T09:13:23.6642278Z I1204 09:12:00.596000 14796 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 14850 2025-12-04T09:13:23.6643417Z I1204 09:12:00.596000 14796 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 14851 2025-12-04T09:13:23.6645955Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:13:23.6648005Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:13:23.6650036Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:13:23.6652057Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:13:23.6654082Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:13:23.6656092Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:13:23.6658237Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:13:23.6660336Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:13:23.6661656Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:13:23.6662917Z return func(*args, **kwargs) 2025-12-04T09:13:23.6663611Z [rank0]:E1204 09:12:07.415000 14848 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.6664745Z [rank0]:E1204 09:12:07.415000 14848 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.6666439Z [rank0]:E1204 09:12:07.415000 14848 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.6668091Z [rank0]:E1204 09:12:07.415000 14848 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.6669832Z [rank0]:E1204 09:12:07.415000 14848 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.6671324Z [rank0]:E1204 09:12:07.415000 14848 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:13:23.6672780Z [rank0]:E1204 09:12:07.415000 14848 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.6674327Z [rank0]:E1204 09:12:07.415000 14848 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.6675879Z [rank0]:E1204 09:12:07.415000 14848 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.6677428Z [rank0]:E1204 09:12:07.415000 14848 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.6679582Z [rank0]:E1204 
09:12:07.415000 14848 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.6681103Z [rank0]:E1204 09:12:07.415000 14848 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.6682628Z [rank0]:E1204 09:12:07.415000 14848 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.6684186Z [rank0]:E1204 09:12:07.415000 14848 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.6686264Z [rank0]:E1204 09:12:07.415000 14848 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 0. CUDA driver allocated memory was 720306176 and is now 749666304. 2025-12-04T09:13:23.6688205Z [rank0]:E1204 09:12:07.415000 14848 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.6689332Z [rank0]:E1204 09:12:07.415000 14848 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.6691055Z [rank0]:E1204 09:12:07.415000 14848 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T09:13:23.6692545Z [rank0]:E1204 09:12:07.415000 14848 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.6693739Z [rank0]:E1204 09:12:07.415000 14848 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.6695093Z [rank0]:E1204 09:12:07.415000 14848 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:13:23.6696208Z [rank2]:E1204 09:12:07.415000 14850 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.6697598Z [rank2]:E1204 09:12:07.415000 14850 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.6699289Z [rank2]:E1204 09:12:07.415000 14850 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.6700959Z [rank2]:E1204 09:12:07.415000 14850 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.6702607Z [rank2]:E1204 09:12:07.415000 14850 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.6704143Z [rank2]:E1204 09:12:07.415000 14850 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:13:23.6706178Z [rank2]:E1204 09:12:07.415000 14850 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.6707795Z [rank2]:E1204 09:12:07.415000 14850 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.6709510Z [rank2]:E1204 09:12:07.415000 14850 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.6711148Z [rank2]:E1204 09:12:07.415000 14850 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.6712705Z [rank2]:E1204 09:12:07.415000 14850 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.6714211Z [rank2]:E1204 09:12:07.415000 14850 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.6715729Z [rank2]:E1204 09:12:07.415000 14850 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.6717281Z [rank2]:E1204 09:12:07.415000 14850 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.6719363Z [rank2]:E1204 09:12:07.415000 14850 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 2. CUDA driver allocated memory was 609157120 and is now 640614400. 
2025-12-04T09:13:23.6721696Z [rank2]:E1204 09:12:07.415000 14850 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.6722874Z [rank2]:E1204 09:12:07.415000 14850 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.6724656Z [rank2]:E1204 09:12:07.415000 14850 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T09:13:23.6726264Z [rank2]:E1204 09:12:07.415000 14850 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.6727497Z [rank2]:E1204 09:12:07.415000 14850 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.6728913Z [rank2]:E1204 09:12:07.415000 14850 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:13:23.6730065Z [rank1]:E1204 09:12:07.416000 14849 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.6731199Z [rank1]:E1204 09:12:07.416000 14849 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.6733008Z [rank1]:E1204 09:12:07.416000 14849 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.6734709Z [rank1]:E1204 09:12:07.416000 14849 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.6736263Z [rank1]:E1204 09:12:07.416000 14849 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.6738008Z [rank1]:E1204 09:12:07.416000 14849 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:13:23.6739518Z [rank1]:E1204 09:12:07.416000 14849 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.6741134Z [rank1]:E1204 09:12:07.416000 14849 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.6742852Z [rank1]:E1204 09:12:07.416000 14849 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.6744460Z [rank1]:E1204 09:12:07.416000 14849 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.6746072Z [rank1]:E1204 09:12:07.416000 14849 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.6747619Z [rank1]:E1204 09:12:07.416000 14849 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.6749246Z [rank1]:E1204 09:12:07.416000 14849 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.6750690Z [rank1]:E1204 09:12:07.416000 14849 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.6752594Z [rank1]:E1204 09:12:07.416000 14849 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 1. CUDA driver allocated memory was 604962816 and is now 640614400. 2025-12-04T09:13:23.6754370Z [rank1]:E1204 09:12:07.416000 14849 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.6755405Z [rank1]:E1204 09:12:07.416000 14849 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.6757054Z [rank1]:E1204 09:12:07.416000 14849 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T09:13:23.6758368Z [rank1]:E1204 09:12:07.416000 14849 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.6759467Z [rank1]:E1204 09:12:07.416000 14849 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.6760718Z [rank1]:E1204 09:12:07.416000 14849 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:13:23.6761729Z [rank3]:E1204 09:12:07.417000 14851 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.6762749Z [rank3]:E1204 09:12:07.417000 14851 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.6764256Z [rank3]:E1204 09:12:07.417000 14851 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.6765961Z [rank3]:E1204 09:12:07.417000 14851 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.6767494Z [rank3]:E1204 09:12:07.417000 14851 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.6768946Z [rank3]:E1204 09:12:07.417000 14851 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:13:23.6770368Z [rank3]:E1204 09:12:07.417000 14851 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.6771930Z [rank3]:E1204 09:12:07.417000 14851 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.6773434Z [rank3]:E1204 09:12:07.417000 14851 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.6774924Z 
[rank3]:E1204 09:12:07.417000 14851 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.6776521Z [rank3]:E1204 09:12:07.417000 14851 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.6778285Z [rank3]:E1204 09:12:07.417000 14851 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.6779857Z [rank3]:E1204 09:12:07.417000 14851 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.6781463Z [rank3]:E1204 09:12:07.417000 14851 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.6783592Z [rank3]:E1204 09:12:07.417000 14851 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 3. CUDA driver allocated memory was 607059968 and is now 640614400. 2025-12-04T09:13:23.6785590Z [rank3]:E1204 09:12:07.417000 14851 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.6786839Z [rank3]:E1204 09:12:07.417000 14851 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.6788632Z [rank3]:E1204 09:12:07.417000 14851 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T09:13:23.6790069Z [rank3]:E1204 09:12:07.417000 14851 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.6791156Z [rank3]:E1204 09:12:07.417000 14851 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.6792407Z [rank3]:E1204 09:12:07.417000 14851 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:13:23.6793124Z dist init r=0, world=4 2025-12-04T09:13:23.6793386Z dist init r=3, world=4 2025-12-04T09:13:23.6793629Z dist init r=2, world=4 2025-12-04T09:13:23.6793880Z dist init r=1, world=4 2025-12-04T09:13:23.6795088Z [rank0]:[W1204 09:12:07.428003541 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:13:23.6796310Z FAILED [8.6219s] [ 50%] 2025-12-04T09:13:23.6796643Z 2025-12-04T09:13:23.6796788Z =================================== FAILURES =================================== 2025-12-04T09:13:23.6797308Z ____________________ TestPureFP16CUDA.test_fp16_dtypes_cuda ____________________ 2025-12-04T09:13:23.6797794Z Traceback (most recent call last): 2025-12-04T09:13:23.6798528Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:13:23.6799289Z self._join_processes(fn) 2025-12-04T09:13:23.6800043Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:13:23.6800863Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:13:23.6801742Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:13:23.6802560Z raise RuntimeError(error) 2025-12-04T09:13:23.6802986Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:13:23.6803444Z Traceback (most recent call last): 2025-12-04T09:13:23.6804183Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.6804938Z getattr(self, test_name)() 2025-12-04T09:13:23.6805652Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.6806379Z fn() 2025-12-04T09:13:23.6806990Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.6807707Z method(*args, **kwargs) 2025-12-04T09:13:23.6808452Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.6809134Z method(*args, **kwargs) 2025-12-04T09:13:23.6810002Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.6810776Z with policy(): 2025-12-04T09:13:23.6811376Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.6812232Z raise RuntimeError(msg) 2025-12-04T09:13:23.6813855Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 0. CUDA driver allocated memory was 720306176 and is now 749666304. 2025-12-04T09:13:23.6815049Z 2025-12-04T09:13:23.6815269Z To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.6816095Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T09:13:23.6816799Z 2025-12-04T09:13:23.6817232Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.6817653Z 2025-12-04T09:13:23.6817658Z 2025-12-04T09:13:23.6817884Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:13:23.6818517Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:13:23.6819764Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-e1278d34de852f2a.xml - 2025-12-04T09:13:23.6821127Z =========================== short test summary info ============================ 2025-12-04T09:13:23.6822169Z FAILED [8.6219s] distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_fp16_dtypes_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:13:23.6823121Z Traceback (most recent call last): 2025-12-04T09:13:23.6823904Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.6824717Z getattr(self, test_name)() 2025-12-04T09:13:23.6825477Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.6826256Z fn() 2025-12-04T09:13:23.6826896Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.6827663Z method(*args, **kwargs) 2025-12-04T09:13:23.6828384Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.6829137Z method(*args, **kwargs) 2025-12-04T09:13:23.6829973Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.6830736Z with policy(): 2025-12-04T09:13:23.6831717Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.6832635Z raise RuntimeError(msg) 2025-12-04T09:13:23.6833873Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 0. CUDA driver allocated memory was 720306176 and is now 749666304. 2025-12-04T09:13:23.6835015Z 2025-12-04T09:13:23.6835241Z To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.6836104Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T09:13:23.6836743Z 2025-12-04T09:13:23.6837005Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.6837591Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:13:23.6838063Z ============================== 1 failed in 8.83s =============================== 2025-12-04T09:13:23.6838442Z Got exit code 1 2025-12-04T09:13:23.6838712Z Retrying single test... 
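The RuntimeError above is raised by the `with policy():` context manager in common_utils.py, i.e. the CUDA memory-leak check that PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 turns on: it records caching-allocator and CUDA-driver memory before the test body and compares again on exit, which is where the 512 -> 6656 byte delta reported for every rank comes from. The sketch below is only a rough stand-alone approximation of the caching-allocator half of that comparison, not the actual PyTorch implementation; the function name check_for_cuda_leak and the toy workload are made up for illustration.

import gc
import torch

def check_for_cuda_leak(fn, device=0):
    # Rough approximation of the leak check: record caching-allocator usage
    # before the test body, run it, then compare again after dead tensors
    # have been collected. This is NOT the code in common_utils.py.
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    before = torch.cuda.memory_allocated(device)

    fn()  # the test body under scrutiny

    gc.collect()
    torch.cuda.synchronize(device)
    after = torch.cuda.memory_allocated(device)
    if after > before:
        # The real check also compares CUDA-driver allocated memory and
        # prints the repro command seen in the log above.
        raise RuntimeError(
            f"possible CUDA leak: caching allocator went from {before} to {after} bytes"
        )

if torch.cuda.is_available():
    check_for_cuda_leak(lambda: torch.ones(1, device="cuda").sum().item())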
2025-12-04T09:13:23.6839558Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-efcb608498b7750d.xml 2025-12-04T09:13:23.6840515Z ============================= test session starts ============================== 2025-12-04T09:13:23.6841150Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:13:23.6841864Z cachedir: .pytest_cache 2025-12-04T09:13:23.6842557Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:13:23.6843321Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:13:23.6843662Z configfile: pytest.ini 2025-12-04T09:13:23.6844374Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:13:23.6845239Z collecting ... collected 2 items / 1 deselected / 1 selected 2025-12-04T09:13:23.6846276Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_fp16_dtypes_cuda 2025-12-04T09:13:23.6847113Z Running 1 items in this shard 2025-12-04T09:13:23.6847333Z 2025-12-04T09:13:23.6848210Z distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_fp16_dtypes_cuda I1204 09:12:14.094000 15133 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 15185 2025-12-04T09:13:23.6849884Z I1204 09:12:14.095000 15133 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 15186 2025-12-04T09:13:23.6851083Z I1204 09:12:14.095000 15133 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 15187 2025-12-04T09:13:23.6852091Z I1204 09:12:14.096000 15133 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 15188 2025-12-04T09:13:23.6854194Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:13:23.6855993Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:13:23.6858280Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:13:23.6860314Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:13:23.6862327Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:13:23.6864341Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:13:23.6866365Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:13:23.6868372Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:13:23.6869801Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:13:23.6870906Z return func(*args, **kwargs) 2025-12-04T09:13:23.6871522Z [rank0]:E1204 09:12:20.972000 15185 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.6872595Z [rank0]:E1204 09:12:20.972000 15185 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.6874107Z [rank0]:E1204 09:12:20.972000 15185 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.6875576Z [rank0]:E1204 09:12:20.972000 15185 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.6877041Z [rank0]:E1204 09:12:20.972000 15185 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.6878406Z [rank0]:E1204 09:12:20.972000 15185 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:13:23.6879746Z [rank0]:E1204 09:12:20.972000 15185 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.6881173Z [rank0]:E1204 09:12:20.972000 15185 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.6882582Z [rank0]:E1204 09:12:20.972000 15185 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.6883997Z [rank0]:E1204 09:12:20.972000 15185 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.6885422Z [rank0]:E1204 09:12:20.972000 15185 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.6886805Z [rank0]:E1204 09:12:20.972000 15185 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.6888247Z [rank0]:E1204 09:12:20.972000 15185 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.6889665Z [rank0]:E1204 09:12:20.972000 
15185 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.6891554Z [rank0]:E1204 09:12:20.972000 15185 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 0. CUDA driver allocated memory was 718209024 and is now 749666304. 2025-12-04T09:13:23.6893324Z [rank0]:E1204 09:12:20.972000 15185 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.6894374Z [rank0]:E1204 09:12:20.972000 15185 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.6895960Z [rank0]:E1204 09:12:20.972000 15185 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T09:13:23.6897566Z [rank0]:E1204 09:12:20.972000 15185 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.6898809Z [rank0]:E1204 09:12:20.972000 15185 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.6900224Z [rank0]:E1204 09:12:20.972000 15185 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:13:23.6901445Z [rank3]:E1204 09:12:20.972000 15188 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.6902569Z [rank3]:E1204 09:12:20.972000 15188 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.6904256Z [rank3]:E1204 09:12:20.972000 15188 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.6905920Z [rank3]:E1204 09:12:20.972000 15188 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.6907572Z [rank3]:E1204 09:12:20.972000 15188 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.6909328Z [rank3]:E1204 09:12:20.972000 15188 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:13:23.6910662Z [rank3]:E1204 09:12:20.972000 15188 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.6912277Z [rank3]:E1204 09:12:20.972000 15188 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.6913707Z [rank3]:E1204 09:12:20.972000 15188 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.6915137Z [rank3]:E1204 09:12:20.972000 15188 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.6916558Z [rank3]:E1204 09:12:20.972000 
15188 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.6917924Z [rank3]:E1204 09:12:20.972000 15188 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.6919367Z [rank3]:E1204 09:12:20.972000 15188 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.6920940Z [rank3]:E1204 09:12:20.972000 15188 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.6923227Z [rank3]:E1204 09:12:20.972000 15188 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 3. CUDA driver allocated memory was 604962816 and is now 640614400. 2025-12-04T09:13:23.6925223Z [rank3]:E1204 09:12:20.972000 15188 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.6926390Z [rank3]:E1204 09:12:20.972000 15188 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.6928172Z [rank3]:E1204 09:12:20.972000 15188 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T09:13:23.6929655Z [rank3]:E1204 09:12:20.972000 15188 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.6930887Z [rank3]:E1204 09:12:20.972000 15188 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.6932409Z [rank3]:E1204 09:12:20.972000 15188 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:13:23.6933731Z [rank1]:E1204 09:12:20.972000 15186 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.6934745Z [rank1]:E1204 09:12:20.972000 15186 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.6936237Z [rank1]:E1204 09:12:20.972000 15186 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.6938058Z [rank1]:E1204 09:12:20.972000 15186 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.6939694Z [rank1]:E1204 09:12:20.972000 15186 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.6941236Z [rank1]:E1204 09:12:20.972000 15186 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:13:23.6942754Z [rank1]:E1204 09:12:20.972000 15186 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.6944355Z [rank1]:E1204 09:12:20.972000 15186 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.6945953Z [rank1]:E1204 09:12:20.972000 15186 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.6947533Z [rank1]:E1204 09:12:20.972000 15186 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.6949316Z [rank1]:E1204 09:12:20.972000 15186 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.6950766Z [rank1]:E1204 09:12:20.972000 15186 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.6952163Z [rank1]:E1204 09:12:20.972000 15186 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.6953593Z [rank1]:E1204 09:12:20.972000 15186 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.6955474Z [rank1]:E1204 09:12:20.972000 15186 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 1. CUDA driver allocated memory was 611254272 and is now 640614400. 
2025-12-04T09:13:23.6957261Z [rank1]:E1204 09:12:20.972000 15186 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.6958309Z [rank1]:E1204 09:12:20.972000 15186 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.6959901Z [rank1]:E1204 09:12:20.972000 15186 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T09:13:23.6961220Z [rank1]:E1204 09:12:20.972000 15186 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.6962299Z [rank1]:E1204 09:12:20.972000 15186 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.6963620Z [rank1]:E1204 09:12:20.972000 15186 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:13:23.6964644Z [rank2]:E1204 09:12:20.973000 15187 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.6965657Z [rank2]:E1204 09:12:20.973000 15187 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.6967140Z [rank2]:E1204 09:12:20.973000 15187 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.6968613Z [rank2]:E1204 09:12:20.973000 15187 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.6970084Z [rank2]:E1204 09:12:20.973000 15187 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.6971454Z [rank2]:E1204 09:12:20.973000 15187 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:13:23.6972806Z [rank2]:E1204 09:12:20.973000 15187 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.6974216Z [rank2]:E1204 09:12:20.973000 15187 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.6975646Z [rank2]:E1204 09:12:20.973000 15187 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.6977351Z [rank2]:E1204 09:12:20.973000 15187 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.6979022Z [rank2]:E1204 09:12:20.973000 15187 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.6980590Z [rank2]:E1204 09:12:20.973000 15187 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.6982144Z [rank2]:E1204 09:12:20.973000 15187 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.6983746Z [rank2]:E1204 09:12:20.973000 15187 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.6985882Z [rank2]:E1204 09:12:20.973000 15187 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 2. CUDA driver allocated memory was 607059968 and is now 640614400. 2025-12-04T09:13:23.6987872Z [rank2]:E1204 09:12:20.973000 15187 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.6989048Z [rank2]:E1204 09:12:20.973000 15187 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.6990730Z [rank2]:E1204 09:12:20.973000 15187 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T09:13:23.6992040Z [rank2]:E1204 09:12:20.973000 15187 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.6993193Z [rank2]:E1204 09:12:20.973000 15187 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.6994445Z [rank2]:E1204 09:12:20.973000 15187 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:13:23.6995141Z dist init r=2, world=4 2025-12-04T09:13:23.6995400Z dist init r=0, world=4 2025-12-04T09:13:23.6995652Z dist init r=3, world=4 2025-12-04T09:13:23.6995888Z dist init r=1, world=4 2025-12-04T09:13:23.6997083Z [rank0]:[W1204 09:12:21.986184200 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:13:23.6998329Z FAILED [9.2338s] [100%] 2025-12-04T09:13:23.6998489Z 2025-12-04T09:13:23.6998635Z =================================== FAILURES =================================== 2025-12-04T09:13:23.6999113Z ____________________ TestPureFP16CUDA.test_fp16_dtypes_cuda ____________________ 2025-12-04T09:13:23.6999576Z Traceback (most recent call last): 2025-12-04T09:13:23.7000296Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:13:23.7001014Z self._join_processes(fn) 2025-12-04T09:13:23.7001720Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:13:23.7002497Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:13:23.7003289Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:13:23.7004065Z raise RuntimeError(error) 2025-12-04T09:13:23.7004462Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:13:23.7004903Z Traceback (most recent call last): 2025-12-04T09:13:23.7005602Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7006364Z getattr(self, test_name)() 2025-12-04T09:13:23.7007041Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7007731Z fn() 2025-12-04T09:13:23.7008316Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7008983Z method(*args, **kwargs) 2025-12-04T09:13:23.7009626Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7010299Z method(*args, **kwargs) 2025-12-04T09:13:23.7010921Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7011597Z with policy(): 2025-12-04T09:13:23.7012206Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7012897Z raise RuntimeError(msg) 2025-12-04T09:13:23.7013991Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 3. CUDA driver allocated memory was 604962816 and is now 640614400. 2025-12-04T09:13:23.7015040Z 2025-12-04T09:13:23.7015232Z To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7016014Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T09:13:23.7016682Z 2025-12-04T09:13:23.7017119Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7017600Z 2025-12-04T09:13:23.7017604Z 2025-12-04T09:13:23.7017833Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:13:23.7018462Z Process 3 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:13:23.7019724Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-efcb608498b7750d.xml - 2025-12-04T09:13:23.7021060Z =========================== short test summary info ============================ 2025-12-04T09:13:23.7022083Z FAILED [9.2338s] distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_fp16_dtypes_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:13:23.7023050Z Traceback (most recent call last): 2025-12-04T09:13:23.7023854Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7024668Z getattr(self, test_name)() 2025-12-04T09:13:23.7025414Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7026195Z fn() 2025-12-04T09:13:23.7026850Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7027598Z method(*args, **kwargs) 2025-12-04T09:13:23.7028313Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7029078Z method(*args, **kwargs) 2025-12-04T09:13:23.7029787Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7030537Z with policy(): 2025-12-04T09:13:23.7031222Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7031994Z raise RuntimeError(msg) 2025-12-04T09:13:23.7033401Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 3. CUDA driver allocated memory was 604962816 and is now 640614400. 2025-12-04T09:13:23.7034442Z 2025-12-04T09:13:23.7034635Z To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7035416Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T09:13:23.7036013Z 2025-12-04T09:13:23.7036255Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7036789Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:13:23.7037231Z ======================= 1 failed, 1 deselected in 9.45s ======================== 2025-12-04T09:13:23.7037615Z Got exit code 1 2025-12-04T09:13:23.7037863Z Retrying single test... 
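Each attempt also logs the `_init_utils.py:571` UserWarning that FSDP received `device_id` as the bare device "cuda" with no explicit index, and the warning itself names the two remedies: call `torch.cuda.set_device()` before FSDP initialization, or pass an explicit device index as `device_id`. A minimal sketch of that pattern follows; `rank` and `model` are placeholders rather than anything taken from this test, and actually running it presupposes an initialized process group.

import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_model_for_rank(model: torch.nn.Module, rank: int) -> FSDP:
    # Pin the current CUDA device to this rank so a bare "cuda" is unambiguous ...
    torch.cuda.set_device(rank)
    # ... and/or hand FSDP an explicit index instead of the bare "cuda" device.
    return FSDP(model, device_id=rank)

The companion c10d_logger warning about barrier() points at the same knob, noting that `device_id` can also be passed to `init_process_group` to silence it.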
2025-12-04T09:13:23.7038623Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-9a300aee582fd0b6.xml 2025-12-04T09:13:23.7039493Z ============================= test session starts ============================== 2025-12-04T09:13:23.7040084Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:13:23.7040624Z cachedir: .pytest_cache 2025-12-04T09:13:23.7041240Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:13:23.7041938Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:13:23.7042251Z configfile: pytest.ini 2025-12-04T09:13:23.7042887Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:13:23.7043771Z collecting ... collected 2 items / 1 deselected / 1 selected 2025-12-04T09:13:23.7044624Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_fp16_dtypes_cuda 2025-12-04T09:13:23.7045391Z Running 1 items in this shard 2025-12-04T09:13:23.7045579Z 2025-12-04T09:13:23.7046385Z distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_fp16_dtypes_cuda I1204 09:12:28.004000 15470 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 15522 2025-12-04T09:13:23.7047751Z I1204 09:12:28.005000 15470 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 15523 2025-12-04T09:13:23.7048777Z I1204 09:12:28.006000 15470 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 15524 2025-12-04T09:13:23.7049786Z I1204 09:12:28.006000 15470 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 15525 2025-12-04T09:13:23.7051899Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:13:23.7053684Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:13:23.7055481Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:13:23.7057576Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:13:23.7059679Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:13:23.7061705Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:13:23.7063718Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:13:23.7065720Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:13:23.7067035Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:13:23.7068293Z return func(*args, **kwargs) 2025-12-04T09:13:23.7069085Z [rank0]:E1204 09:12:34.860000 15522 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.7070207Z [rank0]:E1204 09:12:34.860000 15522 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.7071707Z [rank0]:E1204 09:12:34.860000 15522 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7073179Z [rank0]:E1204 09:12:34.860000 15522 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.7074702Z [rank0]:E1204 09:12:34.860000 15522 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7076075Z [rank0]:E1204 09:12:34.860000 15522 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:13:23.7077409Z [rank0]:E1204 09:12:34.860000 15522 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7078823Z [rank0]:E1204 09:12:34.860000 15522 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7080242Z [rank0]:E1204 09:12:34.860000 15522 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7081663Z [rank0]:E1204 09:12:34.860000 15522 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7083081Z [rank0]:E1204 09:12:34.860000 15522 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7084456Z [rank0]:E1204 09:12:34.860000 15522 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.7085841Z [rank0]:E1204 09:12:34.860000 15522 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7087272Z [rank0]:E1204 09:12:34.860000 
15522 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.7089224Z [rank0]:E1204 09:12:34.860000 15522 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 0. CUDA driver allocated memory was 720306176 and is now 749666304. 2025-12-04T09:13:23.7091000Z [rank0]:E1204 09:12:34.860000 15522 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7092040Z [rank0]:E1204 09:12:34.860000 15522 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7093614Z [rank0]:E1204 09:12:34.860000 15522 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T09:13:23.7094936Z [rank0]:E1204 09:12:34.860000 15522 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7096034Z [rank0]:E1204 09:12:34.860000 15522 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7097568Z [rank0]:E1204 09:12:34.860000 15522 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:13:23.7098716Z [rank1]:E1204 09:12:34.861000 15523 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.7099851Z [rank1]:E1204 09:12:34.861000 15523 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.7101533Z [rank1]:E1204 09:12:34.861000 15523 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7103304Z [rank1]:E1204 09:12:34.861000 15523 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.7104949Z [rank1]:E1204 09:12:34.861000 15523 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7106486Z [rank1]:E1204 09:12:34.861000 15523 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:13:23.7108004Z [rank1]:E1204 09:12:34.861000 15523 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7109666Z [rank1]:E1204 09:12:34.861000 15523 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7111084Z [rank1]:E1204 09:12:34.861000 15523 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7112501Z [rank1]:E1204 09:12:34.861000 15523 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7113925Z [rank1]:E1204 09:12:34.861000 
15523 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7115306Z [rank1]:E1204 09:12:34.861000 15523 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.7116874Z [rank1]:E1204 09:12:34.861000 15523 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7118395Z [rank1]:E1204 09:12:34.861000 15523 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.7120451Z [rank1]:E1204 09:12:34.861000 15523 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 1. CUDA driver allocated memory was 607059968 and is now 640614400. 2025-12-04T09:13:23.7122850Z [rank1]:E1204 09:12:34.861000 15523 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7124036Z [rank1]:E1204 09:12:34.861000 15523 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7125824Z [rank1]:E1204 09:12:34.861000 15523 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T09:13:23.7127324Z [rank1]:E1204 09:12:34.861000 15523 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7128543Z [rank1]:E1204 09:12:34.861000 15523 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7129954Z [rank1]:E1204 09:12:34.861000 15523 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:13:23.7131101Z [rank3]:E1204 09:12:34.862000 15525 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.7132239Z [rank3]:E1204 09:12:34.862000 15525 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.7134109Z [rank3]:E1204 09:12:34.862000 15525 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7135691Z [rank3]:E1204 09:12:34.862000 15525 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.7137458Z [rank3]:E1204 09:12:34.862000 15525 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7139004Z [rank3]:E1204 09:12:34.862000 15525 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:13:23.7140520Z [rank3]:E1204 09:12:34.862000 15525 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7142108Z [rank3]:E1204 09:12:34.862000 15525 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7143723Z [rank3]:E1204 09:12:34.862000 15525 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7145327Z [rank3]:E1204 09:12:34.862000 15525 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7146933Z [rank3]:E1204 09:12:34.862000 15525 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7148499Z [rank3]:E1204 09:12:34.862000 15525 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.7150049Z [rank3]:E1204 09:12:34.862000 15525 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7151475Z [rank3]:E1204 09:12:34.862000 15525 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.7153446Z [rank3]:E1204 09:12:34.862000 15525 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 3. CUDA driver allocated memory was 607059968 and is now 640614400. 
2025-12-04T09:13:23.7155217Z [rank3]:E1204 09:12:34.862000 15525 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7156266Z [rank3]:E1204 09:12:34.862000 15525 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7157834Z [rank3]:E1204 09:12:34.862000 15525 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T09:13:23.7159153Z [rank3]:E1204 09:12:34.862000 15525 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7160248Z [rank3]:E1204 09:12:34.862000 15525 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7161513Z [rank3]:E1204 09:12:34.862000 15525 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:13:23.7162524Z [rank2]:E1204 09:12:34.863000 15524 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.7163534Z [rank2]:E1204 09:12:34.863000 15524 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.7165081Z [rank2]:E1204 09:12:34.863000 15524 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7166555Z [rank2]:E1204 09:12:34.863000 15524 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.7168020Z [rank2]:E1204 09:12:34.863000 15524 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7169372Z [rank2]:E1204 09:12:34.863000 15524 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:13:23.7170716Z [rank2]:E1204 09:12:34.863000 15524 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7172140Z [rank2]:E1204 09:12:34.863000 15524 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7173562Z [rank2]:E1204 09:12:34.863000 15524 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7174966Z [rank2]:E1204 09:12:34.863000 15524 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7176390Z [rank2]:E1204 09:12:34.863000 15524 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7178131Z [rank2]:E1204 09:12:34.863000 15524 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.7179703Z [rank2]:E1204 09:12:34.863000 15524 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7181396Z [rank2]:E1204 09:12:34.863000 15524 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.7183521Z [rank2]:E1204 09:12:34.863000 15524 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 2. CUDA driver allocated memory was 604962816 and is now 640614400. 2025-12-04T09:13:23.7185519Z [rank2]:E1204 09:12:34.863000 15524 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7186697Z [rank2]:E1204 09:12:34.863000 15524 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7188501Z [rank2]:E1204 09:12:34.863000 15524 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda 2025-12-04T09:13:23.7190090Z [rank2]:E1204 09:12:34.863000 15524 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7191176Z [rank2]:E1204 09:12:34.863000 15524 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7192433Z [rank2]:E1204 09:12:34.863000 15524 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:13:23.7193148Z dist init r=3, world=4 2025-12-04T09:13:23.7193412Z dist init r=1, world=4 2025-12-04T09:13:23.7193712Z dist init r=2, world=4 2025-12-04T09:13:23.7206928Z dist init r=0, world=4 2025-12-04T09:13:23.7208263Z [rank0]:[W1204 09:12:35.871466517 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
2025-12-04T09:13:23.7209605Z FAILED [8.5903s] [100%]
2025-12-04T09:13:23.7209794Z 
2025-12-04T09:13:23.7209941Z =================================== FAILURES ===================================
2025-12-04T09:13:23.7210469Z ____________________ TestPureFP16CUDA.test_fp16_dtypes_cuda ____________________
2025-12-04T09:13:23.7210947Z Traceback (most recent call last):
2025-12-04T09:13:23.7211708Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper
2025-12-04T09:13:23.7212475Z self._join_processes(fn)
2025-12-04T09:13:23.7213235Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes
2025-12-04T09:13:23.7214065Z self._check_return_codes(fn, elapsed_time)
2025-12-04T09:13:23.7214907Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes
2025-12-04T09:13:23.7215735Z raise RuntimeError(error)
2025-12-04T09:13:23.7216152Z RuntimeError: Process 1 exited with error code 10 and exception:
2025-12-04T09:13:23.7216732Z Traceback (most recent call last):
2025-12-04T09:13:23.7217689Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:13:23.7218496Z getattr(self, test_name)()
2025-12-04T09:13:23.7219249Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:13:23.7220039Z fn()
2025-12-04T09:13:23.7220705Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:13:23.7221670Z method(*args, **kwargs)
2025-12-04T09:13:23.7222567Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:13:23.7223333Z method(*args, **kwargs)
2025-12-04T09:13:23.7224047Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:13:23.7224791Z with policy():
2025-12-04T09:13:23.7225477Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:13:23.7226252Z raise RuntimeError(msg)
2025-12-04T09:13:23.7227488Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 1. CUDA driver allocated memory was 607059968 and is now 640614400.
2025-12-04T09:13:23.7228679Z 
2025-12-04T09:13:23.7228895Z To execute this test, run the following from the base repo dir:
2025-12-04T09:13:23.7229787Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda
2025-12-04T09:13:23.7230447Z 
2025-12-04T09:13:23.7230729Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:13:23.7231135Z 
2025-12-04T09:13:23.7231140Z 
2025-12-04T09:13:23.7231380Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:13:23.7231998Z Process 1 terminated with exit code 10, terminating remaining processes.
2025-12-04T09:13:23.7233447Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-9a300aee582fd0b6.xml -
2025-12-04T09:13:23.7234487Z =========================== short test summary info ============================
2025-12-04T09:13:23.7235496Z FAILED [8.5903s] distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_fp16_dtypes_cuda - RuntimeError: Process 1 exited with error code 10 and exception:
2025-12-04T09:13:23.7236336Z Traceback (most recent call last):
2025-12-04T09:13:23.7237053Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:13:23.7237776Z getattr(self, test_name)()
2025-12-04T09:13:23.7238441Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:13:23.7239139Z fn()
2025-12-04T09:13:23.7239723Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:13:23.7240404Z method(*args, **kwargs)
2025-12-04T09:13:23.7241033Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:13:23.7241712Z method(*args, **kwargs)
2025-12-04T09:13:23.7242347Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:13:23.7243025Z with policy():
2025-12-04T09:13:23.7243624Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:13:23.7244316Z raise RuntimeError(msg)
2025-12-04T09:13:23.7245433Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_fp16_dtypes_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 1. CUDA driver allocated memory was 607059968 and is now 640614400.
2025-12-04T09:13:23.7246471Z 
2025-12-04T09:13:23.7246678Z To execute this test, run the following from the base repo dir:
2025-12-04T09:13:23.7247450Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_fp16_dtypes_cuda
2025-12-04T09:13:23.7248052Z 
2025-12-04T09:13:23.7248291Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:13:23.7248827Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T09:13:23.7249327Z ======================= 1 failed, 1 deselected in 8.80s ======================== 2025-12-04T09:13:23.7249714Z Got exit code 1 2025-12-04T09:13:23.7250266Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_fp16_dtypes_cuda 2025-12-04T09:13:23.7251154Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:13:23.7252235Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-433868368b6a29b3.xml 2025-12-04T09:13:23.7253109Z ============================= test session starts ============================== 2025-12-04T09:13:23.7253711Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:13:23.7254429Z cachedir: .pytest_cache 2025-12-04T09:13:23.7255094Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:13:23.7256020Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:13:23.7256369Z configfile: pytest.ini 2025-12-04T09:13:23.7257326Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:13:23.7258221Z collecting ... collected 2 items / 1 deselected / 1 selected 2025-12-04T09:13:23.7258712Z stepcurrent: skipping 1 already run items. 2025-12-04T09:13:23.7259095Z Running 1 items in this shard 2025-12-04T09:13:23.7259306Z 2025-12-04T09:13:23.7260252Z distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_pure_fp16_training_cuda I1204 09:12:41.514000 15807 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 15859 2025-12-04T09:13:23.7261911Z I1204 09:12:41.515000 15807 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 15860 2025-12-04T09:13:23.7263040Z I1204 09:12:41.516000 15807 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 15861 2025-12-04T09:13:23.7264167Z I1204 09:12:41.516000 15807 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 15862 2025-12-04T09:13:23.7265813Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T09:13:23.7267069Z return func(*args, **kwargs) 2025-12-04T09:13:23.7267736Z [rank0]:E1204 09:12:48.365000 15859 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.7269162Z [rank0]:E1204 09:12:48.365000 15859 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.7270663Z [rank0]:E1204 09:12:48.365000 15859 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7272128Z [rank0]:E1204 09:12:48.365000 15859 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.7273593Z [rank0]:E1204 09:12:48.365000 15859 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7274955Z [rank0]:E1204 09:12:48.365000 15859 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:13:23.7276305Z [rank0]:E1204 09:12:48.365000 15859 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7277784Z [rank0]:E1204 09:12:48.365000 15859 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7279209Z [rank0]:E1204 09:12:48.365000 15859 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7280638Z [rank0]:E1204 09:12:48.365000 15859 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7282048Z [rank0]:E1204 09:12:48.365000 15859 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7283428Z [rank0]:E1204 09:12:48.365000 15859 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.7284987Z [rank0]:E1204 09:12:48.365000 15859 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7286498Z [rank0]:E1204 09:12:48.365000 15859 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.7288541Z [rank0]:E1204 09:12:48.365000 15859 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 
2025-12-04T09:13:23.7290500Z [rank0]:E1204 09:12:48.365000 15859 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7291608Z [rank0]:E1204 09:12:48.365000 15859 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7293328Z [rank0]:E1204 09:12:48.365000 15859 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T09:13:23.7294754Z [rank0]:E1204 09:12:48.365000 15859 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7295913Z [rank0]:E1204 09:12:48.365000 15859 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7297495Z [rank0]:E1204 09:12:48.365000 15859 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:13:23.7298660Z [rank1]:E1204 09:12:48.365000 15860 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.7299807Z [rank1]:E1204 09:12:48.365000 15860 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.7301495Z [rank1]:E1204 09:12:48.365000 15860 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7303138Z [rank1]:E1204 09:12:48.365000 15860 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.7304793Z [rank1]:E1204 09:12:48.365000 15860 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7306337Z [rank1]:E1204 09:12:48.365000 15860 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:13:23.7307919Z [rank1]:E1204 09:12:48.365000 15860 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7309675Z [rank1]:E1204 09:12:48.365000 15860 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7311083Z [rank1]:E1204 09:12:48.365000 15860 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7312502Z [rank1]:E1204 09:12:48.365000 15860 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7313923Z [rank1]:E1204 09:12:48.365000 15860 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7315317Z [rank1]:E1204 09:12:48.365000 15860 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.7316706Z [rank1]:E1204 09:12:48.365000 15860 site-packages/torch/testing/_internal/common_distributed.py:935] 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7318121Z [rank1]:E1204 09:12:48.365000 15860 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.7320051Z [rank1]:E1204 09:12:48.365000 15860 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 1. CUDA driver allocated memory was 609157120 and is now 630128640. 2025-12-04T09:13:23.7322492Z [rank1]:E1204 09:12:48.365000 15860 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7323676Z [rank1]:E1204 09:12:48.365000 15860 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7325489Z [rank1]:E1204 09:12:48.365000 15860 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T09:13:23.7326996Z [rank1]:E1204 09:12:48.365000 15860 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7328226Z [rank1]:E1204 09:12:48.365000 15860 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7329644Z [rank1]:E1204 09:12:48.365000 15860 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:13:23.7330791Z [rank2]:E1204 09:12:48.366000 15861 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.7331913Z [rank2]:E1204 09:12:48.366000 15861 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.7333691Z [rank2]:E1204 09:12:48.366000 15861 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7335294Z [rank2]:E1204 09:12:48.366000 15861 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.7336980Z [rank2]:E1204 09:12:48.366000 15861 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7338527Z [rank2]:E1204 09:12:48.366000 15861 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:13:23.7340155Z [rank2]:E1204 09:12:48.366000 15861 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7341766Z [rank2]:E1204 09:12:48.366000 15861 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7343371Z [rank2]:E1204 09:12:48.366000 15861 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T09:13:23.7344971Z [rank2]:E1204 09:12:48.366000 15861 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7346581Z [rank2]:E1204 09:12:48.366000 15861 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7348133Z [rank2]:E1204 09:12:48.366000 15861 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.7349835Z [rank2]:E1204 09:12:48.366000 15861 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7351264Z [rank2]:E1204 09:12:48.366000 15861 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.7353184Z [rank2]:E1204 09:12:48.366000 15861 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 2. CUDA driver allocated memory was 604962816 and is now 630128640. 2025-12-04T09:13:23.7355059Z [rank2]:E1204 09:12:48.366000 15861 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7356091Z [rank2]:E1204 09:12:48.366000 15861 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7357702Z [rank2]:E1204 09:12:48.366000 15861 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T09:13:23.7359044Z [rank2]:E1204 09:12:48.366000 15861 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7360150Z [rank2]:E1204 09:12:48.366000 15861 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7361398Z [rank2]:E1204 09:12:48.366000 15861 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:13:23.7362430Z [rank3]:E1204 09:12:48.366000 15862 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.7363437Z [rank3]:E1204 09:12:48.366000 15862 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.7364931Z [rank3]:E1204 09:12:48.366000 15862 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7366405Z [rank3]:E1204 09:12:48.366000 15862 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.7367927Z [rank3]:E1204 09:12:48.366000 15862 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7369294Z [rank3]:E1204 09:12:48.366000 15862 site-packages/torch/testing/_internal/common_distributed.py:935] 
fn() 2025-12-04T09:13:23.7370636Z [rank3]:E1204 09:12:48.366000 15862 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7372050Z [rank3]:E1204 09:12:48.366000 15862 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7373470Z [rank3]:E1204 09:12:48.366000 15862 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7374885Z [rank3]:E1204 09:12:48.366000 15862 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7376320Z [rank3]:E1204 09:12:48.366000 15862 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7378080Z [rank3]:E1204 09:12:48.366000 15862 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.7379650Z [rank3]:E1204 09:12:48.366000 15862 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7381247Z [rank3]:E1204 09:12:48.366000 15862 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.7383485Z [rank3]:E1204 09:12:48.366000 15862 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 3. CUDA driver allocated memory was 611254272 and is now 630128640. 2025-12-04T09:13:23.7385517Z [rank3]:E1204 09:12:48.366000 15862 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7386692Z [rank3]:E1204 09:12:48.366000 15862 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7388476Z [rank3]:E1204 09:12:48.366000 15862 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T09:13:23.7389984Z [rank3]:E1204 09:12:48.366000 15862 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7391067Z [rank3]:E1204 09:12:48.366000 15862 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7392301Z [rank3]:E1204 09:12:48.366000 15862 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:13:23.7392985Z dist init r=2, world=4 2025-12-04T09:13:23.7393229Z dist init r=0, world=4 2025-12-04T09:13:23.7393468Z dist init r=1, world=4 2025-12-04T09:13:23.7393708Z dist init r=3, world=4 2025-12-04T09:13:23.7394882Z [rank0]:[W1204 09:12:48.383065767 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
2025-12-04T09:13:23.7396115Z FAILED [9.2059s] [100%]
2025-12-04T09:13:23.7396266Z 
2025-12-04T09:13:23.7396404Z =================================== FAILURES ===================================
2025-12-04T09:13:23.7396884Z ________________ TestPureFP16CUDA.test_pure_fp16_training_cuda _________________
2025-12-04T09:13:23.7397396Z Traceback (most recent call last):
2025-12-04T09:13:23.7398090Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper
2025-12-04T09:13:23.7398792Z self._join_processes(fn)
2025-12-04T09:13:23.7399484Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes
2025-12-04T09:13:23.7400252Z self._check_return_codes(fn, elapsed_time)
2025-12-04T09:13:23.7401029Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes
2025-12-04T09:13:23.7401801Z raise RuntimeError(error)
2025-12-04T09:13:23.7402182Z RuntimeError: Process 1 exited with error code 10 and exception:
2025-12-04T09:13:23.7402613Z Traceback (most recent call last):
2025-12-04T09:13:23.7403306Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:13:23.7404000Z getattr(self, test_name)()
2025-12-04T09:13:23.7404660Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:13:23.7405335Z fn()
2025-12-04T09:13:23.7405902Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:13:23.7406558Z method(*args, **kwargs)
2025-12-04T09:13:23.7407177Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:13:23.7407917Z method(*args, **kwargs)
2025-12-04T09:13:23.7408529Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:13:23.7409184Z with policy():
2025-12-04T09:13:23.7409783Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:13:23.7410456Z raise RuntimeError(msg)
2025-12-04T09:13:23.7411572Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 1. CUDA driver allocated memory was 609157120 and is now 630128640.
2025-12-04T09:13:23.7412641Z 
2025-12-04T09:13:23.7412828Z To execute this test, run the following from the base repo dir:
2025-12-04T09:13:23.7413626Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda
2025-12-04T09:13:23.7414235Z 
2025-12-04T09:13:23.7414476Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:13:23.7414829Z 
2025-12-04T09:13:23.7414833Z 
2025-12-04T09:13:23.7415026Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:13:23.7415569Z Process 1 terminated with exit code 10, terminating remaining processes.
2025-12-04T09:13:23.7416736Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-433868368b6a29b3.xml -
2025-12-04T09:13:23.7418034Z =========================== short test summary info ============================
2025-12-04T09:13:23.7419065Z FAILED [9.2059s] distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_pure_fp16_training_cuda - RuntimeError: Process 1 exited with error code 10 and exception:
2025-12-04T09:13:23.7420038Z Traceback (most recent call last):
2025-12-04T09:13:23.7420987Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:13:23.7421779Z getattr(self, test_name)()
2025-12-04T09:13:23.7422626Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:13:23.7423393Z fn()
2025-12-04T09:13:23.7424038Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:13:23.7424796Z method(*args, **kwargs)
2025-12-04T09:13:23.7425496Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:13:23.7426250Z method(*args, **kwargs)
2025-12-04T09:13:23.7426955Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:13:23.7427690Z with policy():
2025-12-04T09:13:23.7428367Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:13:23.7429117Z raise RuntimeError(msg)
2025-12-04T09:13:23.7430399Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 1. CUDA driver allocated memory was 609157120 and is now 630128640.
2025-12-04T09:13:23.7431609Z 
2025-12-04T09:13:23.7431820Z To execute this test, run the following from the base repo dir:
2025-12-04T09:13:23.7432821Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda
2025-12-04T09:13:23.7433549Z 
2025-12-04T09:13:23.7433781Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:13:23.7434298Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T09:13:23.7435835Z ======================= 1 failed, 1 deselected in 9.41s ========================
2025-12-04T09:13:23.7436202Z Got exit code 1
2025-12-04T09:13:23.7436431Z Retrying single test...
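Both failing tests above show the same pattern: the mem_leak_check run flags growth in the CUDA caching allocator, and every run also prints the ProcessGroupNCCL warning that destroy_process_group() was not called before program exit, plus the barrier() UserWarning about device_id. For reference, the sketch below shows the explicit setup/teardown shape those two warnings ask for. It is a minimal illustration, not code taken from test_fsdp_pure_fp16.py; the helper name _toy_step and the 127.0.0.1:29500 rendezvous defaults are assumptions made only for this example.

import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp


def _toy_step(rank: int, world_size: int) -> None:
    # One NCCL rank per GPU, the same shape as the FSDP tests in this log.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    torch.cuda.set_device(rank)
    # Passing device_id is what the barrier() UserWarning above suggests.
    dist.init_process_group(
        "nccl",
        rank=rank,
        world_size=world_size,
        device_id=torch.device(f"cuda:{rank}"),
    )
    try:
        t = torch.ones(8, device=f"cuda:{rank}", dtype=torch.float16)
        dist.all_reduce(t)  # stand-in for the real test body
        dist.barrier()
    finally:
        # Explicit teardown, so the "destroy_process_group() was not called
        # before program exit" warning does not fire.
        dist.destroy_process_group()


if __name__ == "__main__":
    world = torch.cuda.device_count()
    mp.spawn(_toy_step, args=(world,), nprocs=world, join=True)

Run on a multi-GPU host this exits cleanly; it does not reproduce the leak itself, for which the repro command printed above (PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py ...) is the intended entry point.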
2025-12-04T09:13:23.7437179Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-cb48c540b8fb2acf.xml 2025-12-04T09:13:23.7438028Z ============================= test session starts ============================== 2025-12-04T09:13:23.7438602Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:13:23.7439125Z cachedir: .pytest_cache 2025-12-04T09:13:23.7439732Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:13:23.7440415Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:13:23.7440724Z configfile: pytest.ini 2025-12-04T09:13:23.7441348Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:13:23.7442125Z collecting ... collected 2 items / 1 deselected / 1 selected 2025-12-04T09:13:23.7443070Z stepcurrent: skipping 1 already run items. Running only test/distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_pure_fp16_training_cuda 2025-12-04T09:13:23.7443852Z Running 1 items in this shard 2025-12-04T09:13:23.7444039Z 2025-12-04T09:13:23.7445325Z distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_pure_fp16_training_cuda I1204 09:12:55.504000 16144 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 16196 2025-12-04T09:13:23.7446799Z I1204 09:12:55.505000 16144 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 16197 2025-12-04T09:13:23.7447853Z I1204 09:12:55.506000 16144 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 16198 2025-12-04T09:13:23.7448931Z I1204 09:12:55.506000 16144 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 16199 2025-12-04T09:13:23.7450540Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T09:13:23.7451701Z return func(*args, **kwargs) 2025-12-04T09:13:23.7452325Z [rank0]:E1204 09:13:02.374000 16196 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.7453381Z [rank0]:E1204 09:13:02.374000 16196 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.7454959Z [rank0]:E1204 09:13:02.374000 16196 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7456785Z [rank0]:E1204 09:13:02.374000 16196 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.7458589Z [rank0]:E1204 09:13:02.374000 16196 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7460117Z [rank0]:E1204 09:13:02.374000 16196 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:13:23.7461618Z [rank0]:E1204 09:13:02.374000 16196 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7463208Z [rank0]:E1204 09:13:02.374000 16196 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7464784Z [rank0]:E1204 09:13:02.374000 16196 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7466433Z [rank0]:E1204 09:13:02.374000 16196 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7468018Z [rank0]:E1204 09:13:02.374000 16196 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7469666Z [rank0]:E1204 09:13:02.374000 16196 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.7471245Z [rank0]:E1204 09:13:02.374000 16196 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7472735Z [rank0]:E1204 09:13:02.374000 16196 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.7474771Z [rank0]:E1204 09:13:02.374000 16196 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 718209024 and is now 739180544. 
2025-12-04T09:13:23.7476664Z [rank0]:E1204 09:13:02.374000 16196 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7477759Z [rank0]:E1204 09:13:02.374000 16196 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7479454Z [rank0]:E1204 09:13:02.374000 16196 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T09:13:23.7480861Z [rank0]:E1204 09:13:02.374000 16196 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7482078Z [rank0]:E1204 09:13:02.374000 16196 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7483390Z [rank0]:E1204 09:13:02.374000 16196 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:13:23.7484450Z [rank1]:E1204 09:13:02.374000 16197 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.7485498Z [rank1]:E1204 09:13:02.374000 16197 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.7487071Z [rank1]:E1204 09:13:02.374000 16197 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7488844Z [rank1]:E1204 09:13:02.374000 16197 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.7490320Z [rank1]:E1204 09:13:02.374000 16197 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7491691Z [rank1]:E1204 09:13:02.374000 16197 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:13:23.7493025Z [rank1]:E1204 09:13:02.374000 16197 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7494451Z [rank1]:E1204 09:13:02.374000 16197 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7495950Z [rank1]:E1204 09:13:02.374000 16197 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7497690Z [rank1]:E1204 09:13:02.374000 16197 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7499296Z [rank1]:E1204 09:13:02.374000 16197 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7500847Z [rank1]:E1204 09:13:02.374000 16197 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.7502411Z [rank1]:E1204 09:13:02.374000 16197 site-packages/torch/testing/_internal/common_distributed.py:935] 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7504025Z [rank1]:E1204 09:13:02.374000 16197 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.7506199Z [rank1]:E1204 09:13:02.374000 16197 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 1. CUDA driver allocated memory was 611254272 and is now 630128640. 2025-12-04T09:13:23.7508214Z [rank1]:E1204 09:13:02.374000 16197 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7509552Z [rank1]:E1204 09:13:02.374000 16197 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7511170Z [rank1]:E1204 09:13:02.374000 16197 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T09:13:23.7512562Z [rank1]:E1204 09:13:02.374000 16197 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7513657Z [rank1]:E1204 09:13:02.374000 16197 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7514897Z [rank1]:E1204 09:13:02.374000 16197 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:13:23.7515916Z [rank2]:E1204 09:13:02.374000 16198 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.7516921Z [rank2]:E1204 09:13:02.374000 16198 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.7518422Z [rank2]:E1204 09:13:02.374000 16198 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7519884Z [rank2]:E1204 09:13:02.374000 16198 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.7521685Z [rank2]:E1204 09:13:02.374000 16198 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7523221Z [rank2]:E1204 09:13:02.374000 16198 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:13:23.7524738Z [rank2]:E1204 09:13:02.374000 16198 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7526457Z [rank2]:E1204 09:13:02.374000 16198 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7528058Z [rank2]:E1204 09:13:02.374000 16198 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T09:13:23.7529646Z [rank2]:E1204 09:13:02.374000 16198 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7531239Z [rank2]:E1204 09:13:02.374000 16198 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7532796Z [rank2]:E1204 09:13:02.374000 16198 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.7534380Z [rank2]:E1204 09:13:02.374000 16198 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7535804Z [rank2]:E1204 09:13:02.374000 16198 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.7538084Z [rank2]:E1204 09:13:02.374000 16198 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 2. CUDA driver allocated memory was 604962816 and is now 630128640. 2025-12-04T09:13:23.7540119Z [rank2]:E1204 09:13:02.374000 16198 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7541306Z [rank2]:E1204 09:13:02.374000 16198 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7543269Z [rank2]:E1204 09:13:02.374000 16198 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T09:13:23.7544798Z [rank2]:E1204 09:13:02.374000 16198 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7546026Z [rank2]:E1204 09:13:02.374000 16198 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7547443Z [rank2]:E1204 09:13:02.374000 16198 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:13:23.7548599Z [rank3]:E1204 09:13:02.374000 16199 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.7549788Z [rank3]:E1204 09:13:02.374000 16199 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.7551371Z [rank3]:E1204 09:13:02.374000 16199 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7552931Z [rank3]:E1204 09:13:02.374000 16199 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.7554676Z [rank3]:E1204 09:13:02.374000 16199 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7556161Z [rank3]:E1204 09:13:02.374000 16199 site-packages/torch/testing/_internal/common_distributed.py:935] 
fn() 2025-12-04T09:13:23.7557683Z [rank3]:E1204 09:13:02.374000 16199 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7559222Z [rank3]:E1204 09:13:02.374000 16199 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7560844Z [rank3]:E1204 09:13:02.374000 16199 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7562349Z [rank3]:E1204 09:13:02.374000 16199 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7563850Z [rank3]:E1204 09:13:02.374000 16199 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7565402Z [rank3]:E1204 09:13:02.374000 16199 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.7566778Z [rank3]:E1204 09:13:02.374000 16199 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7568208Z [rank3]:E1204 09:13:02.374000 16199 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.7570127Z [rank3]:E1204 09:13:02.374000 16199 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 3. CUDA driver allocated memory was 607059968 and is now 630128640. 2025-12-04T09:13:23.7571923Z [rank3]:E1204 09:13:02.374000 16199 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7573018Z [rank3]:E1204 09:13:02.374000 16199 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7574613Z [rank3]:E1204 09:13:02.374000 16199 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T09:13:23.7576147Z [rank3]:E1204 09:13:02.374000 16199 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7577542Z [rank3]:E1204 09:13:02.374000 16199 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7578955Z [rank3]:E1204 09:13:02.374000 16199 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:13:23.7579741Z dist init r=3, world=4 2025-12-04T09:13:23.7580031Z dist init r=1, world=4 2025-12-04T09:13:23.7580311Z dist init r=0, world=4 2025-12-04T09:13:23.7580578Z dist init r=2, world=4 2025-12-04T09:13:23.7581918Z [rank0]:[W1204 09:13:02.390444974 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:13:23.7583312Z FAILED [9.4457s] [100%] 2025-12-04T09:13:23.7583493Z 2025-12-04T09:13:23.7583656Z =================================== FAILURES =================================== 2025-12-04T09:13:23.7584208Z ________________ TestPureFP16CUDA.test_pure_fp16_training_cuda _________________ 2025-12-04T09:13:23.7584743Z Traceback (most recent call last): 2025-12-04T09:13:23.7585537Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:13:23.7586441Z self._join_processes(fn) 2025-12-04T09:13:23.7587233Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:13:23.7588113Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:13:23.7589111Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:13:23.7590128Z raise RuntimeError(error) 2025-12-04T09:13:23.7590532Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:13:23.7590976Z Traceback (most recent call last): 2025-12-04T09:13:23.7591678Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7592377Z getattr(self, test_name)() 2025-12-04T09:13:23.7593055Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7593743Z fn() 2025-12-04T09:13:23.7594307Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7594977Z method(*args, **kwargs) 2025-12-04T09:13:23.7595600Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7596270Z method(*args, **kwargs) 2025-12-04T09:13:23.7596888Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7597544Z with policy(): 2025-12-04T09:13:23.7598146Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7598825Z raise RuntimeError(msg) 2025-12-04T09:13:23.7599949Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 1. CUDA driver allocated memory was 611254272 and is now 630128640. 
2025-12-04T09:13:23.7601016Z 2025-12-04T09:13:23.7601259Z To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7602063Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T09:13:23.7602673Z 2025-12-04T09:13:23.7602915Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7603270Z 2025-12-04T09:13:23.7603412Z Process 3 exited with error code 10 and exception: 2025-12-04T09:13:23.7603774Z Traceback (most recent call last): 2025-12-04T09:13:23.7604470Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7605178Z getattr(self, test_name)() 2025-12-04T09:13:23.7605830Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7606504Z fn() 2025-12-04T09:13:23.7607074Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7607729Z method(*args, **kwargs) 2025-12-04T09:13:23.7608353Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7609021Z method(*args, **kwargs) 2025-12-04T09:13:23.7609640Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7610288Z with policy(): 2025-12-04T09:13:23.7610888Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7611614Z raise RuntimeError(msg) 2025-12-04T09:13:23.7612737Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 3. CUDA driver allocated memory was 607059968 and is now 630128640. 2025-12-04T09:13:23.7613802Z 2025-12-04T09:13:23.7613988Z To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7614785Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T09:13:23.7615395Z 2025-12-04T09:13:23.7615642Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7615991Z 2025-12-04T09:13:23.7615995Z 2025-12-04T09:13:23.7616195Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:13:23.7616816Z Process 1 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:13:23.7618235Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-cb48c540b8fb2acf.xml - 2025-12-04T09:13:23.7619396Z =========================== short test summary info ============================ 2025-12-04T09:13:23.7620484Z FAILED [9.4457s] distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_pure_fp16_training_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:13:23.7621634Z Traceback (most recent call last): 2025-12-04T09:13:23.7622418Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7623214Z getattr(self, test_name)() 2025-12-04T09:13:23.7623950Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7624717Z fn() 2025-12-04T09:13:23.7625354Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7626101Z method(*args, **kwargs) 2025-12-04T09:13:23.7626907Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7627655Z method(*args, **kwargs) 2025-12-04T09:13:23.7628363Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7629100Z with policy(): 2025-12-04T09:13:23.7629778Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7630538Z raise RuntimeError(msg) 2025-12-04T09:13:23.7631809Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 1. CUDA driver allocated memory was 611254272 and is now 630128640. 
2025-12-04T09:13:23.7633111Z 2025-12-04T09:13:23.7633320Z To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7634163Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T09:13:23.7634819Z 2025-12-04T09:13:23.7635067Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7635441Z 2025-12-04T09:13:23.7635599Z Process 3 exited with error code 10 and exception: 2025-12-04T09:13:23.7635976Z Traceback (most recent call last): 2025-12-04T09:13:23.7636710Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7637533Z getattr(self, test_name)() 2025-12-04T09:13:23.7638196Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7638934Z fn() 2025-12-04T09:13:23.7639507Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7640180Z method(*args, **kwargs) 2025-12-04T09:13:23.7640811Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7641471Z method(*args, **kwargs) 2025-12-04T09:13:23.7642091Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7642754Z with policy(): 2025-12-04T09:13:23.7643344Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7644017Z raise RuntimeError(msg) 2025-12-04T09:13:23.7645145Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 3. CUDA driver allocated memory was 607059968 and is now 630128640. 2025-12-04T09:13:23.7646204Z 2025-12-04T09:13:23.7646411Z To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7647204Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T09:13:23.7647820Z 2025-12-04T09:13:23.7648053Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7648570Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:13:23.7649012Z ======================= 1 failed, 1 deselected in 9.66s ======================== 2025-12-04T09:13:23.7649370Z Got exit code 1 2025-12-04T09:13:23.7649601Z Retrying single test... 
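[editor's note] The "CUDA driver API confirmed a leak" failures above come from the harness enabled by PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1, which snapshots caching-allocator and driver-level memory before the test and compares again afterwards. The following is only a minimal sketch of that before/after comparison using public torch.cuda APIs; the helper name run_with_leak_check is illustrative and this is not the real implementation in torch/testing/_internal/common_utils.py.

    import torch

    def run_with_leak_check(test_fn, device=0):
        # Snapshot memory before the test: allocator bytes plus the driver-level view.
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_before = torch.cuda.memory_allocated(device)       # caching allocator bytes
        free_before, total = torch.cuda.mem_get_info(device)     # driver free/total bytes
        driver_before = total - free_before

        test_fn()

        # Snapshot again after the test has finished and queues are drained.
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_after = torch.cuda.memory_allocated(device)
        free_after, _ = torch.cuda.mem_get_info(device)
        driver_after = total - free_after

        # Only flag a leak when the driver-side numbers confirm the allocator's growth,
        # mirroring the "CUDA driver API confirmed a leak" wording in the log above.
        if alloc_after > alloc_before and driver_after > driver_before:
            raise RuntimeError(
                f"possible leak: allocator {alloc_before} -> {alloc_after}, "
                f"driver {driver_before} -> {driver_after}"
            )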
2025-12-04T09:13:23.7650356Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-f306b72badd85355.xml 2025-12-04T09:13:23.7651211Z ============================= test session starts ============================== 2025-12-04T09:13:23.7651835Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:13:23.7652358Z cachedir: .pytest_cache 2025-12-04T09:13:23.7652978Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:13:23.7653656Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:13:23.7653963Z configfile: pytest.ini 2025-12-04T09:13:23.7654599Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:13:23.7655370Z collecting ... collected 2 items / 1 deselected / 1 selected 2025-12-04T09:13:23.7656228Z stepcurrent: skipping 1 already run items. Running only test/distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_pure_fp16_training_cuda 2025-12-04T09:13:23.7657281Z Running 1 items in this shard 2025-12-04T09:13:23.7657487Z 2025-12-04T09:13:23.7658429Z distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_pure_fp16_training_cuda I1204 09:13:09.424000 16481 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 16533 2025-12-04T09:13:23.7659984Z I1204 09:13:09.425000 16481 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 16534 2025-12-04T09:13:23.7661107Z I1204 09:13:09.426000 16481 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 16535 2025-12-04T09:13:23.7662230Z I1204 09:13:09.427000 16481 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 16536 2025-12-04T09:13:23.7663874Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
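[editor's note] The UserWarning at the end of the block above ("barrier(): using the device under current context") suggests passing `device_id` to `init_process_group`. A minimal sketch of that remedy follows, assuming a PyTorch version whose `init_process_group` accepts `device_id` and that rank information comes from the usual environment variables; the function name and env-var handling are illustrative, not the test suite's own code.

    import os
    import torch
    import torch.distributed as dist

    def init_distributed():
        # Bind this rank to one GPU up front so collectives like barrier()
        # do not have to guess the device from the current context.
        rank = int(os.environ["RANK"])
        local_device = torch.device("cuda", rank % torch.cuda.device_count())
        torch.cuda.set_device(local_device)
        dist.init_process_group(
            backend="nccl",
            device_id=local_device,  # silences the barrier() device warning above
        )
        return local_device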
2025-12-04T09:13:23.7665181Z return func(*args, **kwargs) 2025-12-04T09:13:23.7665840Z [rank0]:E1204 09:13:16.385000 16533 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.7666967Z [rank0]:E1204 09:13:16.385000 16533 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.7668642Z [rank0]:E1204 09:13:16.385000 16533 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7670358Z [rank0]:E1204 09:13:16.385000 16533 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.7671809Z [rank0]:E1204 09:13:16.385000 16533 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7673167Z [rank0]:E1204 09:13:16.385000 16533 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:13:23.7674516Z [rank0]:E1204 09:13:16.385000 16533 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7675927Z [rank0]:E1204 09:13:16.385000 16533 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7677343Z [rank0]:E1204 09:13:16.385000 16533 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7678738Z [rank0]:E1204 09:13:16.385000 16533 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7680196Z [rank0]:E1204 09:13:16.385000 16533 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7681573Z [rank0]:E1204 09:13:16.385000 16533 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.7682952Z [rank0]:E1204 09:13:16.385000 16533 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7684373Z [rank0]:E1204 09:13:16.385000 16533 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.7686274Z [rank0]:E1204 09:13:16.385000 16533 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 0. CUDA driver allocated memory was 716111872 and is now 739180544. 
2025-12-04T09:13:23.7688079Z [rank0]:E1204 09:13:16.385000 16533 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7689115Z [rank0]:E1204 09:13:16.385000 16533 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7690721Z [rank0]:E1204 09:13:16.385000 16533 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T09:13:23.7692052Z [rank0]:E1204 09:13:16.385000 16533 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7693202Z [rank0]:E1204 09:13:16.385000 16533 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7694452Z [rank0]:E1204 09:13:16.385000 16533 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:13:23.7695461Z [rank2]:E1204 09:13:16.385000 16535 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.7696521Z [rank2]:E1204 09:13:16.385000 16535 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.7698319Z [rank2]:E1204 09:13:16.385000 16535 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7699976Z [rank2]:E1204 09:13:16.385000 16535 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.7701618Z [rank2]:E1204 09:13:16.385000 16535 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7703142Z [rank2]:E1204 09:13:16.385000 16535 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:13:23.7704649Z [rank2]:E1204 09:13:16.385000 16535 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7706228Z [rank2]:E1204 09:13:16.385000 16535 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7707820Z [rank2]:E1204 09:13:16.385000 16535 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7709581Z [rank2]:E1204 09:13:16.385000 16535 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7711034Z [rank2]:E1204 09:13:16.385000 16535 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7712417Z [rank2]:E1204 09:13:16.385000 16535 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.7713785Z [rank2]:E1204 09:13:16.385000 16535 site-packages/torch/testing/_internal/common_distributed.py:935] 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7715199Z [rank2]:E1204 09:13:16.385000 16535 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.7717118Z [rank2]:E1204 09:13:16.385000 16535 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 2. CUDA driver allocated memory was 611254272 and is now 630128640. 2025-12-04T09:13:23.7718902Z [rank2]:E1204 09:13:16.385000 16535 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7719935Z [rank2]:E1204 09:13:16.385000 16535 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7721886Z [rank2]:E1204 09:13:16.385000 16535 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T09:13:23.7723487Z [rank2]:E1204 09:13:16.385000 16535 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7724713Z [rank2]:E1204 09:13:16.385000 16535 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7726115Z [rank2]:E1204 09:13:16.385000 16535 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:13:23.7727251Z [rank1]:E1204 09:13:16.385000 16534 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.7728371Z [rank1]:E1204 09:13:16.385000 16534 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.7730047Z [rank1]:E1204 09:13:16.385000 16534 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7731697Z [rank1]:E1204 09:13:16.385000 16534 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.7733418Z [rank1]:E1204 09:13:16.385000 16534 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7734765Z [rank1]:E1204 09:13:16.385000 16534 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:13:23.7736104Z [rank1]:E1204 09:13:16.385000 16534 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7737842Z [rank1]:E1204 09:13:16.385000 16534 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7739434Z [rank1]:E1204 09:13:16.385000 16534 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T09:13:23.7741103Z [rank1]:E1204 09:13:16.385000 16534 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7742683Z [rank1]:E1204 09:13:16.385000 16534 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7744478Z [rank1]:E1204 09:13:16.385000 16534 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.7746031Z [rank1]:E1204 09:13:16.385000 16534 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7747638Z [rank1]:E1204 09:13:16.385000 16534 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.7749834Z [rank1]:E1204 09:13:16.385000 16534 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 1. CUDA driver allocated memory was 609157120 and is now 630128640. 2025-12-04T09:13:23.7751616Z [rank1]:E1204 09:13:16.385000 16534 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7752854Z [rank1]:E1204 09:13:16.385000 16534 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7754746Z [rank1]:E1204 09:13:16.385000 16534 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T09:13:23.7756268Z [rank1]:E1204 09:13:16.385000 16534 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7757456Z [rank1]:E1204 09:13:16.385000 16534 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7758799Z [rank1]:E1204 09:13:16.385000 16534 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:13:23.7759895Z [rank3]:E1204 09:13:16.385000 16536 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:13:23.7760976Z [rank3]:E1204 09:13:16.385000 16536 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:13:23.7762616Z [rank3]:E1204 09:13:16.385000 16536 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7764199Z [rank3]:E1204 09:13:16.385000 16536 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:13:23.7765782Z [rank3]:E1204 09:13:16.385000 16536 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7767263Z [rank3]:E1204 09:13:16.385000 16536 site-packages/torch/testing/_internal/common_distributed.py:935] 
fn() 2025-12-04T09:13:23.7768800Z [rank3]:E1204 09:13:16.385000 16536 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7770331Z [rank3]:E1204 09:13:16.385000 16536 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7772000Z [rank3]:E1204 09:13:16.385000 16536 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7773499Z [rank3]:E1204 09:13:16.385000 16536 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:13:23.7774986Z [rank3]:E1204 09:13:16.385000 16536 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7776508Z [rank3]:E1204 09:13:16.385000 16536 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:13:23.7778222Z [rank3]:E1204 09:13:16.385000 16536 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7779824Z [rank3]:E1204 09:13:16.385000 16536 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:13:23.7781985Z [rank3]:E1204 09:13:16.385000 16536 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 3. CUDA driver allocated memory was 604962816 and is now 630128640. 2025-12-04T09:13:23.7784297Z [rank3]:E1204 09:13:16.385000 16536 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7785464Z [rank3]:E1204 09:13:16.385000 16536 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7787356Z [rank3]:E1204 09:13:16.385000 16536 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T09:13:23.7788948Z [rank3]:E1204 09:13:16.385000 16536 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:13:23.7790092Z [rank3]:E1204 09:13:16.385000 16536 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7791403Z [rank3]:E1204 09:13:16.385000 16536 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:13:23.7792146Z dist init r=3, world=4 2025-12-04T09:13:23.7792397Z dist init r=2, world=4 2025-12-04T09:13:23.7792652Z dist init r=0, world=4 2025-12-04T09:13:23.7792908Z dist init r=1, world=4 2025-12-04T09:13:23.7794146Z [rank0]:[W1204 09:13:16.403617794 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:13:23.7795461Z FAILED [9.2136s] [100%] 2025-12-04T09:13:23.7795633Z 2025-12-04T09:13:23.7795769Z =================================== FAILURES =================================== 2025-12-04T09:13:23.7796383Z ________________ TestPureFP16CUDA.test_pure_fp16_training_cuda _________________ 2025-12-04T09:13:23.7796836Z Traceback (most recent call last): 2025-12-04T09:13:23.7797544Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:13:23.7798260Z self._join_processes(fn) 2025-12-04T09:13:23.7798981Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:13:23.7799753Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:13:23.7800534Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:13:23.7801388Z raise RuntimeError(error) 2025-12-04T09:13:23.7801773Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:13:23.7802199Z Traceback (most recent call last): 2025-12-04T09:13:23.7802883Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7803584Z getattr(self, test_name)() 2025-12-04T09:13:23.7804241Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7804920Z fn() 2025-12-04T09:13:23.7805484Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7806146Z method(*args, **kwargs) 2025-12-04T09:13:23.7806771Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7807437Z method(*args, **kwargs) 2025-12-04T09:13:23.7808062Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7808714Z with policy(): 2025-12-04T09:13:23.7809317Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7809984Z raise RuntimeError(msg) 2025-12-04T09:13:23.7811104Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 2. CUDA driver allocated memory was 611254272 and is now 630128640. 2025-12-04T09:13:23.7812217Z 2025-12-04T09:13:23.7812403Z To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7813202Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T09:13:23.7813830Z 2025-12-04T09:13:23.7814064Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7814420Z 2025-12-04T09:13:23.7814424Z 2025-12-04T09:13:23.7814631Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:13:23.7815171Z Process 2 terminated with exit code 10, terminating remaining processes. 
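[editor's note] The ProcessGroupNCCL warning above reports that destroy_process_group() was never called before process exit. A minimal teardown sketch is shown below; the function name is illustrative and the real tests manage their process groups through the MultiProcessTestCase harness rather than code like this.

    import torch.distributed as dist

    def teardown_distributed():
        if dist.is_initialized():
            dist.barrier()                # let all ranks finish outstanding collectives
            dist.destroy_process_group()  # release NCCL communicators before exit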
2025-12-04T09:13:23.7816280Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-f306b72badd85355.xml - 2025-12-04T09:13:23.7817620Z =========================== short test summary info ============================ 2025-12-04T09:13:23.7818664Z FAILED [9.2136s] distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_pure_fp16_training_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:13:23.7819632Z Traceback (most recent call last): 2025-12-04T09:13:23.7820416Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:13:23.7821402Z getattr(self, test_name)() 2025-12-04T09:13:23.7822149Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:13:23.7822901Z fn() 2025-12-04T09:13:23.7823532Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7824271Z method(*args, **kwargs) 2025-12-04T09:13:23.7824966Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:13:23.7825721Z method(*args, **kwargs) 2025-12-04T09:13:23.7826428Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:13:23.7827172Z with policy(): 2025-12-04T09:13:23.7827965Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:13:23.7828727Z raise RuntimeError(msg) 2025-12-04T09:13:23.7830000Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestPureFP16CUDA.test_pure_fp16_training_cuda! Caching allocator allocated memory was 512 and is now reported as 4608 on device 2. CUDA driver allocated memory was 611254272 and is now 630128640. 2025-12-04T09:13:23.7831198Z 2025-12-04T09:13:23.7831423Z To execute this test, run the following from the base repo dir: 2025-12-04T09:13:23.7832321Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_pure_fp16.py TestPureFP16CUDA.test_pure_fp16_training_cuda 2025-12-04T09:13:23.7833116Z 2025-12-04T09:13:23.7833365Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:13:23.7833918Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T09:13:23.7834389Z ======================= 1 failed, 1 deselected in 9.43s ======================== 2025-12-04T09:13:23.7834772Z Got exit code 1 2025-12-04T09:13:23.7835371Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_pure_fp16_training_cuda 2025-12-04T09:13:23.7836334Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:13:23.7837466Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-456a3faf0e1ca4c4.xml 2025-12-04T09:13:23.7838561Z ============================= test session starts ============================== 2025-12-04T09:13:23.7838986Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:13:23.7839091Z cachedir: .pytest_cache 2025-12-04T09:13:23.7839598Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:13:23.7839731Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:13:23.7839834Z configfile: pytest.ini 2025-12-04T09:13:23.7840358Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:13:23.7840571Z collecting ... collected 2 items / 2 deselected / 0 selected 2025-12-04T09:13:23.7840709Z stepcurrent: skipping 2 already run items. 2025-12-04T09:13:23.7840831Z Running 0 items in this shard 2025-12-04T09:13:23.7840837Z 2025-12-04T09:13:23.7841756Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-456a3faf0e1ca4c4.xml - 2025-12-04T09:13:23.7841920Z ============================ 2 deselected in 0.01s ============================= 2025-12-04T09:13:23.7842887Z The following tests failed consistently: ['test/distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_fp16_dtypes_cuda', 'test/distributed/fsdp/test_fsdp_pure_fp16.py::TestPureFP16CUDA::test_pure_fp16_training_cuda'] 2025-12-04T09:13:23.7842892Z 2025-12-04T09:13:23.7843503Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_pure_fp16 1/1 (test/test-reports/distributed.fsdp.test_fsdp_pure_fp16_1.1_2de43ef0fea2c555_.log) 2025-12-04T09:13:23.7843508Z 2025-12-04T09:13:23.7843891Z Finished distributed/fsdp/test_fsdp_pure_fp16 1/1 ... 
[2025-12-04 09:13:23.663258][1235.271173761], took 1.44min 2025-12-04T09:13:23.7844740Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-e1278d34de852f2a.xml 2025-12-04T09:13:23.7845615Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-efcb608498b7750d.xml 2025-12-04T09:13:23.7846509Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-9a300aee582fd0b6.xml 2025-12-04T09:13:23.8095848Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-433868368b6a29b3.xml 2025-12-04T09:13:23.8427230Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-cb48c540b8fb2acf.xml 2025-12-04T09:13:23.8751958Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-f306b72badd85355.xml 2025-12-04T09:13:23.9072346Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-456a3faf0e1ca4c4.xml 2025-12-04T09:13:24.1178986Z Uploading logs for 57116084904 to S3 2025-12-04T09:13:24.1539342Z Uploading artifacts took 0.22 seconds 2025-12-04T09:13:24.1539803Z distributed/fsdp/test_fsdp_pure_fp16 1/1 failed! 2025-12-04T09:13:24.1543577Z Running distributed/tensor/debug/test_debug_mode 1/1 ... [2025-12-04 09:13:24.154185][1235.762101353] 2025-12-04T09:13:24.1544206Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T09:13:24.1546832Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/debug/test_debug_mode.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 09:13:24.154508] 2025-12-04T09:14:13.7418881Z 2025-12-04T09:14:13.7420103Z distributed/tensor/debug/test_debug_mode 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.debug.test_debug_mode_1.1_8a4ec9b51bad1d98_.log 2025-12-04T09:14:13.7435333Z Running 25 items in this shard: test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_check_hash_mismatches, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_check_structure_mismatches, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_check_triton_hash_mismatches, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_compile, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_debug_mode_backward, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_debug_mode_densor_redistribution_trace, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_debug_mode_einsum, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_debug_mode_higher_order_cond, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_debug_mode_mm, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_debug_string_inside_context, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_fake_tensor, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_nested_debug_mode_has_inner_mode_False_has_outer_mode_False, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_nested_debug_mode_has_inner_mode_False_has_outer_mode_True, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_nested_debug_mode_has_inner_mode_True_has_outer_mode_False, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_nested_debug_mode_has_inner_mode_True_has_outer_mode_True, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_nn_module, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_pretty_print_dtensor_make_fx, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_real_tensor, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_tensor_attributes, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_tensor_hash_redistribute, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugMode::test_triton_kernel_logs, test/distributed/tensor/debug/test_debug_mode.py::TestDebugModeUtils::test_hash_empty_tenor, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugModeNCCLBackend::test_allgather_base, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugModeNCCLBackend::test_allgather_base_async_op, test/distributed/tensor/debug/test_debug_mode.py::TestDTensorDebugModeNCCLBackend::test_allgather_functional_with_async_collective_tensor 2025-12-04T09:14:13.7449218Z 2025-12-04T09:14:13.7449651Z Finished distributed/tensor/debug/test_debug_mode 1/1 ... [2025-12-04 09:14:13.741471][1285.349382303], took 0.83min 2025-12-04T09:14:13.7451101Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.tensor.debug.test_debug_mode/distributed.tensor.debug.test_debug_mode-21dd2989918f2f32.xml 2025-12-04T09:14:13.8339790Z Running distributed/fsdp/test_fsdp_exec_order 1/1 ... 
[2025-12-04 09:14:13.833741][1285.441658776] 2025-12-04T09:14:13.8340444Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T09:14:13.8342867Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_fsdp_exec_order.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 09:14:13.834099] 2025-12-04T09:19:34.5390824Z 2025-12-04T09:19:34.5391845Z PRINTING LOG FILE of distributed/fsdp/test_fsdp_exec_order 1/1 (test/test-reports/distributed.fsdp.test_fsdp_exec_order_1.1_a2a67ccbd845e856_.log) 2025-12-04T09:19:34.5394296Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-93c7f0a0a61745d5.xml 2025-12-04T09:19:34.5395499Z ============================= test session starts ============================== 2025-12-04T09:19:34.5396181Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:19:34.5396765Z cachedir: .pytest_cache 2025-12-04T09:19:34.5397459Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:19:34.5398234Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:19:34.5398761Z configfile: pytest.ini 2025-12-04T09:19:34.5399493Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:19:34.5400298Z collecting ... collected 8 items 2025-12-04T09:19:34.5400727Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T09:19:34.5406720Z Running 8 items in this shard: test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy0_cuda, test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy1_cuda, test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda, test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda, test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda, test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda, test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy0_cuda, test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy1_cuda 2025-12-04T09:19:34.5412471Z 2025-12-04T09:19:34.5415236Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy0_cuda I1204 09:14:17.224000 19437 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 19489 2025-12-04T09:19:34.5417173Z I1204 09:14:17.224000 19437 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 19490 2025-12-04T09:19:34.5418326Z I1204 09:14:17.225000 19437 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 19491 2025-12-04T09:19:34.5419468Z I1204 09:14:17.226000 19437 
site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 19492 2025-12-04T09:19:34.5422071Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.5424126Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.5426159Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.5428192Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.5448464Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.5450858Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.5452899Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:19:34.5454939Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.5455719Z [rank0]:E1204 09:14:23.872000 19489 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.5456992Z [rank0]:E1204 09:14:23.872000 19489 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.5458686Z [rank0]:E1204 09:14:23.872000 19489 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.5460330Z [rank0]:E1204 09:14:23.872000 19489 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.5461980Z [rank0]:E1204 09:14:23.872000 19489 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.5463524Z [rank0]:E1204 09:14:23.872000 19489 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.5465044Z [rank0]:E1204 09:14:23.872000 19489 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5466773Z [rank0]:E1204 09:14:23.872000 19489 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.5468479Z [rank0]:E1204 09:14:23.872000 19489 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5470035Z [rank0]:E1204 09:14:23.872000 19489 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.5471588Z [rank0]:E1204 09:14:23.872000 19489 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.5473106Z [rank0]:E1204 09:14:23.872000 19489 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.5474632Z [rank0]:E1204 09:14:23.872000 19489 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.5476185Z [rank0]:E1204 09:14:23.872000 19489 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.5478444Z [rank0]:E1204 09:14:23.872000 19489 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 714014720 and is now 728694784. 
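[editor's note] The UserWarning earlier in this block comes from passing `device_id="cuda"` without an index to FSDP. A sketch of the two remedies the warning itself suggests is below; `model` and `rank` are assumed to exist in the calling test, and this is illustrative code rather than the suite's own wrapper.

    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_model(model, rank):
        # Option 1: set the current device explicitly before constructing FSDP.
        torch.cuda.set_device(rank)
        # Option 2: pass a device_id that carries an explicit index.
        return FSDP(model, device_id=torch.device("cuda", rank))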
2025-12-04T09:19:34.5480621Z [rank0]:E1204 09:14:23.872000 19489 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.5481769Z [rank0]:E1204 09:14:23.872000 19489 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.5483705Z [rank0]:E1204 09:14:23.872000 19489 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T09:19:34.5485321Z [rank0]:E1204 09:14:23.872000 19489 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.5486525Z [rank0]:E1204 09:14:23.872000 19489 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.5487902Z [rank0]:E1204 09:14:23.872000 19489 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.5489017Z [rank1]:E1204 09:14:23.872000 19490 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.5490108Z [rank1]:E1204 09:14:23.872000 19490 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.5491744Z [rank1]:E1204 09:14:23.872000 19490 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.5493352Z [rank1]:E1204 09:14:23.872000 19490 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.5494958Z [rank1]:E1204 09:14:23.872000 19490 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.5496544Z [rank1]:E1204 09:14:23.872000 19490 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.5498296Z [rank1]:E1204 09:14:23.872000 19490 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5499904Z [rank1]:E1204 09:14:23.872000 19490 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.5501510Z [rank1]:E1204 09:14:23.872000 19490 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5503118Z [rank1]:E1204 09:14:23.872000 19490 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.5504725Z [rank1]:E1204 09:14:23.872000 19490 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.5506288Z [rank1]:E1204 09:14:23.872000 19490 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.5507860Z [rank1]:E1204 09:14:23.872000 19490 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.5509537Z [rank1]:E1204 09:14:23.872000 19490 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.5511798Z [rank1]:E1204 09:14:23.872000 19490 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 607059968 and is now 619642880. 2025-12-04T09:19:34.5513982Z [rank1]:E1204 09:14:23.872000 19490 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.5515115Z [rank1]:E1204 09:14:23.872000 19490 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.5517047Z [rank1]:E1204 09:14:23.872000 19490 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T09:19:34.5518687Z [rank1]:E1204 09:14:23.872000 19490 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.5519891Z [rank1]:E1204 09:14:23.872000 19490 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.5521649Z [rank1]:E1204 09:14:23.872000 19490 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.5522792Z [rank3]:E1204 09:14:23.872000 19492 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.5523930Z [rank3]:E1204 09:14:23.872000 19492 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.5525626Z [rank3]:E1204 09:14:23.872000 19492 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.5527284Z [rank3]:E1204 09:14:23.872000 19492 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.5528929Z [rank3]:E1204 09:14:23.872000 19492 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.5530567Z [rank3]:E1204 09:14:23.872000 19492 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.5532091Z [rank3]:E1204 09:14:23.872000 19492 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5533871Z [rank3]:E1204 09:14:23.872000 19492 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.5535381Z [rank3]:E1204 09:14:23.872000 19492 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5537138Z [rank3]:E1204 09:14:23.872000 19492 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.5538750Z [rank3]:E1204 09:14:23.872000 19492 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.5540305Z [rank3]:E1204 09:14:23.872000 19492 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.5541868Z [rank3]:E1204 09:14:23.872000 19492 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.5543481Z [rank3]:E1204 09:14:23.872000 19492 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.5545903Z [rank3]:E1204 09:14:23.872000 19492 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 3. CUDA driver allocated memory was 491716608 and is now 619642880. 2025-12-04T09:19:34.5548096Z [rank3]:E1204 09:14:23.872000 19492 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.5549439Z [rank3]:E1204 09:14:23.872000 19492 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.5551206Z [rank3]:E1204 09:14:23.872000 19492 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T09:19:34.5552715Z [rank3]:E1204 09:14:23.872000 19492 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.5553809Z [rank3]:E1204 09:14:23.872000 19492 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.5555063Z [rank3]:E1204 09:14:23.872000 19492 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.5556090Z [rank2]:E1204 09:14:23.873000 19491 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.5557104Z [rank2]:E1204 09:14:23.873000 19491 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.5558824Z [rank2]:E1204 09:14:23.873000 19491 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.5560383Z [rank2]:E1204 09:14:23.873000 19491 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.5562034Z [rank2]:E1204 09:14:23.873000 19491 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.5563496Z [rank2]:E1204 09:14:23.873000 19491 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.5564930Z [rank2]:E1204 09:14:23.873000 19491 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5566426Z [rank2]:E1204 09:14:23.873000 19491 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.5567955Z [rank2]:E1204 09:14:23.873000 19491 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5569639Z [rank2]:E1204 09:14:23.873000 19491 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.5571192Z [rank2]:E1204 09:14:23.873000 19491 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.5572692Z [rank2]:E1204 09:14:23.873000 19491 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.5574203Z [rank2]:E1204 09:14:23.873000 19491 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.5575818Z [rank2]:E1204 09:14:23.873000 19491 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.5578365Z [rank2]:E1204 09:14:23.873000 19491 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 2. CUDA driver allocated memory was 602865664 and is now 619642880. 
2025-12-04T09:19:34.5580553Z [rank2]:E1204 09:14:23.873000 19491 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T09:19:34.5581727Z [rank2]:E1204 09:14:23.873000 19491 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T09:19:34.5583709Z [rank2]:E1204 09:14:23.873000 19491 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda
2025-12-04T09:19:34.5585408Z [rank2]:E1204 09:14:23.873000 19491 site-packages/torch/testing/_internal/common_distributed.py:935]
2025-12-04T09:19:34.5586638Z [rank2]:E1204 09:14:23.873000 19491 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:19:34.5588052Z [rank2]:E1204 09:14:23.873000 19491 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10
2025-12-04T09:19:34.5589047Z dist init r=3, world=4
2025-12-04T09:19:34.5589309Z dist init r=0, world=4
2025-12-04T09:19:34.5589566Z dist init r=2, world=4
2025-12-04T09:19:34.5589810Z dist init r=1, world=4
2025-12-04T09:19:34.5590069Z FAILED [8.3750s] [ 12%]
2025-12-04T09:19:34.5590241Z
2025-12-04T09:19:34.5590380Z =================================== FAILURES ===================================
2025-12-04T09:19:34.5590933Z _ TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda __
2025-12-04T09:19:34.5591499Z Traceback (most recent call last):
2025-12-04T09:19:34.5592207Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper
2025-12-04T09:19:34.5592924Z self._join_processes(fn)
2025-12-04T09:19:34.5593632Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes
2025-12-04T09:19:34.5594418Z self._check_return_codes(fn, elapsed_time)
2025-12-04T09:19:34.5595210Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes
2025-12-04T09:19:34.5595994Z raise RuntimeError(error)
2025-12-04T09:19:34.5596395Z RuntimeError: Process 1 exited with error code 10 and exception:
2025-12-04T09:19:34.5596843Z Traceback (most recent call last):
2025-12-04T09:19:34.5597554Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:19:34.5598252Z getattr(self, test_name)()
2025-12-04T09:19:34.5598931Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:19:34.5599622Z fn()
2025-12-04T09:19:34.5600202Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:19:34.5600870Z method(*args, **kwargs)
2025-12-04T09:19:34.5601512Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:19:34.5602183Z method(*args, **kwargs)
2025-12-04T09:19:34.5602874Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:19:34.5603528Z with policy():
2025-12-04T09:19:34.5604144Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:19:34.5604837Z raise RuntimeError(msg)
2025-12-04T09:19:34.5606110Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 607059968 and is now 619642880.
2025-12-04T09:19:34.5607333Z
2025-12-04T09:19:34.5607530Z To execute this test, run the following from the base repo dir:
2025-12-04T09:19:34.5608494Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda
2025-12-04T09:19:34.5609270Z
2025-12-04T09:19:34.5609525Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:19:34.5610058Z
2025-12-04T09:19:34.5610062Z
2025-12-04T09:19:34.5610298Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:19:34.5610885Z Process 1 terminated with exit code 10, terminating remaining processes.
2025-12-04T09:19:34.5612081Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-93c7f0a0a61745d5.xml -
2025-12-04T09:19:34.5613187Z =========================== short test summary info ============================
2025-12-04T09:19:34.5614357Z FAILED [8.3750s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy0_cuda - RuntimeError: Process 1 exited with error code 10 and exception:
2025-12-04T09:19:34.5615420Z Traceback (most recent call last):
2025-12-04T09:19:34.5616174Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:19:34.5617198Z getattr(self, test_name)()
2025-12-04T09:19:34.5618031Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:19:34.5618796Z fn()
2025-12-04T09:19:34.5619450Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:19:34.5620209Z method(*args, **kwargs)
2025-12-04T09:19:34.5621090Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:19:34.5621858Z method(*args, **kwargs)
2025-12-04T09:19:34.5622577Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:19:34.5623343Z with policy():
2025-12-04T09:19:34.5624017Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:19:34.5624787Z raise RuntimeError(msg)
2025-12-04T09:19:34.5626242Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 607059968 and is now 619642880.
2025-12-04T09:19:34.5627609Z 2025-12-04T09:19:34.5627842Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.5628920Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T09:19:34.5629792Z 2025-12-04T09:19:34.5630058Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.5630765Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:19:34.5631248Z ============================== 1 failed in 8.40s =============================== 2025-12-04T09:19:34.5631637Z Got exit code 1 2025-12-04T09:19:34.5631914Z Retrying single test... 2025-12-04T09:19:34.5632905Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-50fd36707db41f77.xml 2025-12-04T09:19:34.5633864Z ============================= test session starts ============================== 2025-12-04T09:19:34.5634516Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:19:34.5635100Z cachedir: .pytest_cache 2025-12-04T09:19:34.5635795Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:19:34.5636542Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:19:34.5636889Z configfile: pytest.ini 2025-12-04T09:19:34.5637594Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:19:34.5638441Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:19:34.5639568Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T09:19:34.5640593Z Running 1 items in this shard 2025-12-04T09:19:34.5640798Z 2025-12-04T09:19:34.5641881Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy0_cuda I1204 09:14:30.304000 19750 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 19802 2025-12-04T09:19:34.5643559Z I1204 09:14:30.305000 19750 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 19803 2025-12-04T09:19:34.5644743Z I1204 09:14:30.305000 19750 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 19804 2025-12-04T09:19:34.5645811Z I1204 09:14:30.306000 19750 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 19805 2025-12-04T09:19:34.5648132Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.5650031Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.5651931Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. 
FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.5653838Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.5655770Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.5657922Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.5659926Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.5662012Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.5662789Z [rank0]:E1204 09:14:36.831000 19802 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.5663936Z [rank0]:E1204 09:14:36.831000 19802 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.5665627Z [rank0]:E1204 09:14:36.831000 19802 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.5667267Z [rank0]:E1204 09:14:36.831000 19802 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.5669027Z [rank0]:E1204 09:14:36.831000 19802 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.5670484Z [rank0]:E1204 09:14:36.831000 19802 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.5671912Z [rank0]:E1204 09:14:36.831000 19802 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5673424Z [rank0]:E1204 09:14:36.831000 19802 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.5674921Z [rank0]:E1204 09:14:36.831000 19802 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5676445Z [rank0]:E1204 09:14:36.831000 19802 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.5678008Z [rank0]:E1204 09:14:36.831000 19802 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.5679512Z [rank0]:E1204 09:14:36.831000 19802 
site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.5680905Z [rank0]:E1204 09:14:36.831000 19802 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.5682319Z [rank0]:E1204 09:14:36.831000 19802 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.5684399Z [rank0]:E1204 09:14:36.831000 19802 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 714014720 and is now 728694784. 2025-12-04T09:19:34.5686329Z [rank0]:E1204 09:14:36.831000 19802 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.5687369Z [rank0]:E1204 09:14:36.831000 19802 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.5689130Z [rank0]:E1204 09:14:36.831000 19802 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T09:19:34.5690664Z [rank0]:E1204 09:14:36.831000 19802 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.5691765Z [rank0]:E1204 09:14:36.831000 19802 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.5693020Z [rank0]:E1204 09:14:36.831000 19802 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.5694040Z [rank3]:E1204 09:14:36.832000 19805 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.5695034Z [rank3]:E1204 09:14:36.832000 19805 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.5696770Z [rank3]:E1204 09:14:36.832000 19805 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.5698442Z [rank3]:E1204 09:14:36.832000 19805 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.5700107Z [rank3]:E1204 09:14:36.832000 19805 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.5701645Z [rank3]:E1204 09:14:36.832000 19805 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.5703149Z [rank3]:E1204 09:14:36.832000 19805 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5704748Z [rank3]:E1204 09:14:36.832000 19805 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T09:19:34.5706358Z [rank3]:E1204 09:14:36.832000 19805 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5708014Z [rank3]:E1204 09:14:36.832000 19805 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.5709720Z [rank3]:E1204 09:14:36.832000 19805 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.5711079Z [rank3]:E1204 09:14:36.832000 19805 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.5712455Z [rank3]:E1204 09:14:36.832000 19805 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.5713866Z [rank3]:E1204 09:14:36.832000 19805 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.5715911Z [rank3]:E1204 09:14:36.832000 19805 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 3. CUDA driver allocated memory was 607059968 and is now 619642880. 2025-12-04T09:19:34.5717837Z [rank3]:E1204 09:14:36.832000 19805 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.5718855Z [rank3]:E1204 09:14:36.832000 19805 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.5720651Z [rank3]:E1204 09:14:36.832000 19805 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T09:19:34.5722582Z [rank3]:E1204 09:14:36.832000 19805 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.5723796Z [rank3]:E1204 09:14:36.832000 19805 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.5725189Z [rank3]:E1204 09:14:36.832000 19805 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.5726310Z [rank1]:E1204 09:14:36.832000 19803 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.5727422Z [rank1]:E1204 09:14:36.832000 19803 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.5729099Z [rank1]:E1204 09:14:36.832000 19803 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.5730746Z [rank1]:E1204 09:14:36.832000 19803 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.5732372Z [rank1]:E1204 09:14:36.832000 19803 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.5733951Z [rank1]:E1204 09:14:36.832000 19803 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.5735289Z [rank1]:E1204 09:14:36.832000 19803 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5736940Z [rank1]:E1204 09:14:36.832000 19803 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.5738669Z [rank1]:E1204 09:14:36.832000 19803 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5740252Z [rank1]:E1204 09:14:36.832000 19803 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.5741840Z [rank1]:E1204 09:14:36.832000 19803 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.5743388Z [rank1]:E1204 09:14:36.832000 19803 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.5744947Z [rank1]:E1204 09:14:36.832000 19803 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.5746552Z [rank1]:E1204 09:14:36.832000 19803 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.5749074Z [rank1]:E1204 09:14:36.832000 19803 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 609157120 and is now 619642880. 
2025-12-04T09:19:34.5750994Z [rank1]:E1204 09:14:36.832000 19803 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.5752089Z [rank1]:E1204 09:14:36.832000 19803 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.5753839Z [rank1]:E1204 09:14:36.832000 19803 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T09:19:34.5755321Z [rank1]:E1204 09:14:36.832000 19803 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.5756390Z [rank1]:E1204 09:14:36.832000 19803 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.5757623Z [rank1]:E1204 09:14:36.832000 19803 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.5758629Z [rank2]:E1204 09:14:36.832000 19804 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.5759617Z [rank2]:E1204 09:14:36.832000 19804 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.5761098Z [rank2]:E1204 09:14:36.832000 19804 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.5762538Z [rank2]:E1204 09:14:36.832000 19804 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.5763983Z [rank2]:E1204 09:14:36.832000 19804 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.5765333Z [rank2]:E1204 09:14:36.832000 19804 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.5766723Z [rank2]:E1204 09:14:36.832000 19804 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5768121Z [rank2]:E1204 09:14:36.832000 19804 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.5769517Z [rank2]:E1204 09:14:36.832000 19804 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5770909Z [rank2]:E1204 09:14:36.832000 19804 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.5772310Z [rank2]:E1204 09:14:36.832000 19804 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.5773676Z [rank2]:E1204 09:14:36.832000 19804 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.5775037Z [rank2]:E1204 09:14:36.832000 19804 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.5776523Z [rank2]:E1204 09:14:36.832000 19804 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.5778988Z [rank2]:E1204 09:14:36.832000 19804 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 2. CUDA driver allocated memory was 607059968 and is now 619642880. 2025-12-04T09:19:34.5781222Z [rank2]:E1204 09:14:36.832000 19804 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.5782376Z [rank2]:E1204 09:14:36.832000 19804 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.5784333Z [rank2]:E1204 09:14:36.832000 19804 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T09:19:34.5785996Z [rank2]:E1204 09:14:36.832000 19804 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.5787210Z [rank2]:E1204 09:14:36.832000 19804 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.5788709Z [rank2]:E1204 09:14:36.832000 19804 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:19:34.5789525Z dist init r=0, world=4 2025-12-04T09:19:34.5789764Z dist init r=2, world=4 2025-12-04T09:19:34.5789999Z dist init r=3, world=4 2025-12-04T09:19:34.5790228Z dist init r=1, world=4 2025-12-04T09:19:34.5790450Z FAILED [8.2962s] [100%] 2025-12-04T09:19:34.5790604Z 2025-12-04T09:19:34.5790737Z =================================== FAILURES =================================== 2025-12-04T09:19:34.5791272Z _ TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda __ 2025-12-04T09:19:34.5791763Z Traceback (most recent call last): 2025-12-04T09:19:34.5792452Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:19:34.5793147Z self._join_processes(fn) 2025-12-04T09:19:34.5793852Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:19:34.5794603Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:19:34.5795436Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:19:34.5796191Z raise RuntimeError(error) 2025-12-04T09:19:34.5796574Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:19:34.5796992Z Traceback (most recent call last): 2025-12-04T09:19:34.5797675Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.5798365Z getattr(self, test_name)() 2025-12-04T09:19:34.5799012Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.5799682Z fn() 2025-12-04T09:19:34.5800255Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5800910Z method(*args, **kwargs) 2025-12-04T09:19:34.5801530Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5802190Z method(*args, **kwargs) 2025-12-04T09:19:34.5802813Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.5803454Z with policy(): 2025-12-04T09:19:34.5804054Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.5804741Z raise RuntimeError(msg) 2025-12-04T09:19:34.5806023Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 3. CUDA driver allocated memory was 607059968 and is now 619642880. 2025-12-04T09:19:34.5807282Z 2025-12-04T09:19:34.5807479Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.5808448Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T09:19:34.5809226Z 2025-12-04T09:19:34.5809465Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.5809819Z 2025-12-04T09:19:34.5809823Z 2025-12-04T09:19:34.5810037Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:19:34.5810599Z Process 3 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:19:34.5811717Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-50fd36707db41f77.xml -
2025-12-04T09:19:34.5812765Z =========================== short test summary info ============================
2025-12-04T09:19:34.5813859Z FAILED [8.2962s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy0_cuda - RuntimeError: Process 3 exited with error code 10 and exception:
2025-12-04T09:19:34.5814878Z Traceback (most recent call last):
2025-12-04T09:19:34.5815581Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:19:34.5816372Z getattr(self, test_name)()
2025-12-04T09:19:34.5817270Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:19:34.5818031Z fn()
2025-12-04T09:19:34.5818682Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:19:34.5819449Z method(*args, **kwargs)
2025-12-04T09:19:34.5820167Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:19:34.5821092Z method(*args, **kwargs)
2025-12-04T09:19:34.5821917Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:19:34.5822680Z with policy():
2025-12-04T09:19:34.5823361Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:19:34.5824139Z raise RuntimeError(msg)
2025-12-04T09:19:34.5825589Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 3. CUDA driver allocated memory was 607059968 and is now 619642880.
2025-12-04T09:19:34.5826951Z
2025-12-04T09:19:34.5827192Z To execute this test, run the following from the base repo dir:
2025-12-04T09:19:34.5828269Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda
2025-12-04T09:19:34.5829139Z
2025-12-04T09:19:34.5829415Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:19:34.5830009Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T09:19:34.5830515Z ======================= 1 failed, 7 deselected in 8.32s ========================
2025-12-04T09:19:34.5830927Z Got exit code 1
2025-12-04T09:19:34.5831199Z Retrying single test...
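The failure above comes from the CUDA memory-leak checker that PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 wraps around each test. As a rough illustration of the mechanism only (a minimal sketch, not PyTorch's actual implementation, which lives in torch/testing/_internal/common_utils.py and also compares driver-level figures, the source of the "CUDA driver allocated memory was ... and is now ..." numbers; the class name here is hypothetical):

import torch

class CudaLeakCheck:
    """Snapshot caching-allocator usage on entry; raise on exit if it grew."""

    def __enter__(self):
        torch.cuda.synchronize()
        # Bytes held by the caching allocator, per device, before the test body.
        self.before = [torch.cuda.memory_allocated(d)
                       for d in range(torch.cuda.device_count())]
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            return False  # don't mask the test's own exception
        torch.cuda.synchronize()
        for d, before in enumerate(self.before):
            after = torch.cuda.memory_allocated(d)
            if after > before:
                # Mirrors the log's wording: allocated memory "was {before}
                # and is now reported as {after} on device {d}".
                raise RuntimeError(
                    f"CUDA leak suspected: allocated memory was {before} "
                    f"and is now reported as {after} on device {d}")
        return False

In this run every rank reports the same caching-allocator delta (512 -> 2560 bytes) and the check fails again on retry, which is consistent with the test body genuinely leaving a small allocation behind rather than a flaky measurement.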
2025-12-04T09:19:34.5832081Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-434f2a168fab2502.xml
2025-12-04T09:19:34.5833265Z ============================= test session starts ==============================
2025-12-04T09:19:34.5833949Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T09:19:34.5834512Z cachedir: .pytest_cache
2025-12-04T09:19:34.5835189Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T09:19:34.5835913Z rootdir: /var/lib/jenkins/workspace
2025-12-04T09:19:34.5836251Z configfile: pytest.ini
2025-12-04T09:19:34.5836940Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T09:19:34.5837774Z collecting ... collected 8 items / 7 deselected / 1 selected
2025-12-04T09:19:34.5838848Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy0_cuda
2025-12-04T09:19:34.5839838Z Running 1 items in this shard
2025-12-04T09:19:34.5840041Z
2025-12-04T09:19:34.5841084Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy0_cuda I1204 09:14:43.364000 20063 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 20115
2025-12-04T09:19:34.5842709Z I1204 09:14:43.365000 20063 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 20116
2025-12-04T09:19:34.5843770Z I1204 09:14:43.365000 20063 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 20117
2025-12-04T09:19:34.5844836Z I1204 09:14:43.366000 20063 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 20118
2025-12-04T09:19:34.5847064Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
2025-12-04T09:19:34.5848967Z device_from_device_id = _get_device_from_device_id(
2025-12-04T09:19:34.5850936Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
2025-12-04T09:19:34.5852826Z device_from_device_id = _get_device_from_device_id(
2025-12-04T09:19:34.5854739Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
2025-12-04T09:19:34.5856587Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.5858768Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.5860778Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.5861544Z [rank0]:E1204 09:14:50.065000 20115 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.5862672Z [rank0]:E1204 09:14:50.065000 20115 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.5864438Z [rank0]:E1204 09:14:50.065000 20115 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.5866108Z [rank0]:E1204 09:14:50.065000 20115 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.5867768Z [rank0]:E1204 09:14:50.065000 20115 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.5869373Z [rank0]:E1204 09:14:50.065000 20115 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.5870705Z [rank0]:E1204 09:14:50.065000 20115 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5872129Z [rank0]:E1204 09:14:50.065000 20115 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.5873554Z [rank0]:E1204 09:14:50.065000 20115 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5874969Z [rank0]:E1204 09:14:50.065000 20115 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.5876375Z [rank0]:E1204 09:14:50.065000 20115 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.5877758Z [rank0]:E1204 09:14:50.065000 20115 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.5879151Z [rank0]:E1204 09:14:50.065000 20115 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.5880634Z [rank0]:E1204 09:14:50.065000 20115 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.5882707Z [rank0]:E1204 09:14:50.065000 20115 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 714014720 and is now 728694784. 2025-12-04T09:19:34.5884646Z [rank0]:E1204 09:14:50.065000 20115 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.5885686Z [rank0]:E1204 09:14:50.065000 20115 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.5887466Z [rank0]:E1204 09:14:50.065000 20115 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T09:19:34.5888969Z [rank0]:E1204 09:14:50.065000 20115 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.5890069Z [rank0]:E1204 09:14:50.065000 20115 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.5891312Z [rank0]:E1204 09:14:50.065000 20115 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.5892335Z [rank2]:E1204 09:14:50.066000 20117 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.5893400Z [rank2]:E1204 09:14:50.066000 20117 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.5894908Z [rank2]:E1204 09:14:50.066000 20117 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.5896449Z [rank2]:E1204 09:14:50.066000 20117 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.5898244Z [rank2]:E1204 09:14:50.066000 20117 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.5899787Z [rank2]:E1204 09:14:50.066000 20117 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.5901308Z [rank2]:E1204 09:14:50.066000 20117 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5902920Z [rank2]:E1204 09:14:50.066000 20117 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.5904529Z [rank2]:E1204 09:14:50.066000 20117 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5906115Z [rank2]:E1204 09:14:50.066000 20117 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.5907718Z [rank2]:E1204 09:14:50.066000 20117 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 
2025-12-04T09:19:34.5909440Z [rank2]:E1204 09:14:50.066000 20117 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.5910884Z [rank2]:E1204 09:14:50.066000 20117 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.5912301Z [rank2]:E1204 09:14:50.066000 20117 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.5914362Z [rank2]:E1204 09:14:50.066000 20117 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 2. CUDA driver allocated memory was 604962816 and is now 619642880. 2025-12-04T09:19:34.5916301Z [rank2]:E1204 09:14:50.066000 20117 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.5917351Z [rank2]:E1204 09:14:50.066000 20117 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.5919117Z [rank2]:E1204 09:14:50.066000 20117 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T09:19:34.5920613Z [rank2]:E1204 09:14:50.066000 20117 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.5922075Z [rank2]:E1204 09:14:50.066000 20117 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.5923585Z [rank2]:E1204 09:14:50.066000 20117 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:19:34.5924738Z [rank1]:E1204 09:14:50.066000 20116 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.5925888Z [rank1]:E1204 09:14:50.066000 20116 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.5927567Z [rank1]:E1204 09:14:50.066000 20116 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.5929231Z [rank1]:E1204 09:14:50.066000 20116 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.5930879Z [rank1]:E1204 09:14:50.066000 20116 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.5932420Z [rank1]:E1204 09:14:50.066000 20116 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.5933995Z [rank1]:E1204 09:14:50.066000 20116 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5935417Z [rank1]:E1204 09:14:50.066000 20116 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.5937118Z [rank1]:E1204 09:14:50.066000 20116 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5938719Z [rank1]:E1204 09:14:50.066000 20116 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.5940328Z [rank1]:E1204 09:14:50.066000 20116 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.5941965Z [rank1]:E1204 09:14:50.066000 20116 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.5943518Z [rank1]:E1204 09:14:50.066000 20116 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.5945131Z [rank1]:E1204 09:14:50.066000 20116 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.5947456Z [rank1]:E1204 09:14:50.066000 20116 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 602865664 and is now 619642880. 2025-12-04T09:19:34.5949627Z [rank1]:E1204 09:14:50.066000 20116 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.5950681Z [rank1]:E1204 09:14:50.066000 20116 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.5952429Z [rank1]:E1204 09:14:50.066000 20116 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T09:19:34.5953928Z [rank1]:E1204 09:14:50.066000 20116 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.5955080Z [rank1]:E1204 09:14:50.066000 20116 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.5956333Z [rank1]:E1204 09:14:50.066000 20116 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.5957347Z [rank3]:E1204 09:14:50.067000 20118 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.5958356Z [rank3]:E1204 09:14:50.067000 20118 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.5959857Z [rank3]:E1204 09:14:50.067000 20118 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.5961327Z [rank3]:E1204 09:14:50.067000 20118 site-packages/torch/testing/_internal/common_distributed.py:935] 
getattr(self, test_name)() 2025-12-04T09:19:34.5962802Z [rank3]:E1204 09:14:50.067000 20118 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.5964148Z [rank3]:E1204 09:14:50.067000 20118 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.5965498Z [rank3]:E1204 09:14:50.067000 20118 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5966922Z [rank3]:E1204 09:14:50.067000 20118 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.5968343Z [rank3]:E1204 09:14:50.067000 20118 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.5969768Z [rank3]:E1204 09:14:50.067000 20118 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.5971224Z [rank3]:E1204 09:14:50.067000 20118 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.5972604Z [rank3]:E1204 09:14:50.067000 20118 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.5973993Z [rank3]:E1204 09:14:50.067000 20118 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.5975423Z [rank3]:E1204 09:14:50.067000 20118 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.5977848Z [rank3]:E1204 09:14:50.067000 20118 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 3. CUDA driver allocated memory was 489619456 and is now 619642880. 
2025-12-04T09:19:34.5980029Z [rank3]:E1204 09:14:50.067000 20118 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.5981205Z [rank3]:E1204 09:14:50.067000 20118 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.5983196Z [rank3]:E1204 09:14:50.067000 20118 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T09:19:34.5984960Z [rank3]:E1204 09:14:50.067000 20118 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.5986196Z [rank3]:E1204 09:14:50.067000 20118 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.5987588Z [rank3]:E1204 09:14:50.067000 20118 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.5988383Z dist init r=2, world=4 2025-12-04T09:19:34.5988775Z dist init r=0, world=4 2025-12-04T09:19:34.5989141Z dist init r=3, world=4 2025-12-04T09:19:34.5989391Z dist init r=1, world=4 2025-12-04T09:19:34.5989639Z FAILED [8.3116s] [100%] 2025-12-04T09:19:34.5989795Z 2025-12-04T09:19:34.5989943Z =================================== FAILURES =================================== 2025-12-04T09:19:34.5990486Z _ TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda __ 2025-12-04T09:19:34.5991006Z Traceback (most recent call last): 2025-12-04T09:19:34.5991718Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:19:34.5992428Z self._join_processes(fn) 2025-12-04T09:19:34.5993150Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:19:34.5993936Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:19:34.5994731Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:19:34.5995492Z raise RuntimeError(error) 2025-12-04T09:19:34.5995899Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:19:34.5996341Z Traceback (most recent call last): 2025-12-04T09:19:34.5997030Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.5997746Z getattr(self, test_name)() 2025-12-04T09:19:34.5998477Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.5999171Z fn() 2025-12-04T09:19:34.5999740Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6000419Z method(*args, **kwargs) 2025-12-04T09:19:34.6001059Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6001725Z method(*args, **kwargs) 2025-12-04T09:19:34.6002370Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6003046Z with policy(): 
2025-12-04T09:19:34.6003667Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6004342Z raise RuntimeError(msg) 2025-12-04T09:19:34.6005815Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 602865664 and is now 619642880. 2025-12-04T09:19:34.6007116Z 2025-12-04T09:19:34.6007326Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6008536Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T09:19:34.6009376Z 2025-12-04T09:19:34.6009640Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6010103Z 2025-12-04T09:19:34.6010107Z 2025-12-04T09:19:34.6010330Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:19:34.6010948Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:19:34.6012190Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-434f2a168fab2502.xml - 2025-12-04T09:19:34.6013333Z =========================== short test summary info ============================ 2025-12-04T09:19:34.6014509Z FAILED [8.3116s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy0_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:19:34.6015629Z Traceback (most recent call last): 2025-12-04T09:19:34.6016506Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6017483Z getattr(self, test_name)() 2025-12-04T09:19:34.6018249Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6019034Z fn() 2025-12-04T09:19:34.6019696Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6020454Z method(*args, **kwargs) 2025-12-04T09:19:34.6021365Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6022128Z method(*args, **kwargs) 2025-12-04T09:19:34.6022840Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6023577Z with policy(): 2025-12-04T09:19:34.6024266Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6025040Z raise RuntimeError(msg) 2025-12-04T09:19:34.6026624Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 602865664 and is now 619642880. 
2025-12-04T09:19:34.6028002Z 2025-12-04T09:19:34.6028225Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6029307Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T09:19:34.6030167Z 2025-12-04T09:19:34.6030450Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6031045Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:19:34.6031542Z ======================= 1 failed, 7 deselected in 8.33s ======================== 2025-12-04T09:19:34.6031970Z Got exit code 1 2025-12-04T09:19:34.6032893Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy0_cuda 2025-12-04T09:19:34.6033944Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:19:34.6035041Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-810575b51f00acc3.xml 2025-12-04T09:19:34.6035925Z ============================= test session starts ============================== 2025-12-04T09:19:34.6036514Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:19:34.6037040Z cachedir: .pytest_cache 2025-12-04T09:19:34.6037667Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:19:34.6038433Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:19:34.6038738Z configfile: pytest.ini 2025-12-04T09:19:34.6039386Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:19:34.6040185Z collecting ... collected 8 items / 1 deselected / 7 selected 2025-12-04T09:19:34.6040628Z stepcurrent: skipping 1 already run items. 2025-12-04T09:19:34.6040966Z Running 7 items in this shard 2025-12-04T09:19:34.6041168Z 2025-12-04T09:19:34.6042144Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy1_cuda I1204 09:14:56.374000 20376 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 20428 2025-12-04T09:19:34.6043680Z I1204 09:14:56.375000 20376 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 20429 2025-12-04T09:19:34.6044693Z I1204 09:14:56.376000 20376 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 20430 2025-12-04T09:19:34.6045692Z I1204 09:14:56.376000 20376 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 20431 2025-12-04T09:19:34.6047796Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
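Editor's note: every rank in the session above emits the same UserWarning because the test hands FSDP the bare string "cuda" as `device_id`. The warning names its own two fixes: make the indexed device current with `torch.cuda.set_device()` before constructing FSDP, or pass a device that carries an explicit index. A minimal sketch of both, assuming a process group has already been initialized on this rank (the helper name `wrap_model` is invented here):

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_model(rank: int) -> FSDP:
        # Assumes torch.distributed.init_process_group(...) has already run on this rank.
        # Fix 1: make the indexed device current so FSDP can infer device `rank`.
        torch.cuda.set_device(rank)
        model = nn.Linear(8, 8).cuda(rank)
        # Fix 2: pass an explicit index instead of device_id="cuda".
        return FSDP(model, device_id=torch.device("cuda", rank))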
2025-12-04T09:19:34.6049595Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6051394Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.6053187Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6055031Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.6057080Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6059096Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.6061108Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6061881Z [rank1]:E1204 09:15:03.020000 20429 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6063023Z [rank1]:E1204 09:15:03.020000 20429 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6064703Z [rank1]:E1204 09:15:03.020000 20429 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6066367Z [rank1]:E1204 09:15:03.020000 20429 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6068022Z [rank1]:E1204 09:15:03.020000 20429 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6069657Z [rank1]:E1204 09:15:03.020000 20429 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6071003Z [rank1]:E1204 09:15:03.020000 20429 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6072410Z [rank1]:E1204 09:15:03.020000 20429 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6073834Z [rank1]:E1204 09:15:03.020000 20429 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6075253Z [rank1]:E1204 09:15:03.020000 20429 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T09:19:34.6076676Z [rank1]:E1204 09:15:03.020000 20429 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6078057Z [rank1]:E1204 09:15:03.020000 20429 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6079432Z [rank1]:E1204 09:15:03.020000 20429 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6080862Z [rank1]:E1204 09:15:03.020000 20429 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6082990Z [rank1]:E1204 09:15:03.020000 20429 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 598671360 and is now 619642880. 2025-12-04T09:19:34.6084949Z [rank1]:E1204 09:15:03.020000 20429 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6085993Z [rank1]:E1204 09:15:03.020000 20429 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6087745Z [rank1]:E1204 09:15:03.020000 20429 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T09:19:34.6089238Z [rank1]:E1204 09:15:03.020000 20429 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6090349Z [rank1]:E1204 09:15:03.020000 20429 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6091609Z [rank1]:E1204 09:15:03.020000 20429 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.6092614Z [rank0]:E1204 09:15:03.021000 20428 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6093627Z [rank0]:E1204 09:15:03.021000 20428 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6095128Z [rank0]:E1204 09:15:03.021000 20428 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6096891Z [rank0]:E1204 09:15:03.021000 20428 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6098579Z [rank0]:E1204 09:15:03.021000 20428 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6100107Z [rank0]:E1204 09:15:03.021000 20428 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6101624Z [rank0]:E1204 09:15:03.021000 20428 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6103224Z [rank0]:E1204 09:15:03.021000 20428 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6104829Z [rank0]:E1204 09:15:03.021000 20428 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6106433Z [rank0]:E1204 09:15:03.021000 20428 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6108024Z [rank0]:E1204 09:15:03.021000 20428 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6109616Z [rank0]:E1204 09:15:03.021000 20428 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6111009Z [rank0]:E1204 09:15:03.021000 20428 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6112445Z [rank0]:E1204 09:15:03.021000 20428 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6114584Z [rank0]:E1204 09:15:03.021000 20428 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 714014720 and is now 728694784. 
2025-12-04T09:19:34.6116517Z [rank0]:E1204 09:15:03.021000 20428 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6117566Z [rank0]:E1204 09:15:03.021000 20428 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6119344Z [rank0]:E1204 09:15:03.021000 20428 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T09:19:34.6120980Z [rank0]:E1204 09:15:03.021000 20428 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6122368Z [rank0]:E1204 09:15:03.021000 20428 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6123767Z [rank0]:E1204 09:15:03.021000 20428 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.6124923Z [rank2]:E1204 09:15:03.022000 20430 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6126064Z [rank2]:E1204 09:15:03.022000 20430 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6127859Z [rank2]:E1204 09:15:03.022000 20430 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6129513Z [rank2]:E1204 09:15:03.022000 20430 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6131170Z [rank2]:E1204 09:15:03.022000 20430 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6132710Z [rank2]:E1204 09:15:03.022000 20430 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6134256Z [rank2]:E1204 09:15:03.022000 20430 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6135681Z [rank2]:E1204 09:15:03.022000 20430 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6137404Z [rank2]:E1204 09:15:03.022000 20430 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6139010Z [rank2]:E1204 09:15:03.022000 20430 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6140607Z [rank2]:E1204 09:15:03.022000 20430 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6142164Z [rank2]:E1204 09:15:03.022000 20430 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6143726Z [rank2]:E1204 09:15:03.022000 20430 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6145408Z [rank2]:E1204 09:15:03.022000 20430 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6147732Z [rank2]:E1204 09:15:03.022000 20430 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 2. CUDA driver allocated memory was 604962816 and is now 619642880. 2025-12-04T09:19:34.6150007Z [rank2]:E1204 09:15:03.022000 20430 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6151047Z [rank2]:E1204 09:15:03.022000 20430 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6152820Z [rank2]:E1204 09:15:03.022000 20430 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T09:19:34.6154307Z [rank2]:E1204 09:15:03.022000 20430 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6155402Z [rank2]:E1204 09:15:03.022000 20430 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6156657Z [rank2]:E1204 09:15:03.022000 20430 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:19:34.6157676Z [rank3]:E1204 09:15:03.022000 20431 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6158726Z [rank3]:E1204 09:15:03.022000 20431 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6160229Z [rank3]:E1204 09:15:03.022000 20431 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6161703Z [rank3]:E1204 09:15:03.022000 20431 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6163168Z [rank3]:E1204 09:15:03.022000 20431 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6164531Z [rank3]:E1204 09:15:03.022000 20431 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6165873Z [rank3]:E1204 09:15:03.022000 20431 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6167297Z [rank3]:E1204 09:15:03.022000 20431 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6168719Z [rank3]:E1204 09:15:03.022000 20431 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6170138Z [rank3]:E1204 09:15:03.022000 20431 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6171555Z [rank3]:E1204 09:15:03.022000 20431 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6172931Z [rank3]:E1204 09:15:03.022000 20431 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6174367Z [rank3]:E1204 09:15:03.022000 20431 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6175801Z [rank3]:E1204 09:15:03.022000 20431 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6178263Z [rank3]:E1204 09:15:03.022000 20431 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 3. CUDA driver allocated memory was 498008064 and is now 619642880. 2025-12-04T09:19:34.6180464Z [rank3]:E1204 09:15:03.022000 20431 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6181635Z [rank3]:E1204 09:15:03.022000 20431 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6183636Z [rank3]:E1204 09:15:03.022000 20431 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T09:19:34.6185328Z [rank3]:E1204 09:15:03.022000 20431 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6186562Z [rank3]:E1204 09:15:03.022000 20431 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6187970Z [rank3]:E1204 09:15:03.022000 20431 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.6188891Z dist init r=2, world=4 2025-12-04T09:19:34.6189149Z dist init r=3, world=4 2025-12-04T09:19:34.6189403Z dist init r=0, world=4 2025-12-04T09:19:34.6189640Z dist init r=1, world=4 2025-12-04T09:19:34.6189894Z FAILED [8.2813s] [ 14%] 2025-12-04T09:19:34.6190050Z 2025-12-04T09:19:34.6190199Z =================================== FAILURES =================================== 2025-12-04T09:19:34.6190755Z _ TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda __ 2025-12-04T09:19:34.6191264Z Traceback (most recent call last): 2025-12-04T09:19:34.6191975Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:19:34.6192693Z self._join_processes(fn) 2025-12-04T09:19:34.6193402Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in 
_join_processes 2025-12-04T09:19:34.6194183Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:19:34.6194977Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:19:34.6195755Z raise RuntimeError(error) 2025-12-04T09:19:34.6196152Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:19:34.6196592Z Traceback (most recent call last): 2025-12-04T09:19:34.6197293Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6197993Z getattr(self, test_name)() 2025-12-04T09:19:34.6198670Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6199357Z fn() 2025-12-04T09:19:34.6199938Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6200608Z method(*args, **kwargs) 2025-12-04T09:19:34.6201248Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6202656Z method(*args, **kwargs) 2025-12-04T09:19:34.6203289Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6203969Z with policy(): 2025-12-04T09:19:34.6204588Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6205282Z raise RuntimeError(msg) 2025-12-04T09:19:34.6206560Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 598671360 and is now 619642880. 2025-12-04T09:19:34.6207782Z 2025-12-04T09:19:34.6207978Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6208947Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T09:19:34.6209715Z 2025-12-04T09:19:34.6209970Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6210324Z 2025-12-04T09:19:34.6210329Z 2025-12-04T09:19:34.6210544Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:19:34.6211096Z Process 1 terminated with exit code 10, terminating remaining processes. 
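Editor's note: "Process 1 terminated with exit code 10, terminating remaining processes." reflects how the multi-process harness drives these tests: it spawns one worker per rank, watches their exit codes, tears the rest down as soon as one worker fails, and re-raises that worker's exception in the parent (the `_join_processes` / `_check_return_codes` frames in the traceback above). The following is an illustrative sketch of that pattern only, not the actual common_distributed.py code; `_worker` and `run_in_processes` are invented names.

    import multiprocessing as mp
    import time

    def _worker(rank: int, world_size: int) -> None:
        print(f"dist init r={rank}, world={world_size}")
        # ... per-rank test body goes here; a non-zero sys.exit() signals failure ...

    def run_in_processes(world_size: int = 4, poll: float = 0.1) -> None:
        ctx = mp.get_context("spawn")
        procs = [ctx.Process(target=_worker, args=(r, world_size)) for r in range(world_size)]
        for p in procs:
            p.start()
        # Poll until all workers finish; on the first non-zero exit code,
        # terminate whatever is still running and surface the failure.
        while any(p.is_alive() for p in procs):
            for rank, p in enumerate(procs):
                if p.exitcode not in (None, 0):
                    print(f"Process {rank} terminated with exit code {p.exitcode}, "
                          "terminating remaining processes.")
                    for other in procs:
                        if other.is_alive():
                            other.terminate()
                    raise RuntimeError(f"Process {rank} exited with error code {p.exitcode}")
            time.sleep(poll)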
2025-12-04T09:19:34.6212220Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-810575b51f00acc3.xml - 2025-12-04T09:19:34.6213262Z =========================== short test summary info ============================ 2025-12-04T09:19:34.6214404Z FAILED [8.2813s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy1_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:19:34.6215415Z Traceback (most recent call last): 2025-12-04T09:19:34.6216135Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6217145Z getattr(self, test_name)() 2025-12-04T09:19:34.6217898Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6218678Z fn() 2025-12-04T09:19:34.6219337Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6220106Z method(*args, **kwargs) 2025-12-04T09:19:34.6220981Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6221925Z method(*args, **kwargs) 2025-12-04T09:19:34.6222651Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6223403Z with policy(): 2025-12-04T09:19:34.6224097Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6224874Z raise RuntimeError(msg) 2025-12-04T09:19:34.6226323Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 598671360 and is now 619642880. 2025-12-04T09:19:34.6227692Z 2025-12-04T09:19:34.6227929Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6229014Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T09:19:34.6229889Z 2025-12-04T09:19:34.6230277Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6230878Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:19:34.6231374Z ======================= 1 failed, 1 deselected in 8.30s ======================== 2025-12-04T09:19:34.6231803Z Got exit code 1 2025-12-04T09:19:34.6232082Z Retrying single test... 
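Editor's note: "Got exit code 1" followed by "Retrying single test..." is the runner's flakiness filter at work: after a pytest invocation fails, the single failing test is rerun in a fresh session, and only a failure on every attempt is reported as "FAILED CONSISTENTLY" (the strategy0 test earlier in the log already reached that state, and the run continued because continue-through-error was set). Below is a rough sketch of that retry loop, assuming nothing about the real run_test.py beyond what the log shows; `rerun_single_test` is an invented helper.

    import subprocess
    import sys

    def rerun_single_test(test_file: str, node_id: str, attempts: int = 2) -> bool:
        """Return True if the test passes on some attempt, False if it fails every time."""
        for attempt in range(attempts):
            proc = subprocess.run(
                [sys.executable, "-m", "pytest", "-x", f"{test_file}::{node_id}"]
            )
            print(f"Got exit code {proc.returncode}")
            if proc.returncode == 0:
                return True
            if attempt < attempts - 1:
                print("Retrying single test...")
        return False

For the failure above the call would look like rerun_single_test("test/distributed/fsdp/test_fsdp_exec_order.py", "TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy1_cuda"), matching the node id in the short test summary.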
2025-12-04T09:19:34.6232963Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-acd65444fa26961a.xml 2025-12-04T09:19:34.6233970Z ============================= test session starts ============================== 2025-12-04T09:19:34.6234567Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:19:34.6235104Z cachedir: .pytest_cache 2025-12-04T09:19:34.6235726Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:19:34.6236418Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:19:34.6236736Z configfile: pytest.ini 2025-12-04T09:19:34.6237381Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:19:34.6238159Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:19:34.6239185Z stepcurrent: skipping 1 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T09:19:34.6240122Z Running 1 items in this shard 2025-12-04T09:19:34.6240309Z 2025-12-04T09:19:34.6241290Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy1_cuda I1204 09:15:09.394000 20689 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 20741 2025-12-04T09:19:34.6242896Z I1204 09:15:09.395000 20689 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 20742 2025-12-04T09:19:34.6243908Z I1204 09:15:09.395000 20689 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 20743 2025-12-04T09:19:34.6244920Z I1204 09:15:09.396000 20689 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 20744 2025-12-04T09:19:34.6247689Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.6249748Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6251637Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.6253534Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6255429Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:19:34.6257634Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6259745Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.6261764Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6262518Z [rank1]:E1204 09:15:15.972000 20742 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6263662Z [rank1]:E1204 09:15:15.972000 20742 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6265357Z [rank1]:E1204 09:15:15.972000 20742 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6267043Z [rank1]:E1204 09:15:15.972000 20742 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6268697Z [rank1]:E1204 09:15:15.972000 20742 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6270287Z [rank1]:E1204 09:15:15.972000 20742 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6271855Z [rank1]:E1204 09:15:15.972000 20742 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6273333Z [rank1]:E1204 09:15:15.972000 20742 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6274758Z [rank1]:E1204 09:15:15.972000 20742 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6276161Z [rank1]:E1204 09:15:15.972000 20742 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6277586Z [rank1]:E1204 09:15:15.972000 20742 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6279140Z [rank1]:E1204 09:15:15.972000 20742 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6280611Z [rank1]:E1204 09:15:15.972000 20742 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6282134Z [rank1]:E1204 09:15:15.972000 20742 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6284309Z [rank1]:E1204 09:15:15.972000 20742 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 607059968 and is now 619642880. 2025-12-04T09:19:34.6286365Z [rank1]:E1204 09:15:15.972000 20742 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6287467Z [rank1]:E1204 09:15:15.972000 20742 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6289392Z [rank1]:E1204 09:15:15.972000 20742 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T09:19:34.6291067Z [rank1]:E1204 09:15:15.972000 20742 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6292153Z [rank1]:E1204 09:15:15.972000 20742 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6293404Z [rank1]:E1204 09:15:15.972000 20742 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.6294423Z [rank0]:E1204 09:15:15.972000 20741 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6295434Z [rank0]:E1204 09:15:15.972000 20741 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6297224Z [rank0]:E1204 09:15:15.972000 20741 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6298873Z [rank0]:E1204 09:15:15.972000 20741 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6300527Z [rank0]:E1204 09:15:15.972000 20741 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6302067Z [rank0]:E1204 09:15:15.972000 20741 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6303578Z [rank0]:E1204 09:15:15.972000 20741 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6305268Z [rank0]:E1204 09:15:15.972000 20741 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6306853Z [rank0]:E1204 09:15:15.972000 20741 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6308447Z [rank0]:E1204 09:15:15.972000 20741 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6310016Z [rank0]:E1204 09:15:15.972000 20741 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 
2025-12-04T09:19:34.6311401Z [rank0]:E1204 09:15:15.972000 20741 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6312779Z [rank0]:E1204 09:15:15.972000 20741 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6314201Z [rank0]:E1204 09:15:15.972000 20741 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6316265Z [rank0]:E1204 09:15:15.972000 20741 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 714014720 and is now 728694784. 2025-12-04T09:19:34.6318210Z [rank0]:E1204 09:15:15.972000 20741 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6319261Z [rank0]:E1204 09:15:15.972000 20741 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6321397Z [rank0]:E1204 09:15:15.972000 20741 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T09:19:34.6323079Z [rank0]:E1204 09:15:15.972000 20741 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6324311Z [rank0]:E1204 09:15:15.972000 20741 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6325723Z [rank0]:E1204 09:15:15.972000 20741 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.6326878Z [rank2]:E1204 09:15:15.973000 20743 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6328003Z [rank2]:E1204 09:15:15.973000 20743 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6329687Z [rank2]:E1204 09:15:15.973000 20743 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6331337Z [rank2]:E1204 09:15:15.973000 20743 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6332987Z [rank2]:E1204 09:15:15.973000 20743 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6334598Z [rank2]:E1204 09:15:15.973000 20743 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6335939Z [rank2]:E1204 09:15:15.973000 20743 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6337695Z [rank2]:E1204 09:15:15.973000 20743 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6339305Z [rank2]:E1204 09:15:15.973000 20743 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6340905Z [rank2]:E1204 09:15:15.973000 20743 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6342518Z [rank2]:E1204 09:15:15.973000 20743 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6344068Z [rank2]:E1204 09:15:15.973000 20743 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6345636Z [rank2]:E1204 09:15:15.973000 20743 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6347246Z [rank2]:E1204 09:15:15.973000 20743 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6349608Z [rank2]:E1204 09:15:15.973000 20743 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 2. CUDA driver allocated memory was 602865664 and is now 619642880. 2025-12-04T09:19:34.6351628Z [rank2]:E1204 09:15:15.973000 20743 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6352660Z [rank2]:E1204 09:15:15.973000 20743 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6354422Z [rank2]:E1204 09:15:15.973000 20743 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T09:19:34.6355924Z [rank2]:E1204 09:15:15.973000 20743 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6357023Z [rank2]:E1204 09:15:15.973000 20743 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6358515Z [rank2]:E1204 09:15:15.973000 20743 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:19:34.6359586Z [rank3]:E1204 09:15:15.973000 20744 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6360659Z [rank3]:E1204 09:15:15.973000 20744 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6362243Z [rank3]:E1204 09:15:15.973000 20744 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6363805Z [rank3]:E1204 09:15:15.973000 20744 site-packages/torch/testing/_internal/common_distributed.py:935] 
getattr(self, test_name)() 2025-12-04T09:19:34.6365397Z [rank3]:E1204 09:15:15.973000 20744 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6366839Z [rank3]:E1204 09:15:15.973000 20744 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6368263Z [rank3]:E1204 09:15:15.973000 20744 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6369768Z [rank3]:E1204 09:15:15.973000 20744 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6371347Z [rank3]:E1204 09:15:15.973000 20744 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6372760Z [rank3]:E1204 09:15:15.973000 20744 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6374183Z [rank3]:E1204 09:15:15.973000 20744 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6375561Z [rank3]:E1204 09:15:15.973000 20744 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6377235Z [rank3]:E1204 09:15:15.973000 20744 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6378845Z [rank3]:E1204 09:15:15.973000 20744 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6381233Z [rank3]:E1204 09:15:15.973000 20744 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 3. CUDA driver allocated memory was 495910912 and is now 619642880. 
2025-12-04T09:19:34.6383423Z [rank3]:E1204 09:15:15.973000 20744 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6384596Z [rank3]:E1204 09:15:15.973000 20744 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6386582Z [rank3]:E1204 09:15:15.973000 20744 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T09:19:34.6388269Z [rank3]:E1204 09:15:15.973000 20744 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6389628Z [rank3]:E1204 09:15:15.973000 20744 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6390888Z [rank3]:E1204 09:15:15.973000 20744 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.6391606Z dist init r=1, world=4 2025-12-04T09:19:34.6391866Z dist init r=3, world=4 2025-12-04T09:19:34.6392106Z dist init r=0, world=4 2025-12-04T09:19:34.6392359Z dist init r=2, world=4 2025-12-04T09:19:34.6392612Z FAILED [8.2872s] [100%] 2025-12-04T09:19:34.6392764Z 2025-12-04T09:19:34.6392901Z =================================== FAILURES =================================== 2025-12-04T09:19:34.6393451Z _ TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda __ 2025-12-04T09:19:34.6394023Z Traceback (most recent call last): 2025-12-04T09:19:34.6394718Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:19:34.6395439Z self._join_processes(fn) 2025-12-04T09:19:34.6396161Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:19:34.6396941Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:19:34.6397723Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:19:34.6398495Z raise RuntimeError(error) 2025-12-04T09:19:34.6398902Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:19:34.6399344Z Traceback (most recent call last): 2025-12-04T09:19:34.6400030Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6400752Z getattr(self, test_name)() 2025-12-04T09:19:34.6401425Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6402099Z fn() 2025-12-04T09:19:34.6402683Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6403360Z method(*args, **kwargs) 2025-12-04T09:19:34.6403998Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6404661Z method(*args, **kwargs) 2025-12-04T09:19:34.6405297Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6405967Z with policy(): 
2025-12-04T09:19:34.6406565Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6407255Z raise RuntimeError(msg) 2025-12-04T09:19:34.6408589Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 2. CUDA driver allocated memory was 602865664 and is now 619642880. 2025-12-04T09:19:34.6409798Z 2025-12-04T09:19:34.6410006Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6410958Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T09:19:34.6411741Z 2025-12-04T09:19:34.6411981Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6412349Z 2025-12-04T09:19:34.6412353Z 2025-12-04T09:19:34.6412555Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:19:34.6413122Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:19:34.6414255Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-acd65444fa26961a.xml - 2025-12-04T09:19:34.6415285Z =========================== short test summary info ============================ 2025-12-04T09:19:34.6416434Z FAILED [8.2872s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy1_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:19:34.6417730Z Traceback (most recent call last): 2025-12-04T09:19:34.6418531Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6419325Z getattr(self, test_name)() 2025-12-04T09:19:34.6420177Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6421148Z fn() 2025-12-04T09:19:34.6421798Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6422571Z method(*args, **kwargs) 2025-12-04T09:19:34.6423294Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6424056Z method(*args, **kwargs) 2025-12-04T09:19:34.6424761Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6425513Z with policy(): 2025-12-04T09:19:34.6426203Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6426963Z raise RuntimeError(msg) 2025-12-04T09:19:34.6440805Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 2. CUDA driver allocated memory was 602865664 and is now 619642880. 
2025-12-04T09:19:34.6442128Z 2025-12-04T09:19:34.6442340Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6443289Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T09:19:34.6444050Z 2025-12-04T09:19:34.6444284Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6444812Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:19:34.6445249Z ======================= 1 failed, 7 deselected in 8.31s ======================== 2025-12-04T09:19:34.6445615Z Got exit code 1 2025-12-04T09:19:34.6445860Z Retrying single test... 2025-12-04T09:19:34.6446636Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-d7f6d912312cc834.xml 2025-12-04T09:19:34.6447644Z ============================= test session starts ============================== 2025-12-04T09:19:34.6448238Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:19:34.6448771Z cachedir: .pytest_cache 2025-12-04T09:19:34.6449393Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:19:34.6450077Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:19:34.6450383Z configfile: pytest.ini 2025-12-04T09:19:34.6451024Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:19:34.6451796Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:19:34.6452826Z stepcurrent: skipping 1 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T09:19:34.6453749Z Running 1 items in this shard 2025-12-04T09:19:34.6453936Z 2025-12-04T09:19:34.6454920Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy1_cuda I1204 09:15:22.454000 21002 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 21054 2025-12-04T09:19:34.6456537Z I1204 09:15:22.455000 21002 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 21055 2025-12-04T09:19:34.6457834Z I1204 09:15:22.456000 21002 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 21056 2025-12-04T09:19:34.6458962Z I1204 09:15:22.456000 21002 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 21057 2025-12-04T09:19:34.6461467Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.6463476Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6465481Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. 
FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.6467480Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6469530Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.6471307Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6473082Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.6474856Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6475525Z [rank0]:E1204 09:15:29.098000 21054 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6476581Z [rank0]:E1204 09:15:29.098000 21054 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6478077Z [rank0]:E1204 09:15:29.098000 21054 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6479545Z [rank0]:E1204 09:15:29.098000 21054 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6480999Z [rank0]:E1204 09:15:29.098000 21054 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6482359Z [rank0]:E1204 09:15:29.098000 21054 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6483702Z [rank0]:E1204 09:15:29.098000 21054 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6485110Z [rank0]:E1204 09:15:29.098000 21054 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6486519Z [rank0]:E1204 09:15:29.098000 21054 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6487921Z [rank0]:E1204 09:15:29.098000 21054 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6489337Z [rank0]:E1204 09:15:29.098000 21054 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6490769Z [rank0]:E1204 09:15:29.098000 21054 
site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6492147Z [rank0]:E1204 09:15:29.098000 21054 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6493567Z [rank0]:E1204 09:15:29.098000 21054 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6495614Z [rank0]:E1204 09:15:29.098000 21054 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 714014720 and is now 728694784. 2025-12-04T09:19:34.6497926Z [rank0]:E1204 09:15:29.098000 21054 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6499092Z [rank0]:E1204 09:15:29.098000 21054 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6501070Z [rank0]:E1204 09:15:29.098000 21054 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T09:19:34.6502743Z [rank0]:E1204 09:15:29.098000 21054 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6503955Z [rank0]:E1204 09:15:29.098000 21054 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6505365Z [rank0]:E1204 09:15:29.098000 21054 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.6506578Z [rank2]:E1204 09:15:29.099000 21056 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6507706Z [rank2]:E1204 09:15:29.099000 21056 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6509421Z [rank2]:E1204 09:15:29.099000 21056 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6510878Z [rank2]:E1204 09:15:29.099000 21056 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6512336Z [rank2]:E1204 09:15:29.099000 21056 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6513704Z [rank2]:E1204 09:15:29.099000 21056 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6515218Z [rank2]:E1204 09:15:29.099000 21056 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6516709Z [rank2]:E1204 09:15:29.099000 21056 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T09:19:34.6518204Z [rank2]:E1204 09:15:29.099000 21056 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6519742Z [rank2]:E1204 09:15:29.099000 21056 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6521565Z [rank2]:E1204 09:15:29.099000 21056 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6523117Z [rank2]:E1204 09:15:29.099000 21056 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6524666Z [rank2]:E1204 09:15:29.099000 21056 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6526259Z [rank2]:E1204 09:15:29.099000 21056 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6528601Z [rank2]:E1204 09:15:29.099000 21056 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 2. CUDA driver allocated memory was 604962816 and is now 619642880. 2025-12-04T09:19:34.6530788Z [rank2]:E1204 09:15:29.099000 21056 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6531958Z [rank2]:E1204 09:15:29.099000 21056 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6533944Z [rank2]:E1204 09:15:29.099000 21056 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T09:19:34.6535440Z [rank2]:E1204 09:15:29.099000 21056 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6536585Z [rank2]:E1204 09:15:29.099000 21056 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6538255Z [rank2]:E1204 09:15:29.099000 21056 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:19:34.6539392Z [rank1]:E1204 09:15:29.099000 21055 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6540507Z [rank1]:E1204 09:15:29.099000 21055 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6542176Z [rank1]:E1204 09:15:29.099000 21055 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6543832Z [rank1]:E1204 09:15:29.099000 21055 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6545480Z [rank1]:E1204 09:15:29.099000 21055 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6547010Z [rank1]:E1204 09:15:29.099000 21055 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6548521Z [rank1]:E1204 09:15:29.099000 21055 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6550296Z [rank1]:E1204 09:15:29.099000 21055 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6551855Z [rank1]:E1204 09:15:29.099000 21055 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6553358Z [rank1]:E1204 09:15:29.099000 21055 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6554855Z [rank1]:E1204 09:15:29.099000 21055 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6556301Z [rank1]:E1204 09:15:29.099000 21055 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6557753Z [rank1]:E1204 09:15:29.099000 21055 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6559451Z [rank1]:E1204 09:15:29.099000 21055 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6561705Z [rank1]:E1204 09:15:29.099000 21055 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 598671360 and is now 619642880. 
2025-12-04T09:19:34.6563810Z [rank1]:E1204 09:15:29.099000 21055 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6564939Z [rank1]:E1204 09:15:29.099000 21055 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6566847Z [rank1]:E1204 09:15:29.099000 21055 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T09:19:34.6568532Z [rank1]:E1204 09:15:29.099000 21055 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6569722Z [rank1]:E1204 09:15:29.099000 21055 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6571078Z [rank1]:E1204 09:15:29.099000 21055 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.6572181Z [rank3]:E1204 09:15:29.100000 21057 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6573259Z [rank3]:E1204 09:15:29.100000 21057 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6574981Z [rank3]:E1204 09:15:29.100000 21057 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6576774Z [rank3]:E1204 09:15:29.100000 21057 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6578418Z [rank3]:E1204 09:15:29.100000 21057 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6579934Z [rank3]:E1204 09:15:29.100000 21057 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6581449Z [rank3]:E1204 09:15:29.100000 21057 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6583096Z [rank3]:E1204 09:15:29.100000 21057 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6584685Z [rank3]:E1204 09:15:29.100000 21057 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6586270Z [rank3]:E1204 09:15:29.100000 21057 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6587854Z [rank3]:E1204 09:15:29.100000 21057 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6589453Z [rank3]:E1204 09:15:29.100000 21057 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6590916Z [rank3]:E1204 09:15:29.100000 21057 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6592416Z [rank3]:E1204 09:15:29.100000 21057 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6594595Z [rank3]:E1204 09:15:29.100000 21057 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 3. CUDA driver allocated memory was 581894144 and is now 619642880. 2025-12-04T09:19:34.6596635Z [rank3]:E1204 09:15:29.100000 21057 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6597778Z [rank3]:E1204 09:15:29.100000 21057 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6599599Z [rank3]:E1204 09:15:29.100000 21057 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T09:19:34.6601082Z [rank3]:E1204 09:15:29.100000 21057 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6602162Z [rank3]:E1204 09:15:29.100000 21057 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6603395Z [rank3]:E1204 09:15:29.100000 21057 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.6604099Z dist init r=1, world=4 2025-12-04T09:19:34.6604352Z dist init r=2, world=4 2025-12-04T09:19:34.6604593Z dist init r=0, world=4 2025-12-04T09:19:34.6604824Z dist init r=3, world=4 2025-12-04T09:19:34.6605061Z FAILED [8.3149s] [100%] 2025-12-04T09:19:34.6605212Z 2025-12-04T09:19:34.6605355Z =================================== FAILURES =================================== 2025-12-04T09:19:34.6605890Z _ TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda __ 2025-12-04T09:19:34.6606399Z Traceback (most recent call last): 2025-12-04T09:19:34.6607099Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:19:34.6607799Z self._join_processes(fn) 2025-12-04T09:19:34.6608506Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:19:34.6609281Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:19:34.6610063Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:19:34.6610886Z raise RuntimeError(error) 2025-12-04T09:19:34.6611280Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:19:34.6611712Z Traceback (most recent call last): 2025-12-04T09:19:34.6612410Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6613106Z getattr(self, test_name)() 2025-12-04T09:19:34.6613771Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6614447Z fn() 2025-12-04T09:19:34.6615010Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6615669Z method(*args, **kwargs) 2025-12-04T09:19:34.6616563Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6617469Z method(*args, **kwargs) 2025-12-04T09:19:34.6618165Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6618914Z with policy(): 2025-12-04T09:19:34.6619595Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6620347Z raise RuntimeError(msg) 2025-12-04T09:19:34.6621959Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 3. CUDA driver allocated memory was 581894144 and is now 619642880. 2025-12-04T09:19:34.6623326Z 2025-12-04T09:19:34.6623544Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6624623Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T09:19:34.6625478Z 2025-12-04T09:19:34.6625752Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6626149Z 2025-12-04T09:19:34.6626265Z 2025-12-04T09:19:34.6626492Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:19:34.6627115Z Process 3 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:19:34.6628387Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-d7f6d912312cc834.xml - 2025-12-04T09:19:34.6629562Z =========================== short test summary info ============================ 2025-12-04T09:19:34.6630764Z FAILED [8.3149s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy1_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:19:34.6631904Z Traceback (most recent call last): 2025-12-04T09:19:34.6632779Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6633494Z getattr(self, test_name)() 2025-12-04T09:19:34.6634151Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6634845Z fn() 2025-12-04T09:19:34.6635422Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6636086Z method(*args, **kwargs) 2025-12-04T09:19:34.6636716Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6637385Z method(*args, **kwargs) 2025-12-04T09:19:34.6638080Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6638728Z with policy(): 2025-12-04T09:19:34.6639340Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6640000Z raise RuntimeError(msg) 2025-12-04T09:19:34.6641272Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 3. CUDA driver allocated memory was 581894144 and is now 619642880. 2025-12-04T09:19:34.6642482Z 2025-12-04T09:19:34.6642672Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6643630Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T09:19:34.6644393Z 2025-12-04T09:19:34.6644636Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6645145Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
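Each of the FAILURES tracebacks in this log is raised by the parent test process, not by the rank that leaked: the harness joins the per-rank workers and turns any nonzero child exit code (10 here) into the "Process N exited with error code 10" RuntimeError shown above. A minimal sketch of that join-and-check pattern, with illustrative names rather than the common_distributed.py implementation, assuming a spawn-based launcher:

    # Illustrative parent/worker pattern; _worker stands in for the per-rank test body.
    import multiprocessing as mp

    def _worker(rank, world_size):
        # ... run this rank's share of the test; exit nonzero (e.g. 10) on failure ...
        pass

    def run_in_processes(world_size=4):
        ctx = mp.get_context("spawn")
        procs = [ctx.Process(target=_worker, args=(r, world_size)) for r in range(world_size)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        for rank, p in enumerate(procs):
            if p.exitcode != 0:
                raise RuntimeError(f"Process {rank} exited with error code {p.exitcode}")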
2025-12-04T09:19:34.6645585Z ======================= 1 failed, 7 deselected in 8.34s ======================== 2025-12-04T09:19:34.6645953Z Got exit code 1 2025-12-04T09:19:34.6646663Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy1_cuda 2025-12-04T09:19:34.6647715Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:19:34.6648811Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-d3fa58c4cf34965f.xml 2025-12-04T09:19:34.6649673Z ============================= test session starts ============================== 2025-12-04T09:19:34.6650246Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:19:34.6650763Z cachedir: .pytest_cache 2025-12-04T09:19:34.6651375Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:19:34.6652106Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:19:34.6652404Z configfile: pytest.ini 2025-12-04T09:19:34.6653044Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:19:34.6653818Z collecting ... collected 8 items / 2 deselected / 6 selected 2025-12-04T09:19:34.6654229Z stepcurrent: skipping 2 already run items. 2025-12-04T09:19:34.6654567Z Running 6 items in this shard 2025-12-04T09:19:34.6654748Z 2025-12-04T09:19:34.6655840Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda I1204 09:15:35.514000 21315 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 21367 2025-12-04T09:19:34.6657832Z I1204 09:15:35.515000 21315 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 21368 2025-12-04T09:19:34.6658958Z I1204 09:15:35.516000 21315 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 21369 2025-12-04T09:19:34.6660083Z I1204 09:15:35.516000 21315 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 21370 2025-12-04T09:19:34.6662444Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.6664517Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6666532Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
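The sequence above ("Got exit code 1", "Retrying single test...", then "FAILED CONSISTENTLY" and "continuing with the rest of the tests due to continue-through-error being set") is the shard runner's retry policy: a failing test is rerun on its own, and only if the rerun also fails is it reported as consistently failing, after which the remaining tests in the shard still run. A hypothetical control-flow sketch only, not the actual runner, where run_pytest is a stand-in that runs pytest on a selection and returns the first failing test or None:

    # Hypothetical sketch of retry-then-continue; run_pytest is an assumed helper.
    def run_shard(tests, run_pytest):
        remaining = list(tests)
        consistently_failing = []
        while remaining:
            failed = run_pytest(remaining)                        # stops at the first failure
            if failed is None:
                break
            remaining = remaining[remaining.index(failed) + 1:]   # skip already-run items
            if run_pytest([failed]) is not None:                  # retry just the failing test
                consistently_failing.append(failed)               # reported as FAILED CONSISTENTLY
            # continue-through-error: keep running the rest either way
        return consistently_failing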
2025-12-04T09:19:34.6668531Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6670517Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.6672508Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6674281Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.6676048Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6676717Z [rank0]:E1204 09:15:42.193000 21367 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6677705Z [rank0]:E1204 09:15:42.193000 21367 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6679194Z [rank0]:E1204 09:15:42.193000 21367 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6680656Z [rank0]:E1204 09:15:42.193000 21367 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6682164Z [rank0]:E1204 09:15:42.193000 21367 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6683512Z [rank0]:E1204 09:15:42.193000 21367 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6684838Z [rank0]:E1204 09:15:42.193000 21367 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6686244Z [rank0]:E1204 09:15:42.193000 21367 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6687655Z [rank0]:E1204 09:15:42.193000 21367 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6688090Z [rank0]:E1204 09:15:42.193000 21367 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6688953Z [rank0]:E1204 09:15:42.193000 21367 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6689347Z [rank0]:E1204 09:15:42.193000 21367 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6690208Z [rank0]:E1204 09:15:42.193000 21367 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6690696Z [rank0]:E1204 09:15:42.193000 21367 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6692307Z [rank0]:E1204 09:15:42.193000 21367 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 2025-12-04T09:19:34.6692639Z [rank0]:E1204 09:15:42.193000 21367 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6693224Z [rank0]:E1204 09:15:42.193000 21367 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6694395Z [rank0]:E1204 09:15:42.193000 21367 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T09:19:34.6694716Z [rank0]:E1204 09:15:42.193000 21367 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6695362Z [rank0]:E1204 09:15:42.193000 21367 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6695844Z [rank0]:E1204 09:15:42.193000 21367 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.6696306Z [rank2]:E1204 09:15:42.198000 21369 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6696983Z [rank2]:E1204 09:15:42.198000 21369 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6698046Z [rank2]:E1204 09:15:42.198000 21369 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6698567Z [rank2]:E1204 09:15:42.198000 21369 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6699558Z [rank2]:E1204 09:15:42.198000 21369 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6699973Z [rank2]:E1204 09:15:42.198000 21369 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6700942Z [rank2]:E1204 09:15:42.198000 21369 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6701435Z [rank2]:E1204 09:15:42.198000 21369 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6702402Z [rank2]:E1204 09:15:42.198000 21369 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6702888Z [rank2]:E1204 09:15:42.198000 21369 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6703852Z [rank2]:E1204 09:15:42.198000 21369 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6704369Z [rank2]:E1204 09:15:42.198000 21369 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6705346Z [rank2]:E1204 09:15:42.198000 21369 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6705840Z [rank2]:E1204 09:15:42.198000 21369 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6707647Z [rank2]:E1204 09:15:42.198000 21369 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 604962816 and is now 630128640. 2025-12-04T09:19:34.6708029Z [rank2]:E1204 09:15:42.198000 21369 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6708790Z [rank2]:E1204 09:15:42.198000 21369 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6710079Z [rank2]:E1204 09:15:42.198000 21369 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T09:19:34.6710401Z [rank2]:E1204 09:15:42.198000 21369 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6711047Z [rank2]:E1204 09:15:42.198000 21369 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6711533Z [rank2]:E1204 09:15:42.198000 21369 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:19:34.6711939Z [rank1]:E1204 09:15:42.199000 21368 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6712485Z [rank1]:E1204 09:15:42.199000 21368 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6713379Z [rank1]:E1204 09:15:42.199000 21368 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6713838Z [rank1]:E1204 09:15:42.199000 21368 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6714716Z [rank1]:E1204 09:15:42.199000 21368 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6715079Z [rank1]:E1204 09:15:42.199000 21368 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6715932Z [rank1]:E1204 09:15:42.199000 21368 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6716361Z [rank1]:E1204 09:15:42.199000 21368 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6717227Z [rank1]:E1204 09:15:42.199000 21368 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6717657Z [rank1]:E1204 09:15:42.199000 21368 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6718571Z [rank1]:E1204 09:15:42.199000 21368 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6718966Z [rank1]:E1204 09:15:42.199000 21368 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6719831Z [rank1]:E1204 09:15:42.199000 21368 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6720267Z [rank1]:E1204 09:15:42.199000 21368 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6722274Z [rank1]:E1204 09:15:42.199000 21368 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 607059968 and is now 630128640. 
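The UserWarnings in this session come from a `device_id` of cuda with no index, and the warning text itself names the two fixes: call `torch.cuda.set_device()` before constructing FSDP, or pass a device with an explicit index. A minimal sketch of both, where `model` and `rank` are placeholders and a process group is assumed to be initialized already:

    # Sketch of the fixes the FSDP warning suggests; `model` and `rank` are placeholders.
    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap(model, rank):
        # Option 1: make the current device explicit before wrapping.
        torch.cuda.set_device(rank)
        return FSDP(model, device_id=torch.cuda.current_device())

        # Option 2: pass an indexed device instead of the bare "cuda" spec.
        # return FSDP(model, device_id=torch.device("cuda", rank))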
2025-12-04T09:19:34.6722650Z [rank1]:E1204 09:15:42.199000 21368 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6723310Z [rank1]:E1204 09:15:42.199000 21368 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6724620Z [rank1]:E1204 09:15:42.199000 21368 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T09:19:34.6724982Z [rank1]:E1204 09:15:42.199000 21368 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6725716Z [rank1]:E1204 09:15:42.199000 21368 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6726352Z [rank1]:E1204 09:15:42.199000 21368 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.6726813Z [rank3]:E1204 09:15:42.199000 21370 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6727343Z [rank3]:E1204 09:15:42.199000 21370 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6728341Z [rank3]:E1204 09:15:42.199000 21370 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6728857Z [rank3]:E1204 09:15:42.199000 21370 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6729850Z [rank3]:E1204 09:15:42.199000 21370 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6730251Z [rank3]:E1204 09:15:42.199000 21370 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6731209Z [rank3]:E1204 09:15:42.199000 21370 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6731696Z [rank3]:E1204 09:15:42.199000 21370 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6732665Z [rank3]:E1204 09:15:42.199000 21370 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6733327Z [rank3]:E1204 09:15:42.199000 21370 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6734314Z [rank3]:E1204 09:15:42.199000 21370 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6734712Z [rank3]:E1204 09:15:42.199000 21370 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6735576Z [rank3]:E1204 09:15:42.199000 21370 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6736010Z [rank3]:E1204 09:15:42.199000 21370 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6737997Z [rank3]:E1204 09:15:42.199000 21370 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 586088448 and is now 630128640. 2025-12-04T09:19:34.6738363Z [rank3]:E1204 09:15:42.199000 21370 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6739021Z [rank3]:E1204 09:15:42.199000 21370 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6740323Z [rank3]:E1204 09:15:42.199000 21370 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T09:19:34.6740686Z [rank3]:E1204 09:15:42.199000 21370 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6741465Z [rank3]:E1204 09:15:42.199000 21370 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6742008Z [rank3]:E1204 09:15:42.199000 21370 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.6742113Z dist init r=0, world=4 2025-12-04T09:19:34.6742210Z dist init r=1, world=4 2025-12-04T09:19:34.6742305Z dist init r=2, world=4 2025-12-04T09:19:34.6742407Z dist init r=3, world=4 2025-12-04T09:19:34.6743571Z [rank0]:[W1204 09:15:42.210480373 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:19:34.6743671Z FAILED [8.3129s] [ 16%] 2025-12-04T09:19:34.6743677Z 2025-12-04T09:19:34.6743834Z =================================== FAILURES =================================== 2025-12-04T09:19:34.6744274Z _ TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda _ 2025-12-04T09:19:34.6744401Z Traceback (most recent call last): 2025-12-04T09:19:34.6744945Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:19:34.6745055Z self._join_processes(fn) 2025-12-04T09:19:34.6745652Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:19:34.6745796Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:19:34.6746486Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:19:34.6746615Z raise RuntimeError(error) 2025-12-04T09:19:34.6746859Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:19:34.6746993Z Traceback (most recent call last): 2025-12-04T09:19:34.6747541Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6747656Z getattr(self, test_name)() 2025-12-04T09:19:34.6748207Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6748299Z fn() 2025-12-04T09:19:34.6748814Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6749039Z method(*args, **kwargs) 2025-12-04T09:19:34.6749498Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6749609Z method(*args, **kwargs) 2025-12-04T09:19:34.6750063Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6750155Z with policy(): 2025-12-04T09:19:34.6750626Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6750728Z raise RuntimeError(msg) 2025-12-04T09:19:34.6751946Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 604962816 and is now 630128640. 
2025-12-04T09:19:34.6751951Z 2025-12-04T09:19:34.6752153Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6752920Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T09:19:34.6752938Z 2025-12-04T09:19:34.6753242Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6753247Z 2025-12-04T09:19:34.6753251Z 2025-12-04T09:19:34.6753452Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:19:34.6753703Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:19:34.6754474Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-d3fa58c4cf34965f.xml - 2025-12-04T09:19:34.6754647Z =========================== short test summary info ============================ 2025-12-04T09:19:34.6755562Z FAILED [8.3129s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:19:34.6755676Z Traceback (most recent call last): 2025-12-04T09:19:34.6756187Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6756293Z getattr(self, test_name)() 2025-12-04T09:19:34.6756788Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6756875Z fn() 2025-12-04T09:19:34.6757331Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6757447Z method(*args, **kwargs) 2025-12-04T09:19:34.6757900Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6758048Z method(*args, **kwargs) 2025-12-04T09:19:34.6758516Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6758613Z with policy(): 2025-12-04T09:19:34.6759086Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6759188Z raise RuntimeError(msg) 2025-12-04T09:19:34.6760392Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 604962816 and is now 630128640. 2025-12-04T09:19:34.6760397Z 2025-12-04T09:19:34.6760607Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6761373Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T09:19:34.6761378Z 2025-12-04T09:19:34.6761636Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6761799Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
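Earlier in this session, rank 0 also logged the ProcessGroupNCCL warning that destroy_process_group() was not called before the program exited. The fix that the warning and the linked distributed docs point to is an explicit teardown at the end of each worker; a minimal sketch, assuming the usual RANK/WORLD_SIZE/MASTER_ADDR/MASTER_PORT environment variables are provided by the launcher:

    # Minimal teardown sketch; the body of main() is a placeholder for the real work.
    import torch.distributed as dist

    def main():
        dist.init_process_group(backend="nccl")
        try:
            ...  # collectives / the test body
        finally:
            dist.destroy_process_group()  # avoids the resource-leak warning at exit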
2025-12-04T09:19:34.6761964Z ======================= 1 failed, 2 deselected in 8.33s ======================== 2025-12-04T09:19:34.6762066Z Got exit code 1 2025-12-04T09:19:34.6762166Z Retrying single test... 2025-12-04T09:19:34.6762793Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-d5b8ecd9108f02ac.xml 2025-12-04T09:19:34.6762944Z ============================= test session starts ============================== 2025-12-04T09:19:34.6763258Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:19:34.6763377Z cachedir: .pytest_cache 2025-12-04T09:19:34.6763840Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:19:34.6764003Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:19:34.6764113Z configfile: pytest.ini 2025-12-04T09:19:34.6764594Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:19:34.6764794Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:19:34.6765631Z stepcurrent: skipping 2 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T09:19:34.6765736Z Running 1 items in this shard 2025-12-04T09:19:34.6765741Z 2025-12-04T09:19:34.6768811Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda I1204 09:15:48.914000 21652 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 21704 2025-12-04T09:19:34.6769279Z I1204 09:15:48.915000 21652 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 21705 2025-12-04T09:19:34.6769731Z I1204 09:15:48.915000 21652 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 21706 2025-12-04T09:19:34.6770168Z I1204 09:15:48.916000 21652 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 21707 2025-12-04T09:19:34.6771709Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.6771938Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6773466Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.6773630Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6775149Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. 
FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.6775318Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6777136Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.6777318Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6777783Z [rank0]:E1204 09:15:55.613000 21704 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6778344Z [rank0]:E1204 09:15:55.613000 21704 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6779358Z [rank0]:E1204 09:15:55.613000 21704 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6779942Z [rank0]:E1204 09:15:55.613000 21704 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6780952Z [rank0]:E1204 09:15:55.613000 21704 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6781355Z [rank0]:E1204 09:15:55.613000 21704 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6782337Z [rank0]:E1204 09:15:55.613000 21704 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6782838Z [rank0]:E1204 09:15:55.613000 21704 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6783820Z [rank0]:E1204 09:15:55.613000 21704 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6784314Z [rank0]:E1204 09:15:55.613000 21704 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6785276Z [rank0]:E1204 09:15:55.613000 21704 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6785741Z [rank0]:E1204 09:15:55.613000 21704 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6786766Z [rank0]:E1204 09:15:55.613000 21704 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6787279Z [rank0]:E1204 09:15:55.613000 21704 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 
2025-12-04T09:19:34.6789161Z [rank0]:E1204 09:15:55.613000 21704 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 2025-12-04T09:19:34.6789501Z [rank0]:E1204 09:15:55.613000 21704 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6790097Z [rank0]:E1204 09:15:55.613000 21704 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6791268Z [rank0]:E1204 09:15:55.613000 21704 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T09:19:34.6791607Z [rank0]:E1204 09:15:55.613000 21704 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6792247Z [rank0]:E1204 09:15:55.613000 21704 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6792748Z [rank0]:E1204 09:15:55.613000 21704 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.6793158Z [rank1]:E1204 09:15:55.620000 21705 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6793642Z [rank1]:E1204 09:15:55.620000 21705 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6794628Z [rank1]:E1204 09:15:55.620000 21705 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6795088Z [rank1]:E1204 09:15:55.620000 21705 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6795983Z [rank1]:E1204 09:15:55.620000 21705 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6796341Z [rank1]:E1204 09:15:55.620000 21705 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6797217Z [rank1]:E1204 09:15:55.620000 21705 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6797657Z [rank1]:E1204 09:15:55.620000 21705 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6798528Z [rank1]:E1204 09:15:55.620000 21705 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6798973Z [rank1]:E1204 09:15:55.620000 21705 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T09:19:34.6799831Z [rank1]:E1204 09:15:55.620000 21705 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6800310Z [rank1]:E1204 09:15:55.620000 21705 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6801174Z [rank1]:E1204 09:15:55.620000 21705 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6801627Z [rank1]:E1204 09:15:55.620000 21705 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6803231Z [rank1]:E1204 09:15:55.620000 21705 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 604962816 and is now 630128640. 2025-12-04T09:19:34.6803579Z [rank1]:E1204 09:15:55.620000 21705 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6804174Z [rank1]:E1204 09:15:55.620000 21705 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6805348Z [rank1]:E1204 09:15:55.620000 21705 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T09:19:34.6805692Z [rank1]:E1204 09:15:55.620000 21705 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6806332Z [rank1]:E1204 09:15:55.620000 21705 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6806841Z [rank1]:E1204 09:15:55.620000 21705 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.6807297Z [rank2]:E1204 09:15:55.621000 21706 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6807788Z [rank2]:E1204 09:15:55.621000 21706 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6808678Z [rank2]:E1204 09:15:55.621000 21706 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6809135Z [rank2]:E1204 09:15:55.621000 21706 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6810026Z [rank2]:E1204 09:15:55.621000 21706 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6810391Z [rank2]:E1204 09:15:55.621000 21706 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6811260Z 
[rank2]:E1204 09:15:55.621000 21706 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6811695Z [rank2]:E1204 09:15:55.621000 21706 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6812560Z [rank2]:E1204 09:15:55.621000 21706 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6813514Z [rank2]:E1204 09:15:55.621000 21706 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6814375Z [rank2]:E1204 09:15:55.621000 21706 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6814790Z [rank2]:E1204 09:15:55.621000 21706 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6815650Z [rank2]:E1204 09:15:55.621000 21706 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6816108Z [rank2]:E1204 09:15:55.621000 21706 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6818101Z [rank2]:E1204 09:15:55.621000 21706 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 602865664 and is now 630128640. 
2025-12-04T09:19:34.6818494Z [rank2]:E1204 09:15:55.621000 21706 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6819159Z [rank2]:E1204 09:15:55.621000 21706 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6820472Z [rank2]:E1204 09:15:55.621000 21706 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T09:19:34.6821045Z [rank2]:E1204 09:15:55.621000 21706 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6821894Z [rank2]:E1204 09:15:55.621000 21706 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6822459Z [rank2]:E1204 09:15:55.621000 21706 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:19:34.6822914Z [rank3]:E1204 09:15:55.621000 21707 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6823463Z [rank3]:E1204 09:15:55.621000 21707 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6824469Z [rank3]:E1204 09:15:55.621000 21707 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6824985Z [rank3]:E1204 09:15:55.621000 21707 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6825995Z [rank3]:E1204 09:15:55.621000 21707 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6826400Z [rank3]:E1204 09:15:55.621000 21707 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6827378Z [rank3]:E1204 09:15:55.621000 21707 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6827871Z [rank3]:E1204 09:15:55.621000 21707 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6828922Z [rank3]:E1204 09:15:55.621000 21707 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6829420Z [rank3]:E1204 09:15:55.621000 21707 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6830387Z [rank3]:E1204 09:15:55.621000 21707 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6830846Z [rank3]:E1204 09:15:55.621000 21707 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6831814Z [rank3]:E1204 09:15:55.621000 21707 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6832325Z [rank3]:E1204 09:15:55.621000 21707 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6834158Z [rank3]:E1204 09:15:55.621000 21707 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 586088448 and is now 630128640. 2025-12-04T09:19:34.6834520Z [rank3]:E1204 09:15:55.621000 21707 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6835145Z [rank3]:E1204 09:15:55.621000 21707 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6836388Z [rank3]:E1204 09:15:55.621000 21707 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T09:19:34.6836800Z [rank3]:E1204 09:15:55.621000 21707 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6837480Z [rank3]:E1204 09:15:55.621000 21707 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6838009Z [rank3]:E1204 09:15:55.621000 21707 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.6838110Z dist init r=1, world=4 2025-12-04T09:19:34.6838384Z dist init r=2, world=4 2025-12-04T09:19:34.6838481Z dist init r=3, world=4 2025-12-04T09:19:34.6838579Z dist init r=0, world=4 2025-12-04T09:19:34.6839724Z [rank0]:[W1204 09:15:56.667134069 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:19:34.6839830Z FAILED [8.8295s] [100%] 2025-12-04T09:19:34.6839836Z 2025-12-04T09:19:34.6839983Z =================================== FAILURES =================================== 2025-12-04T09:19:34.6840426Z _ TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda _ 2025-12-04T09:19:34.6840548Z Traceback (most recent call last): 2025-12-04T09:19:34.6841097Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:19:34.6841213Z self._join_processes(fn) 2025-12-04T09:19:34.6841876Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:19:34.6842081Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:19:34.6842659Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:19:34.6842791Z raise RuntimeError(error) 2025-12-04T09:19:34.6843017Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:19:34.6843136Z Traceback (most recent call last): 2025-12-04T09:19:34.6843664Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6843773Z getattr(self, test_name)() 2025-12-04T09:19:34.6844284Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6844385Z fn() 2025-12-04T09:19:34.6844868Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6844986Z method(*args, **kwargs) 2025-12-04T09:19:34.6845465Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6845571Z method(*args, **kwargs) 2025-12-04T09:19:34.6846061Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6846156Z with policy(): 2025-12-04T09:19:34.6846637Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6846755Z raise RuntimeError(msg) 2025-12-04T09:19:34.6848022Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 602865664 and is now 630128640. 
2025-12-04T09:19:34.6848032Z 2025-12-04T09:19:34.6848251Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6849114Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T09:19:34.6849120Z 2025-12-04T09:19:34.6849388Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6849393Z 2025-12-04T09:19:34.6849550Z Process 3 exited with error code 10 and exception: 2025-12-04T09:19:34.6849666Z Traceback (most recent call last): 2025-12-04T09:19:34.6850299Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6850402Z getattr(self, test_name)() 2025-12-04T09:19:34.6850887Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6850977Z fn() 2025-12-04T09:19:34.6851425Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6851527Z method(*args, **kwargs) 2025-12-04T09:19:34.6851980Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6852076Z method(*args, **kwargs) 2025-12-04T09:19:34.6852536Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6852623Z with policy(): 2025-12-04T09:19:34.6853073Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6853183Z raise RuntimeError(msg) 2025-12-04T09:19:34.6854376Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 586088448 and is now 630128640. 2025-12-04T09:19:34.6854450Z 2025-12-04T09:19:34.6854650Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6855414Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T09:19:34.6855419Z 2025-12-04T09:19:34.6855666Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6855671Z 2025-12-04T09:19:34.6855675Z 2025-12-04T09:19:34.6855875Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:19:34.6856109Z Process 2 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:19:34.6857169Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-d5b8ecd9108f02ac.xml - 2025-12-04T09:19:34.6857343Z =========================== short test summary info ============================ 2025-12-04T09:19:34.6858374Z FAILED [8.8295s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:19:34.6858495Z Traceback (most recent call last): 2025-12-04T09:19:34.6859045Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6859163Z getattr(self, test_name)() 2025-12-04T09:19:34.6859702Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6859805Z fn() 2025-12-04T09:19:34.6860311Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6860415Z method(*args, **kwargs) 2025-12-04T09:19:34.6860987Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6861091Z method(*args, **kwargs) 2025-12-04T09:19:34.6861601Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6861696Z with policy(): 2025-12-04T09:19:34.6862204Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6862321Z raise RuntimeError(msg) 2025-12-04T09:19:34.6863670Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 602865664 and is now 630128640. 
2025-12-04T09:19:34.6863679Z
2025-12-04T09:19:34.6863900Z To execute this test, run the following from the base repo dir:
2025-12-04T09:19:34.6864753Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda
2025-12-04T09:19:34.6864758Z
2025-12-04T09:19:34.6865022Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:19:34.6865027Z
2025-12-04T09:19:34.6865197Z Process 3 exited with error code 10 and exception:
2025-12-04T09:19:34.6865319Z Traceback (most recent call last):
2025-12-04T09:19:34.6865876Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:19:34.6866044Z     getattr(self, test_name)()
2025-12-04T09:19:34.6866587Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:19:34.6866685Z     fn()
2025-12-04T09:19:34.6867197Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:19:34.6867304Z     method(*args, **kwargs)
2025-12-04T09:19:34.6867819Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:19:34.6867922Z     method(*args, **kwargs)
2025-12-04T09:19:34.6868439Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:19:34.6868533Z     with policy():
2025-12-04T09:19:34.6869141Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:19:34.6869258Z     raise RuntimeError(msg)
2025-12-04T09:19:34.6870530Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 586088448 and is now 630128640.
2025-12-04T09:19:34.6870536Z
2025-12-04T09:19:34.6870743Z To execute this test, run the following from the base repo dir:
2025-12-04T09:19:34.6871616Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda
2025-12-04T09:19:34.6871621Z
2025-12-04T09:19:34.6871855Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:19:34.6872021Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T09:19:34.6872179Z ======================= 1 failed, 7 deselected in 8.85s ========================
2025-12-04T09:19:34.6872277Z Got exit code 1
2025-12-04T09:19:34.6872370Z Retrying single test...
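Each run above also emits the same FSDP UserWarning: `device_id` was passed as a bare "cuda" with no index. Following the warning's own advice, either setting the current device before FSDP initialization or passing an explicit device index silences it. A short sketch under those assumptions (wrap_model, model, and rank are placeholders, not names from this test):

import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_model(model, rank):
    # Option 1 from the warning: make the current device explicit first.
    torch.cuda.set_device(rank)
    # Option 2: pass a device_id with an explicit index instead of bare "cuda".
    return FSDP(model, device_id=torch.device("cuda", rank))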
2025-12-04T09:19:34.6873026Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-578e4c4077b7a803.xml 2025-12-04T09:19:34.6873179Z ============================= test session starts ============================== 2025-12-04T09:19:34.6873489Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:19:34.6873585Z cachedir: .pytest_cache 2025-12-04T09:19:34.6874049Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:19:34.6874154Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:19:34.6874255Z configfile: pytest.ini 2025-12-04T09:19:34.6874728Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:19:34.6874914Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:19:34.6875761Z stepcurrent: skipping 2 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T09:19:34.6875860Z Running 1 items in this shard 2025-12-04T09:19:34.6875864Z 2025-12-04T09:19:34.6876945Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda I1204 09:16:02.204000 21989 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 22041 2025-12-04T09:19:34.6877390Z I1204 09:16:02.204000 21989 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 22042 2025-12-04T09:19:34.6877830Z I1204 09:16:02.205000 21989 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 22043 2025-12-04T09:19:34.6878321Z I1204 09:16:02.206000 21989 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 22044 2025-12-04T09:19:34.6879860Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.6880016Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6881532Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.6881687Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6883200Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:19:34.6883353Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6884863Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.6885020Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6885477Z [rank0]:E1204 09:16:08.842000 22041 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6885952Z [rank0]:E1204 09:16:08.842000 22041 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6886850Z [rank0]:E1204 09:16:08.842000 22041 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6887303Z [rank0]:E1204 09:16:08.842000 22041 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6888195Z [rank0]:E1204 09:16:08.842000 22041 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6888553Z [rank0]:E1204 09:16:08.842000 22041 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6889408Z [rank0]:E1204 09:16:08.842000 22041 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6889851Z [rank0]:E1204 09:16:08.842000 22041 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6890702Z [rank0]:E1204 09:16:08.842000 22041 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6891199Z [rank0]:E1204 09:16:08.842000 22041 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6892054Z [rank0]:E1204 09:16:08.842000 22041 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6892463Z [rank0]:E1204 09:16:08.842000 22041 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6893321Z [rank0]:E1204 09:16:08.842000 22041 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6893756Z [rank0]:E1204 09:16:08.842000 22041 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6895366Z [rank0]:E1204 09:16:08.842000 22041 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 2025-12-04T09:19:34.6895695Z [rank0]:E1204 09:16:08.842000 22041 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6896353Z [rank0]:E1204 09:16:08.842000 22041 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6897805Z [rank0]:E1204 09:16:08.842000 22041 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T09:19:34.6898182Z [rank0]:E1204 09:16:08.842000 22041 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6898908Z [rank0]:E1204 09:16:08.842000 22041 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6899550Z [rank0]:E1204 09:16:08.842000 22041 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.6900007Z [rank3]:E1204 09:16:08.848000 22044 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6900537Z [rank3]:E1204 09:16:08.848000 22044 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6901553Z [rank3]:E1204 09:16:08.848000 22044 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6902070Z [rank3]:E1204 09:16:08.848000 22044 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6903075Z [rank3]:E1204 09:16:08.848000 22044 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6903475Z [rank3]:E1204 09:16:08.848000 22044 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6904435Z [rank3]:E1204 09:16:08.848000 22044 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6904935Z [rank3]:E1204 09:16:08.848000 22044 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6905954Z [rank3]:E1204 09:16:08.848000 22044 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6906460Z [rank3]:E1204 09:16:08.848000 22044 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6907418Z [rank3]:E1204 09:16:08.848000 22044 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6907881Z [rank3]:E1204 09:16:08.848000 22044 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6908941Z [rank3]:E1204 09:16:08.848000 22044 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6909489Z [rank3]:E1204 09:16:08.848000 22044 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6911107Z [rank3]:E1204 09:16:08.848000 22044 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 583991296 and is now 630128640. 2025-12-04T09:19:34.6911432Z [rank3]:E1204 09:16:08.848000 22044 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6912025Z [rank3]:E1204 09:16:08.848000 22044 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6913191Z [rank3]:E1204 09:16:08.848000 22044 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T09:19:34.6913573Z [rank3]:E1204 09:16:08.848000 22044 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6914212Z [rank3]:E1204 09:16:08.848000 22044 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6914703Z [rank3]:E1204 09:16:08.848000 22044 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.6915111Z [rank2]:E1204 09:16:08.849000 22043 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6915583Z [rank2]:E1204 09:16:08.849000 22043 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6916481Z [rank2]:E1204 09:16:08.849000 22043 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6916936Z [rank2]:E1204 09:16:08.849000 22043 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6917820Z [rank2]:E1204 09:16:08.849000 22043 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6918176Z [rank2]:E1204 09:16:08.849000 22043 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6919032Z [rank2]:E1204 09:16:08.849000 22043 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6919524Z [rank2]:E1204 09:16:08.849000 22043 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6920380Z [rank2]:E1204 09:16:08.849000 22043 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6920955Z [rank2]:E1204 09:16:08.849000 22043 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6922066Z [rank2]:E1204 09:16:08.849000 22043 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6922523Z [rank2]:E1204 09:16:08.849000 22043 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6923498Z [rank2]:E1204 09:16:08.849000 22043 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6923992Z [rank2]:E1204 09:16:08.849000 22043 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6925820Z [rank2]:E1204 09:16:08.849000 22043 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 602865664 and is now 630128640. 
2025-12-04T09:19:34.6926185Z [rank2]:E1204 09:16:08.849000 22043 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6926855Z [rank2]:E1204 09:16:08.849000 22043 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6928285Z [rank2]:E1204 09:16:08.849000 22043 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T09:19:34.6928659Z [rank2]:E1204 09:16:08.849000 22043 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6929376Z [rank2]:E1204 09:16:08.849000 22043 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6929933Z [rank2]:E1204 09:16:08.849000 22043 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:19:34.6930388Z [rank1]:E1204 09:16:08.850000 22042 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6930924Z [rank1]:E1204 09:16:08.850000 22042 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6931931Z [rank1]:E1204 09:16:08.850000 22042 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6932440Z [rank1]:E1204 09:16:08.850000 22042 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6933440Z [rank1]:E1204 09:16:08.850000 22042 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6934101Z [rank1]:E1204 09:16:08.850000 22042 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6934963Z [rank1]:E1204 09:16:08.850000 22042 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6935395Z [rank1]:E1204 09:16:08.850000 22042 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6936304Z [rank1]:E1204 09:16:08.850000 22042 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6936934Z [rank1]:E1204 09:16:08.850000 22042 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6937892Z [rank1]:E1204 09:16:08.850000 22042 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6938348Z [rank1]:E1204 09:16:08.850000 22042 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6939315Z [rank1]:E1204 09:16:08.850000 22042 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6939803Z [rank1]:E1204 09:16:08.850000 22042 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6941610Z [rank1]:E1204 09:16:08.850000 22042 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 598671360 and is now 630128640. 2025-12-04T09:19:34.6941976Z [rank1]:E1204 09:16:08.850000 22042 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6942710Z [rank1]:E1204 09:16:08.850000 22042 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6944018Z [rank1]:E1204 09:16:08.850000 22042 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T09:19:34.6944385Z [rank1]:E1204 09:16:08.850000 22042 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6945104Z [rank1]:E1204 09:16:08.850000 22042 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6945661Z [rank1]:E1204 09:16:08.850000 22042 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.6945768Z dist init r=3, world=4 2025-12-04T09:19:34.6945870Z dist init r=2, world=4 2025-12-04T09:19:34.6945971Z dist init r=1, world=4 2025-12-04T09:19:34.6946069Z dist init r=0, world=4 2025-12-04T09:19:34.6947229Z [rank0]:[W1204 09:16:09.901193775 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:19:34.6947338Z FAILED [8.5234s] [100%] 2025-12-04T09:19:34.6947345Z 2025-12-04T09:19:34.6947494Z =================================== FAILURES =================================== 2025-12-04T09:19:34.6948003Z _ TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda _ 2025-12-04T09:19:34.6948124Z Traceback (most recent call last): 2025-12-04T09:19:34.6948678Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:19:34.6948797Z self._join_processes(fn) 2025-12-04T09:19:34.6949431Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:19:34.6949563Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:19:34.6950101Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:19:34.6950202Z raise RuntimeError(error) 2025-12-04T09:19:34.6950418Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:19:34.6950524Z Traceback (most recent call last): 2025-12-04T09:19:34.6951009Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6951114Z getattr(self, test_name)() 2025-12-04T09:19:34.6951591Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6951684Z fn() 2025-12-04T09:19:34.6952134Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6952227Z method(*args, **kwargs) 2025-12-04T09:19:34.6952685Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6952775Z method(*args, **kwargs) 2025-12-04T09:19:34.6953221Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6953318Z with policy(): 2025-12-04T09:19:34.6953769Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6953877Z raise RuntimeError(msg) 2025-12-04T09:19:34.6955144Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 583991296 and is now 630128640. 
2025-12-04T09:19:34.6955151Z 2025-12-04T09:19:34.6955345Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6956110Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T09:19:34.6956114Z 2025-12-04T09:19:34.6956349Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6956358Z 2025-12-04T09:19:34.6956362Z 2025-12-04T09:19:34.6956565Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:19:34.6956797Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:19:34.6957565Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-578e4c4077b7a803.xml - 2025-12-04T09:19:34.6957720Z =========================== short test summary info ============================ 2025-12-04T09:19:34.6958616Z FAILED [8.5234s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:19:34.6958734Z Traceback (most recent call last): 2025-12-04T09:19:34.6959219Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6959378Z getattr(self, test_name)() 2025-12-04T09:19:34.6959856Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6959939Z fn() 2025-12-04T09:19:34.6960399Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6960491Z method(*args, **kwargs) 2025-12-04T09:19:34.6960939Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6961045Z method(*args, **kwargs) 2025-12-04T09:19:34.6961490Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6961584Z with policy(): 2025-12-04T09:19:34.6962037Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6962135Z raise RuntimeError(msg) 2025-12-04T09:19:34.6963339Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 583991296 and is now 630128640. 2025-12-04T09:19:34.6963345Z 2025-12-04T09:19:34.6963538Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6964304Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda 2025-12-04T09:19:34.6964309Z 2025-12-04T09:19:34.6964547Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6964710Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
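Note: the RuntimeError above is raised by PyTorch's CUDA memory-leak checker (the __exit__ frame in common_utils.py), which snapshots the caching-allocator and driver memory counters before the test body and compares them again afterwards; the "was X and is now Y" numbers in the message are those two snapshots. The sketch below only illustrates that before/after idea using public torch.cuda counters; it is not the checker that common_utils.py actually implements, and the class name is made up for illustration.

    import torch

    class NaiveCudaLeakCheck:
        """Illustrative before/after CUDA memory comparison on one device."""

        def __init__(self, device=0):
            self.device = device

        def __enter__(self):
            torch.cuda.synchronize(self.device)
            self.alloc_before = torch.cuda.memory_allocated(self.device)
            free, total = torch.cuda.mem_get_info(self.device)
            self.driver_before = total - free
            return self

        def __exit__(self, exc_type, exc, tb):
            torch.cuda.synchronize(self.device)
            torch.cuda.empty_cache()  # drop cached blocks before re-reading the counters
            alloc_after = torch.cuda.memory_allocated(self.device)
            free, total = torch.cuda.mem_get_info(self.device)
            driver_after = total - free
            if alloc_after > self.alloc_before or driver_after > self.driver_before:
                raise RuntimeError(
                    f"possible CUDA leak on device {self.device}: caching allocator "
                    f"{self.alloc_before} -> {alloc_after} bytes, driver "
                    f"{self.driver_before} -> {driver_after} bytes"
                )

The real checker is opt-in per run via PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1, which is exactly what the repro command above sets, and the repro banner itself can be silenced with PYTORCH_PRINT_REPRO_ON_FAILURE=0.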
2025-12-04T09:19:34.6964876Z ======================= 1 failed, 7 deselected in 8.55s ========================
2025-12-04T09:19:34.6964963Z Got exit code 1
2025-12-04T09:19:34.6965702Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda
2025-12-04T09:19:34.6966067Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T09:19:34.6966674Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-14d4a314808f55fe.xml
2025-12-04T09:19:34.6966829Z ============================= test session starts ==============================
2025-12-04T09:19:34.6967141Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T09:19:34.6967249Z cachedir: .pytest_cache
2025-12-04T09:19:34.6967703Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T09:19:34.6967814Z rootdir: /var/lib/jenkins/workspace
2025-12-04T09:19:34.6967920Z configfile: pytest.ini
2025-12-04T09:19:34.6968402Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T09:19:34.6968587Z collecting ... collected 8 items / 3 deselected / 5 selected
2025-12-04T09:19:34.6968718Z stepcurrent: skipping 3 already run items.
2025-12-04T09:19:34.6968819Z Running 5 items in this shard
2025-12-04T09:19:34.6968824Z
2025-12-04T09:19:34.6969916Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda I1204 09:16:15.504000 22326 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 22378
2025-12-04T09:19:34.6970411Z I1204 09:16:15.505000 22326 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 22379
2025-12-04T09:19:34.6970866Z I1204 09:16:15.506000 22326 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 22380
2025-12-04T09:19:34.6971299Z I1204 09:16:15.507000 22326 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 22381
2025-12-04T09:19:34.6972825Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
2025-12-04T09:19:34.6972982Z device_from_device_id = _get_device_from_device_id(
2025-12-04T09:19:34.6974507Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index.
2025-12-04T09:19:34.6974670Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6976177Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.6976413Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6978323Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.6978495Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.6978955Z [rank0]:E1204 09:16:22.331000 22378 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6979488Z [rank0]:E1204 09:16:22.331000 22378 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6980493Z [rank0]:E1204 09:16:22.331000 22378 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6981003Z [rank0]:E1204 09:16:22.331000 22378 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6982004Z [rank0]:E1204 09:16:22.331000 22378 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6982402Z [rank0]:E1204 09:16:22.331000 22378 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6983363Z [rank0]:E1204 09:16:22.331000 22378 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6983860Z [rank0]:E1204 09:16:22.331000 22378 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6984821Z [rank0]:E1204 09:16:22.331000 22378 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6985379Z [rank0]:E1204 09:16:22.331000 22378 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6986346Z [rank0]:E1204 09:16:22.331000 22378 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.6986803Z [rank0]:E1204 09:16:22.331000 22378 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.6987770Z [rank0]:E1204 09:16:22.331000 22378 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.6988263Z [rank0]:E1204 09:16:22.331000 22378 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.6990044Z [rank0]:E1204 09:16:22.331000 22378 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 2025-12-04T09:19:34.6990375Z [rank0]:E1204 09:16:22.331000 22378 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6990965Z [rank0]:E1204 09:16:22.331000 22378 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.6992125Z [rank0]:E1204 09:16:22.331000 22378 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.6992509Z [rank0]:E1204 09:16:22.331000 22378 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.6993149Z [rank0]:E1204 09:16:22.331000 22378 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.6993642Z [rank0]:E1204 09:16:22.331000 22378 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.6994042Z [rank3]:E1204 09:16:22.333000 22381 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.6994514Z [rank3]:E1204 09:16:22.333000 22381 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.6995414Z [rank3]:E1204 09:16:22.333000 22381 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.6995868Z [rank3]:E1204 09:16:22.333000 22381 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.6996756Z [rank3]:E1204 09:16:22.333000 22381 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.6997107Z [rank3]:E1204 09:16:22.333000 22381 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.6997977Z [rank3]:E1204 09:16:22.333000 22381 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6998518Z [rank3]:E1204 09:16:22.333000 22381 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.6999378Z [rank3]:E1204 09:16:22.333000 22381 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.6999997Z [rank3]:E1204 09:16:22.333000 22381 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7000900Z [rank3]:E1204 09:16:22.333000 22381 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7001333Z [rank3]:E1204 09:16:22.333000 22381 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7002245Z [rank3]:E1204 09:16:22.333000 22381 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7002712Z [rank3]:E1204 09:16:22.333000 22381 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7004421Z [rank3]:E1204 09:16:22.333000 22381 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 491716608 and is now 630128640. 2025-12-04T09:19:34.7004769Z [rank3]:E1204 09:16:22.333000 22381 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7005405Z [rank3]:E1204 09:16:22.333000 22381 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7006683Z [rank3]:E1204 09:16:22.333000 22381 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7007042Z [rank3]:E1204 09:16:22.333000 22381 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7007717Z [rank3]:E1204 09:16:22.333000 22381 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7008405Z [rank3]:E1204 09:16:22.333000 22381 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.7008851Z [rank1]:E1204 09:16:22.333000 22379 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7009367Z [rank1]:E1204 09:16:22.333000 22379 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7010357Z [rank1]:E1204 09:16:22.333000 22379 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7010852Z [rank1]:E1204 09:16:22.333000 22379 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7011816Z [rank1]:E1204 09:16:22.333000 22379 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7012203Z [rank1]:E1204 09:16:22.333000 22379 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7013201Z [rank1]:E1204 09:16:22.333000 22379 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7013678Z [rank1]:E1204 09:16:22.333000 22379 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7014607Z [rank1]:E1204 09:16:22.333000 22379 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7015095Z [rank1]:E1204 09:16:22.333000 22379 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7016021Z [rank1]:E1204 09:16:22.333000 22379 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7016545Z [rank1]:E1204 09:16:22.333000 22379 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7017675Z [rank1]:E1204 09:16:22.333000 22379 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7018180Z [rank1]:E1204 09:16:22.333000 22379 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7019989Z [rank1]:E1204 09:16:22.333000 22379 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 604962816 and is now 630128640. 
2025-12-04T09:19:34.7020361Z [rank1]:E1204 09:16:22.333000 22379 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7021291Z [rank1]:E1204 09:16:22.333000 22379 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7022604Z [rank1]:E1204 09:16:22.333000 22379 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7022979Z [rank1]:E1204 09:16:22.333000 22379 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7023695Z [rank1]:E1204 09:16:22.333000 22379 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7024255Z [rank1]:E1204 09:16:22.333000 22379 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.7024711Z [rank2]:E1204 09:16:22.335000 22380 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7025245Z [rank2]:E1204 09:16:22.335000 22380 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7026253Z [rank2]:E1204 09:16:22.335000 22380 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7026763Z [rank2]:E1204 09:16:22.335000 22380 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7027829Z [rank2]:E1204 09:16:22.335000 22380 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7028231Z [rank2]:E1204 09:16:22.335000 22380 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7029202Z [rank2]:E1204 09:16:22.335000 22380 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7029689Z [rank2]:E1204 09:16:22.335000 22380 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7030654Z [rank2]:E1204 09:16:22.335000 22380 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7031158Z [rank2]:E1204 09:16:22.335000 22380 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7032117Z [rank2]:E1204 09:16:22.335000 22380 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7032683Z [rank2]:E1204 09:16:22.335000 22380 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7033703Z [rank2]:E1204 09:16:22.335000 22380 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7034180Z [rank2]:E1204 09:16:22.335000 22380 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7035922Z [rank2]:E1204 09:16:22.335000 22380 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 602865664 and is now 630128640. 2025-12-04T09:19:34.7036269Z [rank2]:E1204 09:16:22.335000 22380 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7036892Z [rank2]:E1204 09:16:22.335000 22380 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7038117Z [rank2]:E1204 09:16:22.335000 22380 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7038568Z [rank2]:E1204 09:16:22.335000 22380 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7039210Z [rank2]:E1204 09:16:22.335000 22380 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7039706Z [rank2]:E1204 09:16:22.335000 22380 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:19:34.7039800Z dist init r=1, world=4 2025-12-04T09:19:34.7039886Z dist init r=2, world=4 2025-12-04T09:19:34.7039979Z dist init r=0, world=4 2025-12-04T09:19:34.7040066Z dist init r=3, world=4 2025-12-04T09:19:34.7041092Z [rank0]:[W1204 09:16:22.343252882 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:19:34.7041242Z FAILED [8.4208s] [ 20%] 2025-12-04T09:19:34.7041247Z 2025-12-04T09:19:34.7041378Z =================================== FAILURES =================================== 2025-12-04T09:19:34.7041780Z _ TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda _ 2025-12-04T09:19:34.7041889Z Traceback (most recent call last): 2025-12-04T09:19:34.7042374Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:19:34.7042484Z self._join_processes(fn) 2025-12-04T09:19:34.7043004Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:19:34.7043139Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:19:34.7043676Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:19:34.7043778Z raise RuntimeError(error) 2025-12-04T09:19:34.7043996Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:19:34.7044100Z Traceback (most recent call last): 2025-12-04T09:19:34.7044583Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7044696Z getattr(self, test_name)() 2025-12-04T09:19:34.7045172Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7045259Z fn() 2025-12-04T09:19:34.7045709Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7045802Z method(*args, **kwargs) 2025-12-04T09:19:34.7046262Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7046357Z method(*args, **kwargs) 2025-12-04T09:19:34.7046805Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7046904Z with policy(): 2025-12-04T09:19:34.7047404Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7047507Z raise RuntimeError(msg) 2025-12-04T09:19:34.7048707Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 
2025-12-04T09:19:34.7048712Z 2025-12-04T09:19:34.7048920Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7049683Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7049692Z 2025-12-04T09:19:34.7049928Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7049933Z 2025-12-04T09:19:34.7049941Z 2025-12-04T09:19:34.7050151Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:19:34.7050386Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:19:34.7051155Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-14d4a314808f55fe.xml - 2025-12-04T09:19:34.7051309Z =========================== short test summary info ============================ 2025-12-04T09:19:34.7052205Z FAILED [8.4208s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:19:34.7052388Z Traceback (most recent call last): 2025-12-04T09:19:34.7052880Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7052993Z getattr(self, test_name)() 2025-12-04T09:19:34.7053475Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7053555Z fn() 2025-12-04T09:19:34.7054025Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7054117Z method(*args, **kwargs) 2025-12-04T09:19:34.7054575Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7054671Z method(*args, **kwargs) 2025-12-04T09:19:34.7055122Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7055224Z with policy(): 2025-12-04T09:19:34.7055681Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7055777Z raise RuntimeError(msg) 2025-12-04T09:19:34.7057275Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 2025-12-04T09:19:34.7057282Z 2025-12-04T09:19:34.7057497Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7058358Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7058370Z 2025-12-04T09:19:34.7058636Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7058827Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
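For context on the "Process 0 exited with error code 10 and exception: ..." wording above: the distributed test harness starts one worker process per rank, joins them, and re-raises a failing child's exit code and traceback in the parent (the _join_processes / _check_return_codes frames in common_distributed.py). Below is a rough sketch of that join-and-check pattern, assuming a user-supplied run_one_rank(rank, world_size) body; it is not the harness in common_distributed.py itself.

    import torch.multiprocessing as mp

    def run_multiprocess_test(run_one_rank, world_size=4):
        # Start one process per rank with the "spawn" start method (the start
        # method generally needed when workers use CUDA), wait for all of
        # them, then surface a non-zero exit code as a RuntimeError here.
        ctx = mp.get_context("spawn")
        procs = [
            ctx.Process(target=run_one_rank, args=(rank, world_size))
            for rank in range(world_size)
        ]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        for rank, p in enumerate(procs):
            if p.exitcode != 0:
                raise RuntimeError(f"Process {rank} exited with error code {p.exitcode}")

In the log above, exit code 10 is what each rank exits with after the leak check raises ("exiting process N with exit code: 10"), and the child's traceback is forwarded along with it, which is why the same RuntimeError text appears once per rank and again in the parent's failure report.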
2025-12-04T09:19:34.7059071Z ======================= 1 failed, 3 deselected in 8.44s ========================
2025-12-04T09:19:34.7059171Z Got exit code 1
2025-12-04T09:19:34.7059291Z Retrying single test...
2025-12-04T09:19:34.7059976Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-72b90a4f7545df10.xml
2025-12-04T09:19:34.7060139Z ============================= test session starts ==============================
2025-12-04T09:19:34.7060501Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T09:19:34.7060607Z cachedir: .pytest_cache
2025-12-04T09:19:34.7061139Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T09:19:34.7061262Z rootdir: /var/lib/jenkins/workspace
2025-12-04T09:19:34.7061370Z configfile: pytest.ini
2025-12-04T09:19:34.7061922Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T09:19:34.7062124Z collecting ... collected 8 items / 7 deselected / 1 selected
2025-12-04T09:19:34.7063058Z stepcurrent: skipping 3 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda
2025-12-04T09:19:34.7063186Z Running 1 items in this shard
2025-12-04T09:19:34.7063191Z
2025-12-04T09:19:34.7064409Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda I1204 09:16:28.824000 22663 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 22715
2025-12-04T09:19:34.7064980Z I1204 09:16:28.825000 22663 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 22716
2025-12-04T09:19:34.7065476Z I1204 09:16:28.826000 22663 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 22717
2025-12-04T09:19:34.7065980Z I1204 09:16:28.826000 22663 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 22718
2025-12-04T09:19:34.7067711Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
2025-12-04T09:19:34.7067883Z device_from_device_id = _get_device_from_device_id(
2025-12-04T09:19:34.7069735Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
2025-12-04T09:19:34.7069881Z device_from_device_id = _get_device_from_device_id(
2025-12-04T09:19:34.7071415Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index.
FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7071566Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7073332Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7073488Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7073928Z [rank1]:E1204 09:16:35.577000 22716 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7074434Z [rank1]:E1204 09:16:35.577000 22716 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7075385Z [rank1]:E1204 09:16:35.577000 22716 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7075878Z [rank1]:E1204 09:16:35.577000 22716 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7076819Z [rank1]:E1204 09:16:35.577000 22716 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7077198Z [rank1]:E1204 09:16:35.577000 22716 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7078112Z [rank1]:E1204 09:16:35.577000 22716 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7078579Z [rank1]:E1204 09:16:35.577000 22716 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7079533Z [rank1]:E1204 09:16:35.577000 22716 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7079995Z [rank1]:E1204 09:16:35.577000 22716 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7080898Z [rank1]:E1204 09:16:35.577000 22716 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7081319Z [rank1]:E1204 09:16:35.577000 22716 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7082238Z [rank1]:E1204 09:16:35.577000 22716 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7082708Z [rank1]:E1204 09:16:35.577000 22716 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 
2025-12-04T09:19:34.7084419Z [rank1]:E1204 09:16:35.577000 22716 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 602865664 and is now 630128640. 2025-12-04T09:19:34.7084765Z [rank1]:E1204 09:16:35.577000 22716 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7085387Z [rank1]:E1204 09:16:35.577000 22716 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7086677Z [rank1]:E1204 09:16:35.577000 22716 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7087114Z [rank1]:E1204 09:16:35.577000 22716 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7087763Z [rank1]:E1204 09:16:35.577000 22716 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7088245Z [rank1]:E1204 09:16:35.577000 22716 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.7088659Z [rank3]:E1204 09:16:35.577000 22718 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7089134Z [rank3]:E1204 09:16:35.577000 22718 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7090032Z [rank3]:E1204 09:16:35.577000 22718 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7090496Z [rank3]:E1204 09:16:35.577000 22718 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7091374Z [rank3]:E1204 09:16:35.577000 22718 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7091737Z [rank3]:E1204 09:16:35.577000 22718 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7092585Z [rank3]:E1204 09:16:35.577000 22718 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7093081Z [rank3]:E1204 09:16:35.577000 22718 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7093928Z [rank3]:E1204 09:16:35.577000 22718 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7094360Z [rank3]:E1204 09:16:35.577000 22718 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T09:19:34.7095218Z [rank3]:E1204 09:16:35.577000 22718 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7095616Z [rank3]:E1204 09:16:35.577000 22718 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7096544Z [rank3]:E1204 09:16:35.577000 22718 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7097194Z [rank3]:E1204 09:16:35.577000 22718 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7099004Z [rank3]:E1204 09:16:35.577000 22718 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 491716608 and is now 630128640. 2025-12-04T09:19:34.7099374Z [rank3]:E1204 09:16:35.577000 22718 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7100040Z [rank3]:E1204 09:16:35.577000 22718 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7101443Z [rank3]:E1204 09:16:35.577000 22718 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7101808Z [rank3]:E1204 09:16:35.577000 22718 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7102537Z [rank3]:E1204 09:16:35.577000 22718 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7103085Z [rank3]:E1204 09:16:35.577000 22718 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.7103551Z [rank0]:E1204 09:16:35.577000 22715 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7104091Z [rank0]:E1204 09:16:35.577000 22715 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7105097Z [rank0]:E1204 09:16:35.577000 22715 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7105619Z [rank0]:E1204 09:16:35.577000 22715 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7106608Z [rank0]:E1204 09:16:35.577000 22715 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7107071Z [rank0]:E1204 09:16:35.577000 22715 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7108037Z 
[rank0]:E1204 09:16:35.577000 22715 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7108649Z [rank0]:E1204 09:16:35.577000 22715 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7109641Z [rank0]:E1204 09:16:35.577000 22715 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7110072Z [rank0]:E1204 09:16:35.577000 22715 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7110933Z [rank0]:E1204 09:16:35.577000 22715 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7111334Z [rank0]:E1204 09:16:35.577000 22715 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7112199Z [rank0]:E1204 09:16:35.577000 22715 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7112638Z [rank0]:E1204 09:16:35.577000 22715 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7114247Z [rank0]:E1204 09:16:35.577000 22715 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 
2025-12-04T09:19:34.7114627Z [rank0]:E1204 09:16:35.577000 22715 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7115223Z [rank0]:E1204 09:16:35.577000 22715 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7116386Z [rank0]:E1204 09:16:35.577000 22715 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7116710Z [rank0]:E1204 09:16:35.577000 22715 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7117360Z [rank0]:E1204 09:16:35.577000 22715 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7117847Z [rank0]:E1204 09:16:35.577000 22715 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.7118258Z [rank2]:E1204 09:16:35.577000 22717 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7118729Z [rank2]:E1204 09:16:35.577000 22717 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7119619Z [rank2]:E1204 09:16:35.577000 22717 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7120079Z [rank2]:E1204 09:16:35.577000 22717 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7121539Z [rank2]:E1204 09:16:35.577000 22717 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7121963Z [rank2]:E1204 09:16:35.577000 22717 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7122928Z [rank2]:E1204 09:16:35.577000 22717 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7123431Z [rank2]:E1204 09:16:35.577000 22717 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7124393Z [rank2]:E1204 09:16:35.577000 22717 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7124888Z [rank2]:E1204 09:16:35.577000 22717 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7125870Z [rank2]:E1204 09:16:35.577000 22717 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7126315Z [rank2]:E1204 09:16:35.577000 22717 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7127299Z [rank2]:E1204 09:16:35.577000 22717 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7127792Z [rank2]:E1204 09:16:35.577000 22717 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7129705Z [rank2]:E1204 09:16:35.577000 22717 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 604962816 and is now 630128640. 2025-12-04T09:19:34.7130074Z [rank2]:E1204 09:16:35.577000 22717 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7130745Z [rank2]:E1204 09:16:35.577000 22717 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7132063Z [rank2]:E1204 09:16:35.577000 22717 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7132428Z [rank2]:E1204 09:16:35.577000 22717 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7133270Z [rank2]:E1204 09:16:35.577000 22717 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7133905Z [rank2]:E1204 09:16:35.577000 22717 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:19:34.7134011Z dist init r=3, world=4 2025-12-04T09:19:34.7134107Z dist init r=0, world=4 2025-12-04T09:19:34.7134200Z dist init r=2, world=4 2025-12-04T09:19:34.7134301Z dist init r=1, world=4 2025-12-04T09:19:34.7135400Z [rank0]:[W1204 09:16:35.589301494 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:19:34.7135567Z FAILED [8.4881s] [100%] 2025-12-04T09:19:34.7135573Z 2025-12-04T09:19:34.7135718Z =================================== FAILURES =================================== 2025-12-04T09:19:34.7136133Z _ TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda _ 2025-12-04T09:19:34.7136326Z Traceback (most recent call last): 2025-12-04T09:19:34.7137027Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:19:34.7137142Z self._join_processes(fn) 2025-12-04T09:19:34.7137745Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:19:34.7137887Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:19:34.7138507Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:19:34.7138621Z raise RuntimeError(error) 2025-12-04T09:19:34.7138858Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:19:34.7138997Z Traceback (most recent call last): 2025-12-04T09:19:34.7139544Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7139659Z getattr(self, test_name)() 2025-12-04T09:19:34.7140205Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7140292Z fn() 2025-12-04T09:19:34.7140812Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7140918Z method(*args, **kwargs) 2025-12-04T09:19:34.7141430Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7141544Z method(*args, **kwargs) 2025-12-04T09:19:34.7142113Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7142222Z with policy(): 2025-12-04T09:19:34.7142731Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7142840Z raise RuntimeError(msg) 2025-12-04T09:19:34.7144200Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 604962816 and is now 630128640. 
2025-12-04T09:19:34.7144207Z 2025-12-04T09:19:34.7144421Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7145287Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7145293Z 2025-12-04T09:19:34.7145563Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7145568Z 2025-12-04T09:19:34.7145573Z 2025-12-04T09:19:34.7145792Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:19:34.7146065Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:19:34.7146928Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-72b90a4f7545df10.xml - 2025-12-04T09:19:34.7147108Z =========================== short test summary info ============================ 2025-12-04T09:19:34.7148173Z FAILED [8.4881s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:19:34.7148292Z Traceback (most recent call last): 2025-12-04T09:19:34.7148954Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7149056Z getattr(self, test_name)() 2025-12-04T09:19:34.7149545Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7149627Z fn() 2025-12-04T09:19:34.7150078Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7150180Z method(*args, **kwargs) 2025-12-04T09:19:34.7150630Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7150737Z method(*args, **kwargs) 2025-12-04T09:19:34.7151186Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7151278Z with policy(): 2025-12-04T09:19:34.7151749Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7151850Z raise RuntimeError(msg) 2025-12-04T09:19:34.7153046Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 604962816 and is now 630128640. 2025-12-04T09:19:34.7153062Z 2025-12-04T09:19:34.7153254Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7154020Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7154025Z 2025-12-04T09:19:34.7154274Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7154500Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T09:19:34.7154673Z ======================= 1 failed, 7 deselected in 8.51s ======================== 2025-12-04T09:19:34.7154761Z Got exit code 1 2025-12-04T09:19:34.7154859Z Retrying single test... 2025-12-04T09:19:34.7155478Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-cc094df1219cfd82.xml 2025-12-04T09:19:34.7155625Z ============================= test session starts ============================== 2025-12-04T09:19:34.7155938Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:19:34.7156051Z cachedir: .pytest_cache 2025-12-04T09:19:34.7156512Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:19:34.7156630Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:19:34.7156729Z configfile: pytest.ini 2025-12-04T09:19:34.7157206Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:19:34.7157400Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:19:34.7158230Z stepcurrent: skipping 3 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7158347Z Running 1 items in this shard 2025-12-04T09:19:34.7158351Z 2025-12-04T09:19:34.7159422Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda I1204 09:16:42.174000 23000 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 23052 2025-12-04T09:19:34.7159930Z I1204 09:16:42.175000 23000 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 23053 2025-12-04T09:19:34.7160379Z I1204 09:16:42.176000 23000 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 23054 2025-12-04T09:19:34.7160814Z I1204 09:16:42.176000 23000 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 23055 2025-12-04T09:19:34.7162354Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7162508Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7164056Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7164204Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7165721Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. 
FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7165879Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7167437Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7167591Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7168003Z [rank1]:E1204 09:16:48.851000 23053 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7168492Z [rank1]:E1204 09:16:48.851000 23053 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7169380Z [rank1]:E1204 09:16:48.851000 23053 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7169842Z [rank1]:E1204 09:16:48.851000 23053 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7170721Z [rank1]:E1204 09:16:48.851000 23053 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7171073Z [rank1]:E1204 09:16:48.851000 23053 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7171938Z [rank1]:E1204 09:16:48.851000 23053 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7172426Z [rank1]:E1204 09:16:48.851000 23053 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7173292Z [rank1]:E1204 09:16:48.851000 23053 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7173725Z [rank1]:E1204 09:16:48.851000 23053 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7174580Z [rank1]:E1204 09:16:48.851000 23053 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7174986Z [rank1]:E1204 09:16:48.851000 23053 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7175844Z [rank1]:E1204 09:16:48.851000 23053 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7176360Z [rank1]:E1204 09:16:48.851000 23053 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 
2025-12-04T09:19:34.7178298Z [rank1]:E1204 09:16:48.851000 23053 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 604962816 and is now 630128640. 2025-12-04T09:19:34.7178675Z [rank1]:E1204 09:16:48.851000 23053 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7179338Z [rank1]:E1204 09:16:48.851000 23053 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7180731Z [rank1]:E1204 09:16:48.851000 23053 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7181098Z [rank1]:E1204 09:16:48.851000 23053 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7181814Z [rank1]:E1204 09:16:48.851000 23053 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7182371Z [rank1]:E1204 09:16:48.851000 23053 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.7182824Z [rank3]:E1204 09:16:48.852000 23055 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7183367Z [rank3]:E1204 09:16:48.852000 23055 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7184371Z [rank3]:E1204 09:16:48.852000 23055 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7184888Z [rank3]:E1204 09:16:48.852000 23055 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7185878Z [rank3]:E1204 09:16:48.852000 23055 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7186274Z [rank3]:E1204 09:16:48.852000 23055 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7187299Z [rank3]:E1204 09:16:48.852000 23055 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7187796Z [rank3]:E1204 09:16:48.852000 23055 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7188873Z [rank3]:E1204 09:16:48.852000 23055 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7189418Z [rank3]:E1204 09:16:48.852000 23055 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T09:19:34.7190279Z [rank3]:E1204 09:16:48.852000 23055 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7190680Z [rank3]:E1204 09:16:48.852000 23055 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7191538Z [rank3]:E1204 09:16:48.852000 23055 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7191984Z [rank3]:E1204 09:16:48.852000 23055 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7193572Z [rank3]:E1204 09:16:48.852000 23055 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 586088448 and is now 630128640. 2025-12-04T09:19:34.7193916Z [rank3]:E1204 09:16:48.852000 23055 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7194548Z [rank3]:E1204 09:16:48.852000 23055 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7195730Z [rank3]:E1204 09:16:48.852000 23055 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7196055Z [rank3]:E1204 09:16:48.852000 23055 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7196688Z [rank3]:E1204 09:16:48.852000 23055 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7197187Z [rank3]:E1204 09:16:48.852000 23055 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.7197593Z [rank0]:E1204 09:16:48.852000 23052 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7198075Z [rank0]:E1204 09:16:48.852000 23052 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7198966Z [rank0]:E1204 09:16:48.852000 23052 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7199426Z [rank0]:E1204 09:16:48.852000 23052 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7200308Z [rank0]:E1204 09:16:48.852000 23052 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7200727Z [rank0]:E1204 09:16:48.852000 23052 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7201588Z 
[rank0]:E1204 09:16:48.852000 23052 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7202021Z [rank0]:E1204 09:16:48.852000 23052 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7202883Z [rank0]:E1204 09:16:48.852000 23052 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7203322Z [rank0]:E1204 09:16:48.852000 23052 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7204182Z [rank0]:E1204 09:16:48.852000 23052 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7204578Z [rank0]:E1204 09:16:48.852000 23052 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7205433Z [rank0]:E1204 09:16:48.852000 23052 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7205880Z [rank0]:E1204 09:16:48.852000 23052 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7207517Z [rank0]:E1204 09:16:48.852000 23052 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 
2025-12-04T09:19:34.7207860Z [rank0]:E1204 09:16:48.852000 23052 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7208449Z [rank0]:E1204 09:16:48.852000 23052 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7209619Z [rank0]:E1204 09:16:48.852000 23052 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7209948Z [rank0]:E1204 09:16:48.852000 23052 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7210581Z [rank0]:E1204 09:16:48.852000 23052 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7211079Z [rank0]:E1204 09:16:48.852000 23052 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.7211477Z [rank2]:E1204 09:16:48.852000 23054 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7211962Z [rank2]:E1204 09:16:48.852000 23054 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7212852Z [rank2]:E1204 09:16:48.852000 23054 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7213416Z [rank2]:E1204 09:16:48.852000 23054 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7214305Z [rank2]:E1204 09:16:48.852000 23054 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7214659Z [rank2]:E1204 09:16:48.852000 23054 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7215522Z [rank2]:E1204 09:16:48.852000 23054 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7215955Z [rank2]:E1204 09:16:48.852000 23054 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7217102Z [rank2]:E1204 09:16:48.852000 23054 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7217613Z [rank2]:E1204 09:16:48.852000 23054 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7218586Z [rank2]:E1204 09:16:48.852000 23054 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7219031Z [rank2]:E1204 09:16:48.852000 23054 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7219997Z [rank2]:E1204 09:16:48.852000 23054 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7220510Z [rank2]:E1204 09:16:48.852000 23054 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7222601Z [rank2]:E1204 09:16:48.852000 23054 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 602865664 and is now 630128640. 2025-12-04T09:19:34.7222984Z [rank2]:E1204 09:16:48.852000 23054 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7223651Z [rank2]:E1204 09:16:48.852000 23054 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7224970Z [rank2]:E1204 09:16:48.852000 23054 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7225340Z [rank2]:E1204 09:16:48.852000 23054 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7226056Z [rank2]:E1204 09:16:48.852000 23054 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7226612Z [rank2]:E1204 09:16:48.852000 23054 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:19:34.7226716Z dist init r=0, world=4 2025-12-04T09:19:34.7226829Z dist init r=3, world=4 2025-12-04T09:19:34.7226929Z dist init r=2, world=4 2025-12-04T09:19:34.7227024Z dist init r=1, world=4 2025-12-04T09:19:34.7228264Z [rank0]:[W1204 09:16:49.864692706 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:19:34.7228371Z FAILED [9.1002s] [100%] 2025-12-04T09:19:34.7228377Z 2025-12-04T09:19:34.7228536Z =================================== FAILURES =================================== 2025-12-04T09:19:34.7228983Z _ TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda _ 2025-12-04T09:19:34.7229099Z Traceback (most recent call last): 2025-12-04T09:19:34.7229663Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:19:34.7229775Z self._join_processes(fn) 2025-12-04T09:19:34.7230370Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:19:34.7230529Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:19:34.7231136Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:19:34.7231266Z raise RuntimeError(error) 2025-12-04T09:19:34.7231509Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:19:34.7231627Z Traceback (most recent call last): 2025-12-04T09:19:34.7232187Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7232297Z getattr(self, test_name)() 2025-12-04T09:19:34.7232931Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7233015Z fn() 2025-12-04T09:19:34.7233494Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7233608Z method(*args, **kwargs) 2025-12-04T09:19:34.7234079Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7234178Z method(*args, **kwargs) 2025-12-04T09:19:34.7234717Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7234809Z with policy(): 2025-12-04T09:19:34.7235300Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7235504Z raise RuntimeError(msg) 2025-12-04T09:19:34.7236700Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 
2025-12-04T09:19:34.7236709Z 2025-12-04T09:19:34.7236913Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7237673Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7237678Z 2025-12-04T09:19:34.7237924Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7237928Z 2025-12-04T09:19:34.7238070Z Process 1 exited with error code 10 and exception: 2025-12-04T09:19:34.7238179Z Traceback (most recent call last): 2025-12-04T09:19:34.7238681Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7238782Z getattr(self, test_name)() 2025-12-04T09:19:34.7239272Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7239401Z fn() 2025-12-04T09:19:34.7239850Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7239954Z method(*args, **kwargs) 2025-12-04T09:19:34.7240404Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7240495Z method(*args, **kwargs) 2025-12-04T09:19:34.7240950Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7241035Z with policy(): 2025-12-04T09:19:34.7241498Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7241592Z raise RuntimeError(msg) 2025-12-04T09:19:34.7242786Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 604962816 and is now 630128640. 
2025-12-04T09:19:34.7242805Z 2025-12-04T09:19:34.7242999Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7243754Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7243759Z 2025-12-04T09:19:34.7244002Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7244007Z 2025-12-04T09:19:34.7244149Z Process 3 exited with error code 10 and exception: 2025-12-04T09:19:34.7244266Z Traceback (most recent call last): 2025-12-04T09:19:34.7244752Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7244852Z getattr(self, test_name)() 2025-12-04T09:19:34.7245339Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7245417Z fn() 2025-12-04T09:19:34.7245909Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7246018Z method(*args, **kwargs) 2025-12-04T09:19:34.7246469Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7246570Z method(*args, **kwargs) 2025-12-04T09:19:34.7247021Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7247109Z with policy(): 2025-12-04T09:19:34.7247571Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7247674Z raise RuntimeError(msg) 2025-12-04T09:19:34.7248874Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 586088448 and is now 630128640. 2025-12-04T09:19:34.7248879Z 2025-12-04T09:19:34.7249073Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7249829Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7249833Z 2025-12-04T09:19:34.7250082Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7250086Z 2025-12-04T09:19:34.7250090Z 2025-12-04T09:19:34.7250354Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:19:34.7250600Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:19:34.7251375Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-cc094df1219cfd82.xml - 2025-12-04T09:19:34.7251529Z =========================== short test summary info ============================ 2025-12-04T09:19:34.7252443Z FAILED [9.1002s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:19:34.7252555Z Traceback (most recent call last): 2025-12-04T09:19:34.7253061Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7253164Z getattr(self, test_name)() 2025-12-04T09:19:34.7253644Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7253734Z fn() 2025-12-04T09:19:34.7254187Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7254293Z method(*args, **kwargs) 2025-12-04T09:19:34.7254746Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7254839Z method(*args, **kwargs) 2025-12-04T09:19:34.7255294Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7255385Z with policy(): 2025-12-04T09:19:34.7255838Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7255954Z raise RuntimeError(msg) 2025-12-04T09:19:34.7257549Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 
2025-12-04T09:19:34.7257557Z 2025-12-04T09:19:34.7257790Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7258647Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7258652Z 2025-12-04T09:19:34.7258929Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7258935Z 2025-12-04T09:19:34.7259103Z Process 1 exited with error code 10 and exception: 2025-12-04T09:19:34.7259224Z Traceback (most recent call last): 2025-12-04T09:19:34.7259795Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7259908Z getattr(self, test_name)() 2025-12-04T09:19:34.7260458Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7260550Z fn() 2025-12-04T09:19:34.7261059Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7261176Z method(*args, **kwargs) 2025-12-04T09:19:34.7261685Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7261788Z method(*args, **kwargs) 2025-12-04T09:19:34.7262303Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7262455Z with policy(): 2025-12-04T09:19:34.7262979Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7263087Z raise RuntimeError(msg) 2025-12-04T09:19:34.7264429Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 604962816 and is now 630128640. 
2025-12-04T09:19:34.7264435Z 2025-12-04T09:19:34.7264661Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7265509Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7265515Z 2025-12-04T09:19:34.7265790Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7265799Z 2025-12-04T09:19:34.7265962Z Process 3 exited with error code 10 and exception: 2025-12-04T09:19:34.7266079Z Traceback (most recent call last): 2025-12-04T09:19:34.7266643Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7266756Z getattr(self, test_name)() 2025-12-04T09:19:34.7267305Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7267396Z fn() 2025-12-04T09:19:34.7267903Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7268020Z method(*args, **kwargs) 2025-12-04T09:19:34.7268532Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7268636Z method(*args, **kwargs) 2025-12-04T09:19:34.7269223Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7269313Z with policy(): 2025-12-04T09:19:34.7269820Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7269919Z raise RuntimeError(msg) 2025-12-04T09:19:34.7271109Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 586088448 and is now 630128640. 2025-12-04T09:19:34.7271126Z 2025-12-04T09:19:34.7271320Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7272071Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7272079Z 2025-12-04T09:19:34.7272326Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7272492Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
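The leak reports above all follow the same pattern: the per-test leak check in common_utils.py snapshots each device's memory before the test, runs the test, and then compares the caching-allocator bytes and the driver-level allocation afterwards ("Caching allocator allocated memory was 512 and is now reported as 3584 ... CUDA driver allocated memory was 604962816 and is now 630128640"). A minimal sketch of that comparison using only public torch.cuda APIs (this is an illustrative simplification, not the actual internal leak-check context manager) could look like:

    import torch

    def snapshot(device):
        # Bytes currently held by the caching allocator on this device.
        allocator = torch.cuda.memory_allocated(device)
        # Driver-level usage: total minus free, as reported by cudaMemGetInfo.
        free, total = torch.cuda.mem_get_info(device)
        return allocator, total - free

    def check_for_leak(fn, device=0):
        # Hypothetical helper: run `fn` and flag memory that survives it.
        torch.cuda.synchronize(device)
        before_alloc, before_driver = snapshot(device)
        fn()
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()  # release cached blocks so driver numbers are comparable
        after_alloc, after_driver = snapshot(device)
        if after_alloc > before_alloc and after_driver > before_driver:
            raise RuntimeError(
                f"possible leak on device {device}: allocator "
                f"{before_alloc} -> {after_alloc}, driver {before_driver} -> {after_driver}"
            )

Only when both the allocator count and the driver-reported usage grow is the leak treated as "confirmed", which matches the wording of the RuntimeError printed for every rank in this log.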
2025-12-04T09:19:34.7272664Z ======================= 1 failed, 7 deselected in 9.12s ======================== 2025-12-04T09:19:34.7272749Z Got exit code 1 2025-12-04T09:19:34.7273437Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7273809Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:19:34.7274414Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-94627d53ab92538d.xml 2025-12-04T09:19:34.7274609Z ============================= test session starts ============================== 2025-12-04T09:19:34.7274929Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:19:34.7275020Z cachedir: .pytest_cache 2025-12-04T09:19:34.7275484Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:19:34.7275591Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:19:34.7275681Z configfile: pytest.ini 2025-12-04T09:19:34.7276166Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:19:34.7276349Z collecting ... collected 8 items / 4 deselected / 4 selected 2025-12-04T09:19:34.7276474Z stepcurrent: skipping 4 already run items. 2025-12-04T09:19:34.7276577Z Running 4 items in this shard 2025-12-04T09:19:34.7276582Z 2025-12-04T09:19:34.7277652Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda I1204 09:16:55.474000 23337 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 23389 2025-12-04T09:19:34.7278110Z I1204 09:16:55.475000 23337 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 23390 2025-12-04T09:19:34.7278545Z I1204 09:16:55.476000 23337 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 23391 2025-12-04T09:19:34.7278984Z I1204 09:16:55.477000 23337 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 23392 2025-12-04T09:19:34.7280515Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7280667Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7282242Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:19:34.7282389Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7283922Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7284073Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7285594Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7285738Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7286154Z [rank0]:E1204 09:17:02.215000 23389 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7286633Z [rank0]:E1204 09:17:02.215000 23389 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7287580Z [rank0]:E1204 09:17:02.215000 23389 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7288043Z [rank0]:E1204 09:17:02.215000 23389 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7288928Z [rank0]:E1204 09:17:02.215000 23389 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7289288Z [rank0]:E1204 09:17:02.215000 23389 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7290145Z [rank0]:E1204 09:17:02.215000 23389 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7290590Z [rank0]:E1204 09:17:02.215000 23389 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7291442Z [rank0]:E1204 09:17:02.215000 23389 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7291880Z [rank0]:E1204 09:17:02.215000 23389 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7292744Z [rank0]:E1204 09:17:02.215000 23389 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7293138Z [rank0]:E1204 09:17:02.215000 23389 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7293997Z [rank0]:E1204 09:17:02.215000 23389 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7294494Z [rank0]:E1204 09:17:02.215000 23389 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7296100Z [rank0]:E1204 09:17:02.215000 23389 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 2025-12-04T09:19:34.7296519Z [rank0]:E1204 09:17:02.215000 23389 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7297340Z [rank0]:E1204 09:17:02.215000 23389 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7298666Z [rank0]:E1204 09:17:02.215000 23389 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T09:19:34.7299034Z [rank0]:E1204 09:17:02.215000 23389 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7299765Z [rank0]:E1204 09:17:02.215000 23389 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7300313Z [rank0]:E1204 09:17:02.215000 23389 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.7300846Z [rank2]:E1204 09:17:02.220000 23391 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7301377Z [rank2]:E1204 09:17:02.220000 23391 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7302388Z [rank2]:E1204 09:17:02.220000 23391 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7302911Z [rank2]:E1204 09:17:02.220000 23391 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7303898Z [rank2]:E1204 09:17:02.220000 23391 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7304305Z [rank2]:E1204 09:17:02.220000 23391 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7305273Z [rank2]:E1204 09:17:02.220000 23391 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7305772Z [rank2]:E1204 09:17:02.220000 23391 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7306740Z [rank2]:E1204 09:17:02.220000 23391 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7307230Z [rank2]:E1204 09:17:02.220000 23391 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7308191Z [rank2]:E1204 09:17:02.220000 23391 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7308745Z [rank2]:E1204 09:17:02.220000 23391 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7309829Z [rank2]:E1204 09:17:02.220000 23391 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7310291Z [rank2]:E1204 09:17:02.220000 23391 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7312000Z [rank2]:E1204 09:17:02.220000 23391 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 604962816 and is now 630128640. 2025-12-04T09:19:34.7312349Z [rank2]:E1204 09:17:02.220000 23391 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7312971Z [rank2]:E1204 09:17:02.220000 23391 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7314213Z [rank2]:E1204 09:17:02.220000 23391 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T09:19:34.7314558Z [rank2]:E1204 09:17:02.220000 23391 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7315235Z [rank2]:E1204 09:17:02.220000 23391 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7315803Z [rank2]:E1204 09:17:02.220000 23391 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:19:34.7316236Z [rank1]:E1204 09:17:02.220000 23390 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7316912Z [rank1]:E1204 09:17:02.220000 23390 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7317881Z [rank1]:E1204 09:17:02.220000 23390 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7318383Z [rank1]:E1204 09:17:02.220000 23390 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7319350Z [rank1]:E1204 09:17:02.220000 23390 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7319743Z [rank1]:E1204 09:17:02.220000 23390 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7320678Z [rank1]:E1204 09:17:02.220000 23390 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7321300Z [rank1]:E1204 09:17:02.220000 23390 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7322418Z [rank1]:E1204 09:17:02.220000 23390 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7322909Z [rank1]:E1204 09:17:02.220000 23390 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7323972Z [rank1]:E1204 09:17:02.220000 23390 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7324420Z [rank1]:E1204 09:17:02.220000 23390 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7325393Z [rank1]:E1204 09:17:02.220000 23390 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7325881Z [rank1]:E1204 09:17:02.220000 23390 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7327703Z [rank1]:E1204 09:17:02.220000 23390 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 607059968 and is now 630128640. 
2025-12-04T09:19:34.7328066Z [rank1]:E1204 09:17:02.220000 23390 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7328735Z [rank1]:E1204 09:17:02.220000 23390 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7330039Z [rank1]:E1204 09:17:02.220000 23390 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T09:19:34.7330471Z [rank1]:E1204 09:17:02.220000 23390 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7331203Z [rank1]:E1204 09:17:02.220000 23390 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7331749Z [rank1]:E1204 09:17:02.220000 23390 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.7332209Z [rank3]:E1204 09:17:02.222000 23392 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7332843Z [rank3]:E1204 09:17:02.222000 23392 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7333921Z [rank3]:E1204 09:17:02.222000 23392 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7334414Z [rank3]:E1204 09:17:02.222000 23392 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7335344Z [rank3]:E1204 09:17:02.222000 23392 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7335729Z [rank3]:E1204 09:17:02.222000 23392 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7336864Z [rank3]:E1204 09:17:02.222000 23392 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7337358Z [rank3]:E1204 09:17:02.222000 23392 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7338323Z [rank3]:E1204 09:17:02.222000 23392 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7338864Z [rank3]:E1204 09:17:02.222000 23392 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7339835Z [rank3]:E1204 09:17:02.222000 23392 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7340281Z [rank3]:E1204 09:17:02.222000 23392 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7341256Z [rank3]:E1204 09:17:02.222000 23392 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7341754Z [rank3]:E1204 09:17:02.222000 23392 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7343564Z [rank3]:E1204 09:17:02.222000 23392 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 491716608 and is now 630128640. 2025-12-04T09:19:34.7343931Z [rank3]:E1204 09:17:02.222000 23392 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7344596Z [rank3]:E1204 09:17:02.222000 23392 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7345955Z [rank3]:E1204 09:17:02.222000 23392 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T09:19:34.7346324Z [rank3]:E1204 09:17:02.222000 23392 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7347050Z [rank3]:E1204 09:17:02.222000 23392 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7347601Z [rank3]:E1204 09:17:02.222000 23392 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.7347710Z dist init r=1, world=4 2025-12-04T09:19:34.7347807Z dist init r=0, world=4 2025-12-04T09:19:34.7347907Z dist init r=3, world=4 2025-12-04T09:19:34.7348012Z dist init r=2, world=4 2025-12-04T09:19:34.7349361Z [rank0]:[W1204 09:17:02.233958987 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:19:34.7349469Z FAILED [8.2942s] [ 25%] 2025-12-04T09:19:34.7349475Z 2025-12-04T09:19:34.7349610Z =================================== FAILURES =================================== 2025-12-04T09:19:34.7350025Z _ TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda _ 2025-12-04T09:19:34.7350155Z Traceback (most recent call last): 2025-12-04T09:19:34.7350672Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:19:34.7350779Z self._join_processes(fn) 2025-12-04T09:19:34.7351337Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:19:34.7351474Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:19:34.7352053Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:19:34.7352230Z raise RuntimeError(error) 2025-12-04T09:19:34.7352452Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:19:34.7352570Z Traceback (most recent call last): 2025-12-04T09:19:34.7353078Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7353181Z getattr(self, test_name)() 2025-12-04T09:19:34.7353692Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7353773Z fn() 2025-12-04T09:19:34.7354264Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7354363Z method(*args, **kwargs) 2025-12-04T09:19:34.7354841Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7354951Z method(*args, **kwargs) 2025-12-04T09:19:34.7355529Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7355613Z with policy(): 2025-12-04T09:19:34.7356081Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7356177Z raise RuntimeError(msg) 2025-12-04T09:19:34.7357378Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 607059968 and is now 630128640. 
2025-12-04T09:19:34.7363826Z 2025-12-04T09:19:34.7364080Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7364859Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T09:19:34.7364864Z 2025-12-04T09:19:34.7365110Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7365115Z 2025-12-04T09:19:34.7365119Z 2025-12-04T09:19:34.7365317Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:19:34.7365557Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:19:34.7366328Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-94627d53ab92538d.xml - 2025-12-04T09:19:34.7366485Z =========================== short test summary info ============================ 2025-12-04T09:19:34.7367399Z FAILED [8.2942s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:19:34.7367508Z Traceback (most recent call last): 2025-12-04T09:19:34.7368009Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7368109Z getattr(self, test_name)() 2025-12-04T09:19:34.7368588Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7368673Z fn() 2025-12-04T09:19:34.7369123Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7369227Z method(*args, **kwargs) 2025-12-04T09:19:34.7369672Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7369764Z method(*args, **kwargs) 2025-12-04T09:19:34.7370277Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7370365Z with policy(): 2025-12-04T09:19:34.7370821Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7370931Z raise RuntimeError(msg) 2025-12-04T09:19:34.7372125Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 607059968 and is now 630128640. 2025-12-04T09:19:34.7372135Z 2025-12-04T09:19:34.7372334Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7373100Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T09:19:34.7373105Z 2025-12-04T09:19:34.7373350Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7373508Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
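The leak checker behind the failure above reports two numbers per device: the caching-allocator bytes and the driver-level bytes, measured before and after the test body. The following is a minimal, hypothetical sketch of that kind of before/after comparison, assuming a helper named naive_cuda_leak_check invented for illustration; it is not the actual CudaMemoryLeakCheck context manager in torch/testing/_internal/common_utils.py that produced the message.

# Minimal sketch (assumption, not the real CudaMemoryLeakCheck) of the comparison
# behind "Caching allocator allocated memory was X and is now reported as Y ...
# CUDA driver allocated memory was A and is now B".
import contextlib
import torch

@contextlib.contextmanager
def naive_cuda_leak_check(device: int = 0):
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    alloc_before = torch.cuda.memory_allocated(device)   # caching-allocator bytes
    free_before, total = torch.cuda.mem_get_info(device)
    driver_before = total - free_before                  # driver-level bytes in use
    try:
        yield
    finally:
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_after = torch.cuda.memory_allocated(device)
        free_after, _ = torch.cuda.mem_get_info(device)
        driver_after = total - free_after
        if alloc_after > alloc_before and driver_after > driver_before:
            raise RuntimeError(
                f"possible CUDA leak on device {device}: caching allocator "
                f"{alloc_before} -> {alloc_after} bytes, "
                f"driver {driver_before} -> {driver_after} bytes"
            )

# Usage: a tensor that is still referenced when the block exits shows up as a "leak".
if torch.cuda.is_available():
    kept = []
    try:
        with naive_cuda_leak_check(0):
            kept.append(torch.ones(1024, device="cuda:0"))  # still referenced afterwards
    except RuntimeError as exc:
        print(exc)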
2025-12-04T09:19:34.7373664Z ======================= 1 failed, 4 deselected in 8.32s ======================== 2025-12-04T09:19:34.7373760Z Got exit code 1 2025-12-04T09:19:34.7373852Z Retrying single test... 2025-12-04T09:19:34.7374458Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-f49c40cee39994b2.xml 2025-12-04T09:19:34.7374604Z ============================= test session starts ============================== 2025-12-04T09:19:34.7374974Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:19:34.7375076Z cachedir: .pytest_cache 2025-12-04T09:19:34.7375537Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:19:34.7375646Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:19:34.7375745Z configfile: pytest.ini 2025-12-04T09:19:34.7376333Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:19:34.7376526Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:19:34.7377613Z stepcurrent: skipping 4 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T09:19:34.7377730Z Running 1 items in this shard 2025-12-04T09:19:34.7377736Z 2025-12-04T09:19:34.7378955Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda I1204 09:17:08.664000 23674 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 23726 2025-12-04T09:19:34.7379453Z I1204 09:17:08.665000 23674 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 23727 2025-12-04T09:19:34.7379956Z I1204 09:17:08.666000 23674 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 23728 2025-12-04T09:19:34.7380443Z I1204 09:17:08.666000 23674 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 23729 2025-12-04T09:19:34.7382167Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7382408Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7384130Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7384304Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7386006Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. 
FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7386182Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7387886Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7388055Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7388622Z [rank0]:E1204 09:17:15.370000 23726 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7389281Z [rank0]:E1204 09:17:15.370000 23726 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7390183Z [rank0]:E1204 09:17:15.370000 23726 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7390639Z [rank0]:E1204 09:17:15.370000 23726 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7391526Z [rank0]:E1204 09:17:15.370000 23726 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7391880Z [rank0]:E1204 09:17:15.370000 23726 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7392740Z [rank0]:E1204 09:17:15.370000 23726 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7393180Z [rank0]:E1204 09:17:15.370000 23726 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7394028Z [rank0]:E1204 09:17:15.370000 23726 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7394465Z [rank0]:E1204 09:17:15.370000 23726 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7395320Z [rank0]:E1204 09:17:15.370000 23726 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7395732Z [rank0]:E1204 09:17:15.370000 23726 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7396642Z [rank0]:E1204 09:17:15.370000 23726 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7397089Z [rank0]:E1204 09:17:15.370000 23726 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 
2025-12-04T09:19:34.7398685Z [rank0]:E1204 09:17:15.370000 23726 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 2025-12-04T09:19:34.7399020Z [rank0]:E1204 09:17:15.370000 23726 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7399600Z [rank0]:E1204 09:17:15.370000 23726 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7400764Z [rank0]:E1204 09:17:15.370000 23726 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T09:19:34.7401093Z [rank0]:E1204 09:17:15.370000 23726 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7401731Z [rank0]:E1204 09:17:15.370000 23726 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7402217Z [rank0]:E1204 09:17:15.370000 23726 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.7402685Z [rank1]:E1204 09:17:15.376000 23727 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7403157Z [rank1]:E1204 09:17:15.376000 23727 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7404053Z [rank1]:E1204 09:17:15.376000 23727 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7404506Z [rank1]:E1204 09:17:15.376000 23727 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7405388Z [rank1]:E1204 09:17:15.376000 23727 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7405746Z [rank1]:E1204 09:17:15.376000 23727 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7406610Z [rank1]:E1204 09:17:15.376000 23727 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7407040Z [rank1]:E1204 09:17:15.376000 23727 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7407889Z [rank1]:E1204 09:17:15.376000 23727 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7408326Z [rank1]:E1204 09:17:15.376000 23727 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T09:19:34.7409183Z [rank1]:E1204 09:17:15.376000 23727 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7409639Z [rank1]:E1204 09:17:15.376000 23727 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7410494Z [rank1]:E1204 09:17:15.376000 23727 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7410933Z [rank1]:E1204 09:17:15.376000 23727 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7412543Z [rank1]:E1204 09:17:15.376000 23727 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 598671360 and is now 630128640. 2025-12-04T09:19:34.7412881Z [rank1]:E1204 09:17:15.376000 23727 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7413464Z [rank1]:E1204 09:17:15.376000 23727 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7414630Z [rank1]:E1204 09:17:15.376000 23727 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T09:19:34.7414963Z [rank1]:E1204 09:17:15.376000 23727 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7415646Z [rank1]:E1204 09:17:15.376000 23727 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7416144Z [rank1]:E1204 09:17:15.376000 23727 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.7416788Z [rank3]:E1204 09:17:15.376000 23729 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7417317Z [rank3]:E1204 09:17:15.376000 23729 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7418327Z [rank3]:E1204 09:17:15.376000 23729 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7418838Z [rank3]:E1204 09:17:15.376000 23729 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7419845Z [rank3]:E1204 09:17:15.376000 23729 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7420243Z [rank3]:E1204 09:17:15.376000 23729 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7421420Z 
[rank3]:E1204 09:17:15.376000 23729 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7421914Z [rank3]:E1204 09:17:15.376000 23729 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7422874Z [rank3]:E1204 09:17:15.376000 23729 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7423378Z [rank3]:E1204 09:17:15.376000 23729 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7424510Z [rank3]:E1204 09:17:15.376000 23729 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7424967Z [rank3]:E1204 09:17:15.376000 23729 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7425932Z [rank3]:E1204 09:17:15.376000 23729 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7426432Z [rank3]:E1204 09:17:15.376000 23729 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7428242Z [rank3]:E1204 09:17:15.376000 23729 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 607059968 and is now 630128640. 
2025-12-04T09:19:34.7428615Z [rank3]:E1204 09:17:15.376000 23729 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7429273Z [rank3]:E1204 09:17:15.376000 23729 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7430576Z [rank3]:E1204 09:17:15.376000 23729 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T09:19:34.7431018Z [rank3]:E1204 09:17:15.376000 23729 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7431738Z [rank3]:E1204 09:17:15.376000 23729 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7432288Z [rank3]:E1204 09:17:15.376000 23729 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.7432846Z [rank2]:E1204 09:17:15.377000 23728 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7433319Z [rank2]:E1204 09:17:15.377000 23728 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7434212Z [rank2]:E1204 09:17:15.377000 23728 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7434668Z [rank2]:E1204 09:17:15.377000 23728 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7435553Z [rank2]:E1204 09:17:15.377000 23728 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7435908Z [rank2]:E1204 09:17:15.377000 23728 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7436770Z [rank2]:E1204 09:17:15.377000 23728 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7437204Z [rank2]:E1204 09:17:15.377000 23728 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7438109Z [rank2]:E1204 09:17:15.377000 23728 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7438546Z [rank2]:E1204 09:17:15.377000 23728 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7439396Z [rank2]:E1204 09:17:15.377000 23728 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7439801Z [rank2]:E1204 09:17:15.377000 23728 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7440659Z [rank2]:E1204 09:17:15.377000 23728 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7441106Z [rank2]:E1204 09:17:15.377000 23728 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7442698Z [rank2]:E1204 09:17:15.377000 23728 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 609157120 and is now 630128640. 2025-12-04T09:19:34.7443030Z [rank2]:E1204 09:17:15.377000 23728 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7443613Z [rank2]:E1204 09:17:15.377000 23728 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7444825Z [rank2]:E1204 09:17:15.377000 23728 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T09:19:34.7445154Z [rank2]:E1204 09:17:15.377000 23728 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7445787Z [rank2]:E1204 09:17:15.377000 23728 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7446282Z [rank2]:E1204 09:17:15.377000 23728 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:19:34.7446370Z dist init r=3, world=4 2025-12-04T09:19:34.7446462Z dist init r=2, world=4 2025-12-04T09:19:34.7446557Z dist init r=1, world=4 2025-12-04T09:19:34.7446641Z dist init r=0, world=4 2025-12-04T09:19:34.7447681Z [rank0]:[W1204 09:17:15.411024733 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:19:34.7447766Z FAILED [8.6960s] [100%] 2025-12-04T09:19:34.7447772Z 2025-12-04T09:19:34.7447904Z =================================== FAILURES =================================== 2025-12-04T09:19:34.7448302Z _ TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda _ 2025-12-04T09:19:34.7448406Z Traceback (most recent call last): 2025-12-04T09:19:34.7448900Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:19:34.7448998Z self._join_processes(fn) 2025-12-04T09:19:34.7449525Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:19:34.7449663Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:19:34.7450246Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:19:34.7450354Z raise RuntimeError(error) 2025-12-04T09:19:34.7450568Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:19:34.7450674Z Traceback (most recent call last): 2025-12-04T09:19:34.7451158Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7451255Z getattr(self, test_name)() 2025-12-04T09:19:34.7451729Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7451823Z fn() 2025-12-04T09:19:34.7452270Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7452363Z method(*args, **kwargs) 2025-12-04T09:19:34.7452818Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7452911Z method(*args, **kwargs) 2025-12-04T09:19:34.7453361Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7453447Z with policy(): 2025-12-04T09:19:34.7453896Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7453997Z raise RuntimeError(msg) 2025-12-04T09:19:34.7455191Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 609157120 and is now 630128640. 
2025-12-04T09:19:34.7455258Z 2025-12-04T09:19:34.7455456Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7456294Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T09:19:34.7456300Z 2025-12-04T09:19:34.7456538Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7456551Z 2025-12-04T09:19:34.7456554Z 2025-12-04T09:19:34.7456937Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:19:34.7457200Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:19:34.7458081Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-f49c40cee39994b2.xml - 2025-12-04T09:19:34.7458252Z =========================== short test summary info ============================ 2025-12-04T09:19:34.7459284Z FAILED [8.6960s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:19:34.7459405Z Traceback (most recent call last): 2025-12-04T09:19:34.7459964Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7460084Z getattr(self, test_name)() 2025-12-04T09:19:34.7460617Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7460705Z fn() 2025-12-04T09:19:34.7461224Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7461331Z method(*args, **kwargs) 2025-12-04T09:19:34.7461843Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7462004Z method(*args, **kwargs) 2025-12-04T09:19:34.7462506Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7462609Z with policy(): 2025-12-04T09:19:34.7463116Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7463224Z raise RuntimeError(msg) 2025-12-04T09:19:34.7464585Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 609157120 and is now 630128640. 2025-12-04T09:19:34.7464595Z 2025-12-04T09:19:34.7464810Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7465665Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T09:19:34.7465671Z 2025-12-04T09:19:34.7465933Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7466120Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
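The UserWarning from torch/distributed/fsdp/_init_utils.py that precedes each failing run suggests two fixes: call torch.cuda.set_device() before FSDP initialization, or pass an indexed device as device_id instead of the bare "cuda" string. Below is a minimal sketch of both, assuming a single-node NCCL rendezvous and a placeholder nn.Linear model; setup_fsdp is a hypothetical helper, not part of the test in this log.

# Hedged sketch of the two fixes named in the FSDP `device_id` UserWarning above.
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def setup_fsdp(rank: int, world_size: int) -> FSDP:
    # Placeholder single-node rendezvous settings (assumptions for the sketch).
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

    # Fix 1: make the rank's device current before any FSDP initialization.
    torch.cuda.set_device(rank)

    model = nn.Linear(8, 8).cuda(rank)

    # Fix 2: pass an indexed device rather than the bare "cuda" string, so FSDP
    # does not have to fall back to whatever the current device happens to be.
    return FSDP(model, device_id=torch.device("cuda", rank))

Passing torch.device("cuda", rank) removes the guess the warning describes, since FSDP no longer has to infer which device the rank owns.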
2025-12-04T09:19:34.7466295Z ======================= 1 failed, 7 deselected in 8.72s ======================== 2025-12-04T09:19:34.7466393Z Got exit code 1 2025-12-04T09:19:34.7466503Z Retrying single test... 2025-12-04T09:19:34.7467185Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-a8869f6ed51873ac.xml 2025-12-04T09:19:34.7467404Z ============================= test session starts ============================== 2025-12-04T09:19:34.7467758Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:19:34.7467869Z cachedir: .pytest_cache 2025-12-04T09:19:34.7468389Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:19:34.7468615Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:19:34.7468717Z configfile: pytest.ini 2025-12-04T09:19:34.7469330Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:19:34.7469512Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:19:34.7470343Z stepcurrent: skipping 4 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T09:19:34.7470447Z Running 1 items in this shard 2025-12-04T09:19:34.7470452Z 2025-12-04T09:19:34.7471530Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda I1204 09:17:21.964000 24011 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 24063 2025-12-04T09:19:34.7471982Z I1204 09:17:21.965000 24011 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 24064 2025-12-04T09:19:34.7472421Z I1204 09:17:21.966000 24011 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 24065 2025-12-04T09:19:34.7472868Z I1204 09:17:21.967000 24011 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 24066 2025-12-04T09:19:34.7474443Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7474605Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7476117Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7476263Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7477777Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. 
FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7477922Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7479440Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7479582Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7480000Z [rank0]:E1204 09:17:28.795000 24063 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7480525Z [rank0]:E1204 09:17:28.795000 24063 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7481422Z [rank0]:E1204 09:17:28.795000 24063 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7481879Z [rank0]:E1204 09:17:28.795000 24063 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7482754Z [rank0]:E1204 09:17:28.795000 24063 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7483112Z [rank0]:E1204 09:17:28.795000 24063 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7483972Z [rank0]:E1204 09:17:28.795000 24063 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7484422Z [rank0]:E1204 09:17:28.795000 24063 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7485274Z [rank0]:E1204 09:17:28.795000 24063 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7485702Z [rank0]:E1204 09:17:28.795000 24063 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7486561Z [rank0]:E1204 09:17:28.795000 24063 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7486960Z [rank0]:E1204 09:17:28.795000 24063 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7487875Z [rank0]:E1204 09:17:28.795000 24063 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7488312Z [rank0]:E1204 09:17:28.795000 24063 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 
2025-12-04T09:19:34.7489924Z [rank0]:E1204 09:17:28.795000 24063 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 2025-12-04T09:19:34.7490252Z [rank0]:E1204 09:17:28.795000 24063 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7490850Z [rank0]:E1204 09:17:28.795000 24063 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7492010Z [rank0]:E1204 09:17:28.795000 24063 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T09:19:34.7492334Z [rank0]:E1204 09:17:28.795000 24063 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7492980Z [rank0]:E1204 09:17:28.795000 24063 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7493511Z [rank0]:E1204 09:17:28.795000 24063 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.7493924Z [rank3]:E1204 09:17:28.800000 24066 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7494394Z [rank3]:E1204 09:17:28.800000 24066 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7495293Z [rank3]:E1204 09:17:28.800000 24066 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7495743Z [rank3]:E1204 09:17:28.800000 24066 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7496864Z [rank3]:E1204 09:17:28.800000 24066 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7497276Z [rank3]:E1204 09:17:28.800000 24066 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7498243Z [rank3]:E1204 09:17:28.800000 24066 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7498732Z [rank3]:E1204 09:17:28.800000 24066 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7499693Z [rank3]:E1204 09:17:28.800000 24066 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7500190Z [rank3]:E1204 09:17:28.800000 24066 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T09:19:34.7501533Z [rank3]:E1204 09:17:28.800000 24066 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7501986Z [rank3]:E1204 09:17:28.800000 24066 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7502959Z [rank3]:E1204 09:17:28.800000 24066 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7503448Z [rank3]:E1204 09:17:28.800000 24066 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7505260Z [rank3]:E1204 09:17:28.800000 24066 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 485425152 and is now 630128640. 2025-12-04T09:19:34.7505626Z [rank3]:E1204 09:17:28.800000 24066 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7506301Z [rank3]:E1204 09:17:28.800000 24066 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7507601Z [rank3]:E1204 09:17:28.800000 24066 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T09:19:34.7508024Z [rank3]:E1204 09:17:28.800000 24066 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7508852Z [rank3]:E1204 09:17:28.800000 24066 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7509452Z [rank3]:E1204 09:17:28.800000 24066 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.7509856Z [rank1]:E1204 09:17:28.801000 24064 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7510321Z [rank1]:E1204 09:17:28.801000 24064 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7511215Z [rank1]:E1204 09:17:28.801000 24064 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7511672Z [rank1]:E1204 09:17:28.801000 24064 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7512546Z [rank1]:E1204 09:17:28.801000 24064 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7512900Z [rank1]:E1204 09:17:28.801000 24064 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7513749Z 
[rank1]:E1204 09:17:28.801000 24064 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7514181Z [rank1]:E1204 09:17:28.801000 24064 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7515036Z [rank1]:E1204 09:17:28.801000 24064 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7515525Z [rank1]:E1204 09:17:28.801000 24064 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7516375Z [rank1]:E1204 09:17:28.801000 24064 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7516771Z [rank1]:E1204 09:17:28.801000 24064 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7517633Z [rank1]:E1204 09:17:28.801000 24064 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7518070Z [rank1]:E1204 09:17:28.801000 24064 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7519678Z [rank1]:E1204 09:17:28.801000 24064 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 607059968 and is now 630128640. 
2025-12-04T09:19:34.7519999Z [rank1]:E1204 09:17:28.801000 24064 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7520589Z [rank1]:E1204 09:17:28.801000 24064 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7522139Z [rank1]:E1204 09:17:28.801000 24064 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T09:19:34.7522609Z [rank1]:E1204 09:17:28.801000 24064 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7523332Z [rank1]:E1204 09:17:28.801000 24064 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7523873Z [rank1]:E1204 09:17:28.801000 24064 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.7524328Z [rank2]:E1204 09:17:28.802000 24065 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7524857Z [rank2]:E1204 09:17:28.802000 24065 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7525875Z [rank2]:E1204 09:17:28.802000 24065 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7526380Z [rank2]:E1204 09:17:28.802000 24065 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7527362Z [rank2]:E1204 09:17:28.802000 24065 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7527766Z [rank2]:E1204 09:17:28.802000 24065 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7528723Z [rank2]:E1204 09:17:28.802000 24065 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7529221Z [rank2]:E1204 09:17:28.802000 24065 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7530255Z [rank2]:E1204 09:17:28.802000 24065 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7530753Z [rank2]:E1204 09:17:28.802000 24065 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7531715Z [rank2]:E1204 09:17:28.802000 24065 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7532162Z [rank2]:E1204 09:17:28.802000 24065 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7533141Z [rank2]:E1204 09:17:28.802000 24065 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7533732Z [rank2]:E1204 09:17:28.802000 24065 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7535475Z [rank2]:E1204 09:17:28.802000 24065 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 602865664 and is now 630128640. 2025-12-04T09:19:34.7535798Z [rank2]:E1204 09:17:28.802000 24065 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7536512Z [rank2]:E1204 09:17:28.802000 24065 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7537984Z [rank2]:E1204 09:17:28.802000 24065 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T09:19:34.7538349Z [rank2]:E1204 09:17:28.802000 24065 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7539074Z [rank2]:E1204 09:17:28.802000 24065 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7539618Z [rank2]:E1204 09:17:28.802000 24065 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:19:34.7539734Z dist init r=0, world=4 2025-12-04T09:19:34.7539832Z dist init r=3, world=4 2025-12-04T09:19:34.7539927Z dist init r=2, world=4 2025-12-04T09:19:34.7540033Z dist init r=1, world=4 2025-12-04T09:19:34.7541192Z [rank0]:[W1204 09:17:29.808444441 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:19:34.7541303Z FAILED [8.3955s] [100%] 2025-12-04T09:19:34.7541309Z 2025-12-04T09:19:34.7541455Z =================================== FAILURES =================================== 2025-12-04T09:19:34.7541898Z _ TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda _ 2025-12-04T09:19:34.7542024Z Traceback (most recent call last): 2025-12-04T09:19:34.7542570Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:19:34.7542694Z self._join_processes(fn) 2025-12-04T09:19:34.7543280Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:19:34.7543478Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:19:34.7544093Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:19:34.7544210Z raise RuntimeError(error) 2025-12-04T09:19:34.7544442Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:19:34.7544570Z Traceback (most recent call last): 2025-12-04T09:19:34.7545114Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7545221Z getattr(self, test_name)() 2025-12-04T09:19:34.7545759Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7545850Z fn() 2025-12-04T09:19:34.7546358Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7546470Z method(*args, **kwargs) 2025-12-04T09:19:34.7546969Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7547078Z method(*args, **kwargs) 2025-12-04T09:19:34.7547578Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7547674Z with policy(): 2025-12-04T09:19:34.7548188Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7548294Z raise RuntimeError(msg) 2025-12-04T09:19:34.7549808Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 485425152 and is now 630128640. 
2025-12-04T09:19:34.7549824Z 2025-12-04T09:19:34.7550016Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7550772Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T09:19:34.7550777Z 2025-12-04T09:19:34.7551021Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7551025Z 2025-12-04T09:19:34.7551029Z 2025-12-04T09:19:34.7551223Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:19:34.7551462Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:19:34.7552229Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-a8869f6ed51873ac.xml - 2025-12-04T09:19:34.7552384Z =========================== short test summary info ============================ 2025-12-04T09:19:34.7553284Z FAILED [8.3955s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:19:34.7553391Z Traceback (most recent call last): 2025-12-04T09:19:34.7553886Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7553984Z getattr(self, test_name)() 2025-12-04T09:19:34.7554459Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7554550Z fn() 2025-12-04T09:19:34.7554997Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7555093Z method(*args, **kwargs) 2025-12-04T09:19:34.7555601Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7555693Z method(*args, **kwargs) 2025-12-04T09:19:34.7556144Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7556229Z with policy(): 2025-12-04T09:19:34.7556678Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7556777Z raise RuntimeError(msg) 2025-12-04T09:19:34.7557969Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 485425152 and is now 630128640. 2025-12-04T09:19:34.7557978Z 2025-12-04T09:19:34.7558175Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7558927Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T09:19:34.7558932Z 2025-12-04T09:19:34.7559174Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7559333Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
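Note: the failures above come from the CUDA memory-leak checker that PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 enables. It snapshots per-device memory before the test body and compares it with the state afterwards; the "Caching allocator allocated memory was 512 and is now reported as 3584 ... CUDA driver allocated memory was ... and is now ..." figures are exactly that before/after pair, reported once per rank/device. The sketch below only illustrates the idea using public torch.cuda APIs; it is not the actual CudaMemoryLeakCheck implementation, and the helper name and the comparison logic are assumptions.

    import torch

    def run_with_leak_check(test_fn, device: int = 0) -> None:
        """Illustrative only: compare allocator and driver memory around a test body."""
        torch.cuda.synchronize(device)
        alloc_before = torch.cuda.memory_allocated(device)   # caching-allocator bytes
        free, total = torch.cuda.mem_get_info(device)        # driver-level view (free, total)
        driver_before = total - free

        test_fn()

        torch.cuda.empty_cache()                              # release cached blocks first
        torch.cuda.synchronize(device)
        alloc_after = torch.cuda.memory_allocated(device)
        free, _ = torch.cuda.mem_get_info(device)
        driver_after = total - free

        # Flag a leak only when the driver-level numbers grew as well, mirroring
        # the "CUDA driver API confirmed a leak" wording in the log above.
        if alloc_after > alloc_before and driver_after > driver_before:
            raise RuntimeError(
                f"possible CUDA leak on device {device}: allocator "
                f"{alloc_before} -> {alloc_after} bytes, driver "
                f"{driver_before} -> {driver_after} bytes"
            )

The repro command printed with each failure (PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py ...) runs the same check locally; per the log, PYTORCH_PRINT_REPRO_ON_FAILURE=0 only suppresses the printed repro hint, not the check itself.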
2025-12-04T09:19:34.7559491Z ======================= 1 failed, 7 deselected in 8.42s ======================== 2025-12-04T09:19:34.7559580Z Got exit code 1 2025-12-04T09:19:34.7560259Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda 2025-12-04T09:19:34.7560671Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:19:34.7561290Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-90a4ba7c1fd04d10.xml 2025-12-04T09:19:34.7561434Z ============================= test session starts ============================== 2025-12-04T09:19:34.7561752Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:19:34.7561845Z cachedir: .pytest_cache 2025-12-04T09:19:34.7562299Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:19:34.7562415Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:19:34.7562510Z configfile: pytest.ini 2025-12-04T09:19:34.7562993Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:19:34.7563171Z collecting ... collected 8 items / 5 deselected / 3 selected 2025-12-04T09:19:34.7563296Z stepcurrent: skipping 5 already run items. 2025-12-04T09:19:34.7563400Z Running 3 items in this shard 2025-12-04T09:19:34.7563404Z 2025-12-04T09:19:34.7564485Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda I1204 09:17:35.174000 24348 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 24400 2025-12-04T09:19:34.7564933Z I1204 09:17:35.175000 24348 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 24401 2025-12-04T09:19:34.7565367Z I1204 09:17:35.176000 24348 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 24402 2025-12-04T09:19:34.7565801Z I1204 09:17:35.176000 24348 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 24403 2025-12-04T09:19:34.7567380Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7567527Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7569044Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:19:34.7569192Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7570723Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7570864Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7572382Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7572577Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7572982Z [rank1]:E1204 09:17:41.976000 24401 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7573464Z [rank1]:E1204 09:17:41.976000 24401 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7574355Z [rank1]:E1204 09:17:41.976000 24401 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7574812Z [rank1]:E1204 09:17:41.976000 24401 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7575688Z [rank1]:E1204 09:17:41.976000 24401 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7576044Z [rank1]:E1204 09:17:41.976000 24401 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7577209Z [rank1]:E1204 09:17:41.976000 24401 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7577698Z [rank1]:E1204 09:17:41.976000 24401 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7578661Z [rank1]:E1204 09:17:41.976000 24401 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7579145Z [rank1]:E1204 09:17:41.976000 24401 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7580174Z [rank1]:E1204 09:17:41.976000 24401 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7580618Z [rank1]:E1204 09:17:41.976000 24401 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7581576Z [rank1]:E1204 09:17:41.976000 24401 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7582068Z [rank1]:E1204 09:17:41.976000 24401 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7583860Z [rank1]:E1204 09:17:41.976000 24401 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 607059968 and is now 630128640. 2025-12-04T09:19:34.7584242Z [rank1]:E1204 09:17:41.976000 24401 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7584898Z [rank1]:E1204 09:17:41.976000 24401 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7586208Z [rank1]:E1204 09:17:41.976000 24401 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7586569Z [rank1]:E1204 09:17:41.976000 24401 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7587339Z [rank1]:E1204 09:17:41.976000 24401 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7587885Z [rank1]:E1204 09:17:41.976000 24401 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.7588332Z [rank2]:E1204 09:17:41.978000 24402 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7589042Z [rank2]:E1204 09:17:41.978000 24402 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7590009Z [rank2]:E1204 09:17:41.978000 24402 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7590511Z [rank2]:E1204 09:17:41.978000 24402 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7591475Z [rank2]:E1204 09:17:41.978000 24402 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7591863Z [rank2]:E1204 09:17:41.978000 24402 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7592794Z [rank2]:E1204 09:17:41.978000 24402 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7593266Z [rank2]:E1204 09:17:41.978000 24402 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7594199Z [rank2]:E1204 09:17:41.978000 24402 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7594727Z [rank2]:E1204 09:17:41.978000 24402 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7595660Z [rank2]:E1204 09:17:41.978000 24402 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7596088Z [rank2]:E1204 09:17:41.978000 24402 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7597021Z [rank2]:E1204 09:17:41.978000 24402 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7597507Z [rank2]:E1204 09:17:41.978000 24402 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7599254Z [rank2]:E1204 09:17:41.978000 24402 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 602865664 and is now 630128640. 2025-12-04T09:19:34.7599610Z [rank2]:E1204 09:17:41.978000 24402 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7600249Z [rank2]:E1204 09:17:41.978000 24402 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7601523Z [rank2]:E1204 09:17:41.978000 24402 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7601963Z [rank2]:E1204 09:17:41.978000 24402 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7602668Z [rank2]:E1204 09:17:41.978000 24402 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7603194Z [rank2]:E1204 09:17:41.978000 24402 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:19:34.7603725Z [rank0]:E1204 09:17:41.978000 24400 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7604227Z [rank0]:E1204 09:17:41.978000 24400 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7605253Z [rank0]:E1204 09:17:41.978000 24400 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7605712Z [rank0]:E1204 09:17:41.978000 24400 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7606781Z [rank0]:E1204 09:17:41.978000 24400 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7607158Z [rank0]:E1204 09:17:41.978000 24400 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7608061Z [rank0]:E1204 09:17:41.978000 24400 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7608523Z [rank0]:E1204 09:17:41.978000 24400 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7609495Z [rank0]:E1204 09:17:41.978000 24400 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7609953Z [rank0]:E1204 09:17:41.978000 24400 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7610856Z [rank0]:E1204 09:17:41.978000 24400 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7611273Z [rank0]:E1204 09:17:41.978000 24400 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7612182Z [rank0]:E1204 09:17:41.978000 24400 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7612643Z [rank0]:E1204 09:17:41.978000 24400 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7614337Z [rank0]:E1204 09:17:41.978000 24400 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 
2025-12-04T09:19:34.7614684Z [rank0]:E1204 09:17:41.978000 24400 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7615353Z [rank0]:E1204 09:17:41.978000 24400 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7616839Z [rank0]:E1204 09:17:41.978000 24400 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7617201Z [rank0]:E1204 09:17:41.978000 24400 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7617920Z [rank0]:E1204 09:17:41.978000 24400 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7618461Z [rank0]:E1204 09:17:41.978000 24400 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.7618914Z [rank3]:E1204 09:17:41.979000 24403 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7619452Z [rank3]:E1204 09:17:41.979000 24403 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7620454Z [rank3]:E1204 09:17:41.979000 24403 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7621184Z [rank3]:E1204 09:17:41.979000 24403 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7622176Z [rank3]:E1204 09:17:41.979000 24403 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7622584Z [rank3]:E1204 09:17:41.979000 24403 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7623642Z [rank3]:E1204 09:17:41.979000 24403 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7624134Z [rank3]:E1204 09:17:41.979000 24403 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7625096Z [rank3]:E1204 09:17:41.979000 24403 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7625581Z [rank3]:E1204 09:17:41.979000 24403 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7626549Z [rank3]:E1204 09:17:41.979000 24403 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7626996Z [rank3]:E1204 09:17:41.979000 24403 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7627967Z [rank3]:E1204 09:17:41.979000 24403 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7628457Z [rank3]:E1204 09:17:41.979000 24403 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7630257Z [rank3]:E1204 09:17:41.979000 24403 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 487522304 and is now 630128640. 2025-12-04T09:19:34.7630700Z [rank3]:E1204 09:17:41.979000 24403 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7631361Z [rank3]:E1204 09:17:41.979000 24403 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7632772Z [rank3]:E1204 09:17:41.979000 24403 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7633209Z [rank3]:E1204 09:17:41.979000 24403 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7633847Z [rank3]:E1204 09:17:41.979000 24403 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7634331Z [rank3]:E1204 09:17:41.979000 24403 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.7634419Z dist init r=1, world=4 2025-12-04T09:19:34.7634512Z dist init r=3, world=4 2025-12-04T09:19:34.7634595Z dist init r=0, world=4 2025-12-04T09:19:34.7634678Z dist init r=2, world=4 2025-12-04T09:19:34.7635705Z [rank0]:[W1204 09:17:42.995431379 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:19:34.7635790Z FAILED [8.3118s] [ 33%] 2025-12-04T09:19:34.7635796Z 2025-12-04T09:19:34.7635928Z =================================== FAILURES =================================== 2025-12-04T09:19:34.7636318Z _ TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda _ 2025-12-04T09:19:34.7636425Z Traceback (most recent call last): 2025-12-04T09:19:34.7636914Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:19:34.7637112Z self._join_processes(fn) 2025-12-04T09:19:34.7637635Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:19:34.7637758Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:19:34.7638301Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:19:34.7638402Z raise RuntimeError(error) 2025-12-04T09:19:34.7638608Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:19:34.7638715Z Traceback (most recent call last): 2025-12-04T09:19:34.7639196Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7639290Z getattr(self, test_name)() 2025-12-04T09:19:34.7639774Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7639852Z fn() 2025-12-04T09:19:34.7640299Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7640392Z method(*args, **kwargs) 2025-12-04T09:19:34.7640840Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7640933Z method(*args, **kwargs) 2025-12-04T09:19:34.7641378Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7641460Z with policy(): 2025-12-04T09:19:34.7641963Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7642058Z raise RuntimeError(msg) 2025-12-04T09:19:34.7643252Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 607059968 and is now 630128640. 
2025-12-04T09:19:34.7643263Z 2025-12-04T09:19:34.7643452Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7644213Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7644218Z 2025-12-04T09:19:34.7644456Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7644465Z 2025-12-04T09:19:34.7644469Z 2025-12-04T09:19:34.7644663Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:19:34.7644900Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:19:34.7645667Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-90a4ba7c1fd04d10.xml - 2025-12-04T09:19:34.7645814Z =========================== short test summary info ============================ 2025-12-04T09:19:34.7646718Z FAILED [8.3118s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:19:34.7646824Z Traceback (most recent call last): 2025-12-04T09:19:34.7647316Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7647415Z getattr(self, test_name)() 2025-12-04T09:19:34.7647888Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7647969Z fn() 2025-12-04T09:19:34.7648467Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7648562Z method(*args, **kwargs) 2025-12-04T09:19:34.7649005Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7649094Z method(*args, **kwargs) 2025-12-04T09:19:34.7649543Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7649625Z with policy(): 2025-12-04T09:19:34.7650070Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7650171Z raise RuntimeError(msg) 2025-12-04T09:19:34.7651370Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 607059968 and is now 630128640. 2025-12-04T09:19:34.7651375Z 2025-12-04T09:19:34.7651567Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7652323Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7652328Z 2025-12-04T09:19:34.7652562Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7652720Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
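Note: the UserWarning from torch/distributed/fsdp/_init_utils.py that appears on every rank in the runs above is about passing the bare string "cuda" as device_id, with no device index. The warning itself names the two fixes: call torch.cuda.set_device() before FSDP initialization, or pass an explicit device index. A minimal sketch of both, assuming the process group is already initialized and rank is the local rank (the helper name and wiring are illustrative, not taken from the test):

    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_model(model: torch.nn.Module, rank: int) -> FSDP:
        # Fix 1: make the indexed device current before constructing FSDP.
        torch.cuda.set_device(rank)
        # Fix 2: pass an indexed device instead of the bare "cuda" string.
        return FSDP(model, device_id=torch.device("cuda", rank))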
2025-12-04T09:19:34.7652937Z ======================= 1 failed, 5 deselected in 8.33s ======================== 2025-12-04T09:19:34.7653026Z Got exit code 1 2025-12-04T09:19:34.7653117Z Retrying single test... 2025-12-04T09:19:34.7653730Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-ccaa5b3b6bf09af7.xml 2025-12-04T09:19:34.7653880Z ============================= test session starts ============================== 2025-12-04T09:19:34.7654189Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:19:34.7654290Z cachedir: .pytest_cache 2025-12-04T09:19:34.7654750Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:19:34.7654863Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:19:34.7654968Z configfile: pytest.ini 2025-12-04T09:19:34.7655452Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:19:34.7655634Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:19:34.7656542Z stepcurrent: skipping 5 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7656814Z Running 1 items in this shard 2025-12-04T09:19:34.7656820Z 2025-12-04T09:19:34.7658044Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda I1204 09:17:48.454000 24685 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 24737 2025-12-04T09:19:34.7658542Z I1204 09:17:48.454000 24685 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 24738 2025-12-04T09:19:34.7659046Z I1204 09:17:48.455000 24685 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 24739 2025-12-04T09:19:34.7659540Z I1204 09:17:48.456000 24685 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 24740 2025-12-04T09:19:34.7661320Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7661500Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7663209Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7663386Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7665103Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. 
FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7665275Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7666973Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7667201Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7667671Z [rank1]:E1204 09:17:55.270000 24738 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7668209Z [rank1]:E1204 09:17:55.270000 24738 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7669292Z [rank1]:E1204 09:17:55.270000 24738 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7669749Z [rank1]:E1204 09:17:55.270000 24738 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7670639Z [rank1]:E1204 09:17:55.270000 24738 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7670997Z [rank1]:E1204 09:17:55.270000 24738 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7671860Z [rank1]:E1204 09:17:55.270000 24738 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7672292Z [rank1]:E1204 09:17:55.270000 24738 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7673151Z [rank1]:E1204 09:17:55.270000 24738 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7673599Z [rank1]:E1204 09:17:55.270000 24738 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7674500Z [rank1]:E1204 09:17:55.270000 24738 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7674911Z [rank1]:E1204 09:17:55.270000 24738 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7675764Z [rank1]:E1204 09:17:55.270000 24738 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7676217Z [rank1]:E1204 09:17:55.270000 24738 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 
2025-12-04T09:19:34.7677819Z [rank1]:E1204 09:17:55.270000 24738 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 602865664 and is now 630128640. 2025-12-04T09:19:34.7678144Z [rank1]:E1204 09:17:55.270000 24738 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7678741Z [rank1]:E1204 09:17:55.270000 24738 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7679908Z [rank1]:E1204 09:17:55.270000 24738 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7680298Z [rank1]:E1204 09:17:55.270000 24738 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7680942Z [rank1]:E1204 09:17:55.270000 24738 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7681435Z [rank1]:E1204 09:17:55.270000 24738 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.7681836Z [rank0]:E1204 09:17:55.270000 24737 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7682309Z [rank0]:E1204 09:17:55.270000 24737 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7683209Z [rank0]:E1204 09:17:55.270000 24737 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7683669Z [rank0]:E1204 09:17:55.270000 24737 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7684559Z [rank0]:E1204 09:17:55.270000 24737 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7684914Z [rank0]:E1204 09:17:55.270000 24737 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7685774Z [rank0]:E1204 09:17:55.270000 24737 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7686207Z [rank0]:E1204 09:17:55.270000 24737 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7687060Z [rank0]:E1204 09:17:55.270000 24737 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7687553Z [rank0]:E1204 09:17:55.270000 24737 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T09:19:34.7688405Z [rank0]:E1204 09:17:55.270000 24737 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7688815Z [rank0]:E1204 09:17:55.270000 24737 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7689671Z [rank0]:E1204 09:17:55.270000 24737 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7690119Z [rank0]:E1204 09:17:55.270000 24737 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7691720Z [rank0]:E1204 09:17:55.270000 24737 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 2025-12-04T09:19:34.7692055Z [rank0]:E1204 09:17:55.270000 24737 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7692638Z [rank0]:E1204 09:17:55.270000 24737 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7693855Z [rank0]:E1204 09:17:55.270000 24737 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7694191Z [rank0]:E1204 09:17:55.270000 24737 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7694835Z [rank0]:E1204 09:17:55.270000 24737 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7695330Z [rank0]:E1204 09:17:55.270000 24737 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.7695734Z [rank2]:E1204 09:17:55.272000 24739 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7696434Z [rank2]:E1204 09:17:55.272000 24739 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7697609Z [rank2]:E1204 09:17:55.272000 24739 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7698122Z [rank2]:E1204 09:17:55.272000 24739 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7699124Z [rank2]:E1204 09:17:55.272000 24739 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7699528Z [rank2]:E1204 09:17:55.272000 24739 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7700512Z 
[rank2]:E1204 09:17:55.272000 24739 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7701081Z [rank2]:E1204 09:17:55.272000 24739 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7702046Z [rank2]:E1204 09:17:55.272000 24739 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7702547Z [rank2]:E1204 09:17:55.272000 24739 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7703504Z [rank2]:E1204 09:17:55.272000 24739 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7703970Z [rank2]:E1204 09:17:55.272000 24739 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7704938Z [rank2]:E1204 09:17:55.272000 24739 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7705445Z [rank2]:E1204 09:17:55.272000 24739 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7707243Z [rank2]:E1204 09:17:55.272000 24739 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 602865664 and is now 630128640. 
2025-12-04T09:19:34.7707673Z [rank2]:E1204 09:17:55.272000 24739 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7708340Z [rank2]:E1204 09:17:55.272000 24739 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7709818Z [rank2]:E1204 09:17:55.272000 24739 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7710155Z [rank2]:E1204 09:17:55.272000 24739 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7710793Z [rank2]:E1204 09:17:55.272000 24739 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7711293Z [rank2]:E1204 09:17:55.272000 24739 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:19:34.7711700Z [rank3]:E1204 09:17:55.273000 24740 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7712176Z [rank3]:E1204 09:17:55.273000 24740 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7713074Z [rank3]:E1204 09:17:55.273000 24740 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7713529Z [rank3]:E1204 09:17:55.273000 24740 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7714417Z [rank3]:E1204 09:17:55.273000 24740 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7714777Z [rank3]:E1204 09:17:55.273000 24740 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7715697Z [rank3]:E1204 09:17:55.273000 24740 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7716133Z [rank3]:E1204 09:17:55.273000 24740 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7716983Z [rank3]:E1204 09:17:55.273000 24740 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7717429Z [rank3]:E1204 09:17:55.273000 24740 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7718280Z [rank3]:E1204 09:17:55.273000 24740 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7718691Z [rank3]:E1204 09:17:55.273000 24740 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7719552Z [rank3]:E1204 09:17:55.273000 24740 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7719999Z [rank3]:E1204 09:17:55.273000 24740 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7721968Z [rank3]:E1204 09:17:55.273000 24740 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 487522304 and is now 630128640. 2025-12-04T09:19:34.7722456Z [rank3]:E1204 09:17:55.273000 24740 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7723122Z [rank3]:E1204 09:17:55.273000 24740 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7724432Z [rank3]:E1204 09:17:55.273000 24740 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7724808Z [rank3]:E1204 09:17:55.273000 24740 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7725526Z [rank3]:E1204 09:17:55.273000 24740 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7726091Z [rank3]:E1204 09:17:55.273000 24740 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.7726196Z dist init r=3, world=4 2025-12-04T09:19:34.7726293Z dist init r=1, world=4 2025-12-04T09:19:34.7726405Z dist init r=0, world=4 2025-12-04T09:19:34.7726500Z dist init r=2, world=4 2025-12-04T09:19:34.7727677Z [rank0]:[W1204 09:17:55.287790420 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:19:34.7727778Z FAILED [8.2907s] [100%] 2025-12-04T09:19:34.7727784Z 2025-12-04T09:19:34.7727938Z =================================== FAILURES =================================== 2025-12-04T09:19:34.7728393Z _ TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda _ 2025-12-04T09:19:34.7728515Z Traceback (most recent call last): 2025-12-04T09:19:34.7729136Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:19:34.7729260Z self._join_processes(fn) 2025-12-04T09:19:34.7729847Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:19:34.7730001Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:19:34.7730608Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:19:34.7730722Z raise RuntimeError(error) 2025-12-04T09:19:34.7730968Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:19:34.7731092Z Traceback (most recent call last): 2025-12-04T09:19:34.7731648Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7731757Z getattr(self, test_name)() 2025-12-04T09:19:34.7732298Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7732396Z fn() 2025-12-04T09:19:34.7732908Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7733013Z method(*args, **kwargs) 2025-12-04T09:19:34.7733639Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7733732Z method(*args, **kwargs) 2025-12-04T09:19:34.7734195Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7734334Z with policy(): 2025-12-04T09:19:34.7734791Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7734898Z raise RuntimeError(msg) 2025-12-04T09:19:34.7736096Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 487522304 and is now 630128640. 
2025-12-04T09:19:34.7736102Z 2025-12-04T09:19:34.7736366Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7737365Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7737375Z 2025-12-04T09:19:34.7737641Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7737659Z 2025-12-04T09:19:34.7737663Z 2025-12-04T09:19:34.7737885Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:19:34.7738151Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:19:34.7739037Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-ccaa5b3b6bf09af7.xml - 2025-12-04T09:19:34.7739210Z =========================== short test summary info ============================ 2025-12-04T09:19:34.7740238Z FAILED [8.2907s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:19:34.7740358Z Traceback (most recent call last): 2025-12-04T09:19:34.7740914Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7741041Z getattr(self, test_name)() 2025-12-04T09:19:34.7741646Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7741739Z fn() 2025-12-04T09:19:34.7742265Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7742368Z method(*args, **kwargs) 2025-12-04T09:19:34.7742879Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7742983Z method(*args, **kwargs) 2025-12-04T09:19:34.7743485Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7743592Z with policy(): 2025-12-04T09:19:34.7744103Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7744208Z raise RuntimeError(msg) 2025-12-04T09:19:34.7745567Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 487522304 and is now 630128640. 2025-12-04T09:19:34.7745574Z 2025-12-04T09:19:34.7745785Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7746644Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7746650Z 2025-12-04T09:19:34.7746915Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7747159Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
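Note on the failure above: the RuntimeError is raised by PyTorch's CUDA memory-leak check (the `with policy():` context manager entered in common_utils.py), which records per-device memory from both the caching allocator and the CUDA driver before the test body and raises on exit if either number has grown. The PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 command printed in the log re-runs only this test with that check enabled. Below is a minimal, illustrative sketch of the idea using public torch.cuda APIs; the function names are assumptions for illustration and this is not the actual leak-check implementation from common_utils.py.

import torch

def driver_allocated_bytes(device: int) -> int:
    # torch.cuda.mem_get_info reports (free, total) bytes from the CUDA driver,
    # so total - free approximates what the driver currently has allocated.
    free, total = torch.cuda.mem_get_info(device)
    return total - free

def run_with_leak_check(test_fn, device: int = 0) -> None:
    # Snapshot caching-allocator and driver-level usage before the test body...
    torch.cuda.synchronize(device)
    alloc_before = torch.cuda.memory_allocated(device)
    driver_before = driver_allocated_bytes(device)
    test_fn()
    # ...and compare after it; a persistent increase on both counters is
    # reported as a leak, mirroring the numbers quoted in the log above.
    torch.cuda.synchronize(device)
    alloc_after = torch.cuda.memory_allocated(device)
    driver_after = driver_allocated_bytes(device)
    if alloc_after > alloc_before and driver_after > driver_before:
        raise RuntimeError(
            f"possible CUDA leak on device {device}: caching allocator "
            f"{alloc_before} -> {alloc_after}, driver {driver_before} -> {driver_after}"
        )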
2025-12-04T09:19:34.7747335Z ======================= 1 failed, 7 deselected in 8.31s ======================== 2025-12-04T09:19:34.7747431Z Got exit code 1 2025-12-04T09:19:34.7747544Z Retrying single test... 2025-12-04T09:19:34.7748238Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-ca39f8152ef39349.xml 2025-12-04T09:19:34.7748397Z ============================= test session starts ============================== 2025-12-04T09:19:34.7748852Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:19:34.7749058Z cachedir: .pytest_cache 2025-12-04T09:19:34.7749526Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:19:34.7749638Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:19:34.7749733Z configfile: pytest.ini 2025-12-04T09:19:34.7750216Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:19:34.7750403Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:19:34.7751231Z stepcurrent: skipping 5 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7751332Z Running 1 items in this shard 2025-12-04T09:19:34.7751336Z 2025-12-04T09:19:34.7752411Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda I1204 09:18:01.694000 25022 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 25074 2025-12-04T09:19:34.7752863Z I1204 09:18:01.695000 25022 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 25075 2025-12-04T09:19:34.7753306Z I1204 09:18:01.696000 25022 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 25076 2025-12-04T09:19:34.7753812Z I1204 09:18:01.696000 25022 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 25077 2025-12-04T09:19:34.7755341Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7755493Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7757009Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7757159Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7758671Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. 
FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7758813Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7760331Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7760524Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7760939Z [rank0]:E1204 09:18:08.463000 25074 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7761415Z [rank0]:E1204 09:18:08.463000 25074 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7762302Z [rank0]:E1204 09:18:08.463000 25074 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7762760Z [rank0]:E1204 09:18:08.463000 25074 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7763644Z [rank0]:E1204 09:18:08.463000 25074 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7763999Z [rank0]:E1204 09:18:08.463000 25074 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7764851Z [rank0]:E1204 09:18:08.463000 25074 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7765289Z [rank0]:E1204 09:18:08.463000 25074 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7766141Z [rank0]:E1204 09:18:08.463000 25074 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7766573Z [rank0]:E1204 09:18:08.463000 25074 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7767474Z [rank0]:E1204 09:18:08.463000 25074 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7767871Z [rank0]:E1204 09:18:08.463000 25074 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7768729Z [rank0]:E1204 09:18:08.463000 25074 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7769165Z [rank0]:E1204 09:18:08.463000 25074 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 
2025-12-04T09:19:34.7770768Z [rank0]:E1204 09:18:08.463000 25074 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 2025-12-04T09:19:34.7771096Z [rank0]:E1204 09:18:08.463000 25074 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7771683Z [rank0]:E1204 09:18:08.463000 25074 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7772845Z [rank0]:E1204 09:18:08.463000 25074 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7773217Z [rank0]:E1204 09:18:08.463000 25074 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7773861Z [rank0]:E1204 09:18:08.463000 25074 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7774342Z [rank0]:E1204 09:18:08.463000 25074 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.7774745Z [rank3]:E1204 09:18:08.463000 25077 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7775215Z [rank3]:E1204 09:18:08.463000 25077 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7776109Z [rank3]:E1204 09:18:08.463000 25077 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7776797Z [rank3]:E1204 09:18:08.463000 25077 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7777793Z [rank3]:E1204 09:18:08.463000 25077 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7778195Z [rank3]:E1204 09:18:08.463000 25077 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7779152Z [rank3]:E1204 09:18:08.463000 25077 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7779650Z [rank3]:E1204 09:18:08.463000 25077 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7780684Z [rank3]:E1204 09:18:08.463000 25077 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7781178Z [rank3]:E1204 09:18:08.463000 25077 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T09:19:34.7782133Z [rank3]:E1204 09:18:08.463000 25077 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7782579Z [rank3]:E1204 09:18:08.463000 25077 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7783553Z [rank3]:E1204 09:18:08.463000 25077 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7784045Z [rank3]:E1204 09:18:08.463000 25077 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7785852Z [rank3]:E1204 09:18:08.463000 25077 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 493813760 and is now 630128640. 2025-12-04T09:19:34.7786215Z [rank3]:E1204 09:18:08.463000 25077 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7786878Z [rank3]:E1204 09:18:08.463000 25077 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7788242Z [rank3]:E1204 09:18:08.463000 25077 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7788713Z [rank3]:E1204 09:18:08.463000 25077 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7789484Z [rank3]:E1204 09:18:08.463000 25077 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7789966Z [rank3]:E1204 09:18:08.463000 25077 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.7790370Z [rank2]:E1204 09:18:08.463000 25076 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7790843Z [rank2]:E1204 09:18:08.463000 25076 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7791742Z [rank2]:E1204 09:18:08.463000 25076 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7792193Z [rank2]:E1204 09:18:08.463000 25076 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7793066Z [rank2]:E1204 09:18:08.463000 25076 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7793422Z [rank2]:E1204 09:18:08.463000 25076 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7794274Z 
[rank2]:E1204 09:18:08.463000 25076 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7794762Z [rank2]:E1204 09:18:08.463000 25076 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7795615Z [rank2]:E1204 09:18:08.463000 25076 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7796053Z [rank2]:E1204 09:18:08.463000 25076 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7796899Z [rank2]:E1204 09:18:08.463000 25076 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7797298Z [rank2]:E1204 09:18:08.463000 25076 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7798159Z [rank2]:E1204 09:18:08.463000 25076 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7798594Z [rank2]:E1204 09:18:08.463000 25076 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7800201Z [rank2]:E1204 09:18:08.463000 25076 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 604962816 and is now 630128640. 
2025-12-04T09:19:34.7800587Z [rank2]:E1204 09:18:08.463000 25076 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7801181Z [rank2]:E1204 09:18:08.463000 25076 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7802344Z [rank2]:E1204 09:18:08.463000 25076 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7802666Z [rank2]:E1204 09:18:08.463000 25076 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7803304Z [rank2]:E1204 09:18:08.463000 25076 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7803787Z [rank2]:E1204 09:18:08.463000 25076 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:19:34.7804196Z [rank1]:E1204 09:18:08.463000 25075 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7804668Z [rank1]:E1204 09:18:08.463000 25075 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7805560Z [rank1]:E1204 09:18:08.463000 25075 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7806015Z [rank1]:E1204 09:18:08.463000 25075 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7806891Z [rank1]:E1204 09:18:08.463000 25075 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7807308Z [rank1]:E1204 09:18:08.463000 25075 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7808163Z [rank1]:E1204 09:18:08.463000 25075 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7808601Z [rank1]:E1204 09:18:08.463000 25075 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7809450Z [rank1]:E1204 09:18:08.463000 25075 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7809892Z [rank1]:E1204 09:18:08.463000 25075 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7810744Z [rank1]:E1204 09:18:08.463000 25075 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7811137Z [rank1]:E1204 09:18:08.463000 25075 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7812000Z [rank1]:E1204 09:18:08.463000 25075 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7812437Z [rank1]:E1204 09:18:08.463000 25075 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7814056Z [rank1]:E1204 09:18:08.463000 25075 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 602865664 and is now 630128640. 2025-12-04T09:19:34.7814429Z [rank1]:E1204 09:18:08.463000 25075 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7815020Z [rank1]:E1204 09:18:08.463000 25075 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7816172Z [rank1]:E1204 09:18:08.463000 25075 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7816581Z [rank1]:E1204 09:18:08.463000 25075 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7817478Z [rank1]:E1204 09:18:08.463000 25075 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7818024Z [rank1]:E1204 09:18:08.463000 25075 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.7818134Z dist init r=1, world=4 2025-12-04T09:19:34.7818232Z dist init r=0, world=4 2025-12-04T09:19:34.7818327Z dist init r=2, world=4 2025-12-04T09:19:34.7818426Z dist init r=3, world=4 2025-12-04T09:19:34.7819580Z [rank0]:[W1204 09:18:08.478167312 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:19:34.7819690Z FAILED [8.9816s] [100%] 2025-12-04T09:19:34.7819695Z 2025-12-04T09:19:34.7819839Z =================================== FAILURES =================================== 2025-12-04T09:19:34.7820277Z _ TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda _ 2025-12-04T09:19:34.7820461Z Traceback (most recent call last): 2025-12-04T09:19:34.7821198Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:19:34.7821319Z self._join_processes(fn) 2025-12-04T09:19:34.7821913Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:19:34.7822053Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:19:34.7822671Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:19:34.7822786Z raise RuntimeError(error) 2025-12-04T09:19:34.7823019Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:19:34.7823140Z Traceback (most recent call last): 2025-12-04T09:19:34.7823688Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7823805Z getattr(self, test_name)() 2025-12-04T09:19:34.7824337Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7824422Z fn() 2025-12-04T09:19:34.7824932Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7825033Z method(*args, **kwargs) 2025-12-04T09:19:34.7825533Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7825741Z method(*args, **kwargs) 2025-12-04T09:19:34.7826249Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7826351Z with policy(): 2025-12-04T09:19:34.7826864Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7826968Z raise RuntimeError(msg) 2025-12-04T09:19:34.7828316Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 
2025-12-04T09:19:34.7828323Z 2025-12-04T09:19:34.7828538Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7829401Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7829411Z 2025-12-04T09:19:34.7829677Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7829682Z 2025-12-04T09:19:34.7829853Z Process 1 exited with error code 10 and exception: 2025-12-04T09:19:34.7829970Z Traceback (most recent call last): 2025-12-04T09:19:34.7830519Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7830635Z getattr(self, test_name)() 2025-12-04T09:19:34.7831170Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7831256Z fn() 2025-12-04T09:19:34.7831767Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7831873Z method(*args, **kwargs) 2025-12-04T09:19:34.7832382Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7832485Z method(*args, **kwargs) 2025-12-04T09:19:34.7833151Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7833251Z with policy(): 2025-12-04T09:19:34.7833727Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7833827Z raise RuntimeError(msg) 2025-12-04T09:19:34.7835101Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 602865664 and is now 630128640. 
2025-12-04T09:19:34.7835110Z 2025-12-04T09:19:34.7835471Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7836301Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7836311Z 2025-12-04T09:19:34.7836564Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7836570Z 2025-12-04T09:19:34.7836732Z Process 2 exited with error code 10 and exception: 2025-12-04T09:19:34.7836843Z Traceback (most recent call last): 2025-12-04T09:19:34.7837373Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7837482Z getattr(self, test_name)() 2025-12-04T09:19:34.7837998Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7838192Z fn() 2025-12-04T09:19:34.7838696Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7838793Z method(*args, **kwargs) 2025-12-04T09:19:34.7839289Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7839390Z method(*args, **kwargs) 2025-12-04T09:19:34.7839877Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7839979Z with policy(): 2025-12-04T09:19:34.7840475Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7840577Z raise RuntimeError(msg) 2025-12-04T09:19:34.7841979Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 604962816 and is now 630128640. 2025-12-04T09:19:34.7841988Z 2025-12-04T09:19:34.7842187Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7842991Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7842996Z 2025-12-04T09:19:34.7843243Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7843248Z 2025-12-04T09:19:34.7843252Z 2025-12-04T09:19:34.7843461Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:19:34.7843710Z Process 0 terminated with exit code 10, terminating remaining processes. 
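Note on the repeated ProcessGroupNCCL warning above ("destroy_process_group() was not called before program exit"): it refers to explicit torch.distributed teardown. A minimal sketch of the pattern the warning asks for, assuming MASTER_ADDR/MASTER_PORT are already set by the launcher (illustrative; the multiprocess test harness in this log drives init and shutdown itself):

import torch
import torch.distributed as dist

def run(rank: int, world_size: int) -> None:
    # Pin this process to its GPU and join the NCCL process group
    # (rendezvous via the default env:// method, so MASTER_ADDR/MASTER_PORT
    # are assumed to be set by the launcher).
    torch.cuda.set_device(rank)
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    try:
        ...  # test or training body for this rank
    finally:
        # Explicit teardown releases the NCCL communicators and avoids the
        # "destroy_process_group() was not called" warning seen above.
        dist.destroy_process_group()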
2025-12-04T09:19:34.7844515Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-ca39f8152ef39349.xml - 2025-12-04T09:19:34.7844687Z =========================== short test summary info ============================ 2025-12-04T09:19:34.7846009Z FAILED [8.9816s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:19:34.7846128Z Traceback (most recent call last): 2025-12-04T09:19:34.7846648Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7846751Z getattr(self, test_name)() 2025-12-04T09:19:34.7847261Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7847344Z fn() 2025-12-04T09:19:34.7847831Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7847934Z method(*args, **kwargs) 2025-12-04T09:19:34.7848409Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7848515Z method(*args, **kwargs) 2025-12-04T09:19:34.7848985Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7849079Z with policy(): 2025-12-04T09:19:34.7849562Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7849664Z raise RuntimeError(msg) 2025-12-04T09:19:34.7850934Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3072 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 
2025-12-04T09:19:34.7851007Z 2025-12-04T09:19:34.7851208Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7852013Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7852019Z 2025-12-04T09:19:34.7852267Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7852272Z 2025-12-04T09:19:34.7852424Z Process 1 exited with error code 10 and exception: 2025-12-04T09:19:34.7852542Z Traceback (most recent call last): 2025-12-04T09:19:34.7853057Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7853162Z getattr(self, test_name)() 2025-12-04T09:19:34.7853661Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7853744Z fn() 2025-12-04T09:19:34.7854221Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7854325Z method(*args, **kwargs) 2025-12-04T09:19:34.7854795Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7854896Z method(*args, **kwargs) 2025-12-04T09:19:34.7855362Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7855456Z with policy(): 2025-12-04T09:19:34.7855931Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7856031Z raise RuntimeError(msg) 2025-12-04T09:19:34.7857616Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 602865664 and is now 630128640. 
2025-12-04T09:19:34.7857689Z 2025-12-04T09:19:34.7857902Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7858757Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7858762Z 2025-12-04T09:19:34.7859023Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7859028Z 2025-12-04T09:19:34.7859194Z Process 2 exited with error code 10 and exception: 2025-12-04T09:19:34.7859308Z Traceback (most recent call last): 2025-12-04T09:19:34.7859850Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7859971Z getattr(self, test_name)() 2025-12-04T09:19:34.7860502Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7860595Z fn() 2025-12-04T09:19:34.7861104Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7861203Z method(*args, **kwargs) 2025-12-04T09:19:34.7861710Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7861809Z method(*args, **kwargs) 2025-12-04T09:19:34.7862311Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7862412Z with policy(): 2025-12-04T09:19:34.7862919Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7863080Z raise RuntimeError(msg) 2025-12-04T09:19:34.7864452Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 604962816 and is now 630128640. 2025-12-04T09:19:34.7864459Z 2025-12-04T09:19:34.7864668Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7865522Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7865527Z 2025-12-04T09:19:34.7865787Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7865971Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
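Note on the _init_utils.py UserWarning repeated in both runs above ("FSDP got the argument `device_id` cuda ... which does not have an explicit index"): it is triggered by passing a bare "cuda" device as device_id. A minimal sketch of the two remedies the warning itself suggests, assuming the process group is already initialized and one rank per GPU (illustrative, not the test's actual wrapping code):

import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_for_rank(model: torch.nn.Module, rank: int) -> FSDP:
    # Remedy 1: make the current device explicit before constructing FSDP,
    # so that even a bare "cuda" device_id resolves to the intended GPU.
    torch.cuda.set_device(rank)
    # Remedy 2: pass a fully indexed device instead of the bare "cuda" string.
    return FSDP(model, device_id=torch.device("cuda", rank))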
2025-12-04T09:19:34.7866149Z ======================= 1 failed, 7 deselected in 9.00s ======================== 2025-12-04T09:19:34.7866242Z Got exit code 1 2025-12-04T09:19:34.7867021Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda 2025-12-04T09:19:34.7867425Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:19:34.7868099Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-7178045a44a28781.xml 2025-12-04T09:19:34.7868266Z ============================= test session starts ============================== 2025-12-04T09:19:34.7868700Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:19:34.7868797Z cachedir: .pytest_cache 2025-12-04T09:19:34.7869252Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:19:34.7869359Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:19:34.7869458Z configfile: pytest.ini 2025-12-04T09:19:34.7869984Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:19:34.7870171Z collecting ... collected 8 items / 6 deselected / 2 selected 2025-12-04T09:19:34.7870290Z stepcurrent: skipping 6 already run items. 2025-12-04T09:19:34.7870387Z Running 2 items in this shard 2025-12-04T09:19:34.7870391Z 2025-12-04T09:19:34.7871303Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy0_cuda I1204 09:18:14.894000 25359 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 25411 2025-12-04T09:19:34.7871738Z I1204 09:18:14.894000 25359 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 25412 2025-12-04T09:19:34.7872183Z I1204 09:18:14.895000 25359 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 25413 2025-12-04T09:19:34.7872617Z I1204 09:18:14.896000 25359 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 25414 2025-12-04T09:19:34.7874155Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7874310Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7875841Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:19:34.7876044Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7877558Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7877706Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7879207Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7879363Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7879769Z [rank0]:E1204 09:18:21.732000 25411 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7880243Z [rank0]:E1204 09:18:21.732000 25411 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7881134Z [rank0]:E1204 09:18:21.732000 25411 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7881580Z [rank0]:E1204 09:18:21.732000 25411 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7882693Z [rank0]:E1204 09:18:21.732000 25411 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7883064Z [rank0]:E1204 09:18:21.732000 25411 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7883972Z [rank0]:E1204 09:18:21.732000 25411 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7884430Z [rank0]:E1204 09:18:21.732000 25411 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7885328Z [rank0]:E1204 09:18:21.732000 25411 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7885793Z [rank0]:E1204 09:18:21.732000 25411 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7886698Z [rank0]:E1204 09:18:21.732000 25411 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7887125Z [rank0]:E1204 09:18:21.732000 25411 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7888032Z [rank0]:E1204 09:18:21.732000 25411 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7888498Z [rank0]:E1204 09:18:21.732000 25411 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7890067Z [rank0]:E1204 09:18:21.732000 25411 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 0. CUDA driver allocated memory was 714014720 and is now 741277696. 2025-12-04T09:19:34.7890410Z [rank0]:E1204 09:18:21.732000 25411 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7891210Z [rank0]:E1204 09:18:21.732000 25411 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7892283Z [rank0]:E1204 09:18:21.732000 25411 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T09:19:34.7892641Z [rank0]:E1204 09:18:21.732000 25411 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7893335Z [rank0]:E1204 09:18:21.732000 25411 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7893867Z [rank0]:E1204 09:18:21.732000 25411 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.7894301Z [rank1]:E1204 09:18:21.733000 25412 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7894811Z [rank1]:E1204 09:18:21.733000 25412 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7895794Z [rank1]:E1204 09:18:21.733000 25412 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7896353Z [rank1]:E1204 09:18:21.733000 25412 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7897578Z [rank1]:E1204 09:18:21.733000 25412 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7897978Z [rank1]:E1204 09:18:21.733000 25412 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7898940Z [rank1]:E1204 09:18:21.733000 25412 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7899430Z [rank1]:E1204 09:18:21.733000 25412 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7900400Z [rank1]:E1204 09:18:21.733000 25412 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7900890Z [rank1]:E1204 09:18:21.733000 25412 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7901852Z [rank1]:E1204 09:18:21.733000 25412 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7902305Z [rank1]:E1204 09:18:21.733000 25412 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7903263Z [rank1]:E1204 09:18:21.733000 25412 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7903803Z [rank1]:E1204 09:18:21.733000 25412 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7905427Z [rank1]:E1204 09:18:21.733000 25412 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 607059968 and is now 632225792. 2025-12-04T09:19:34.7905788Z [rank1]:E1204 09:18:21.733000 25412 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7906453Z [rank1]:E1204 09:18:21.733000 25412 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7907561Z [rank1]:E1204 09:18:21.733000 25412 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T09:19:34.7907935Z [rank1]:E1204 09:18:21.733000 25412 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7908852Z [rank1]:E1204 09:18:21.733000 25412 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7909370Z [rank1]:E1204 09:18:21.733000 25412 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.7909793Z [rank2]:E1204 09:18:21.734000 25413 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7910291Z [rank2]:E1204 09:18:21.734000 25413 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7911290Z [rank2]:E1204 09:18:21.734000 25413 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7911768Z [rank2]:E1204 09:18:21.734000 25413 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7912704Z [rank2]:E1204 09:18:21.734000 25413 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", 
line 772, in wrapper 2025-12-04T09:19:34.7913073Z [rank2]:E1204 09:18:21.734000 25413 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7913975Z [rank2]:E1204 09:18:21.734000 25413 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7914443Z [rank2]:E1204 09:18:21.734000 25413 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7915345Z [rank2]:E1204 09:18:21.734000 25413 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7915806Z [rank2]:E1204 09:18:21.734000 25413 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7916706Z [rank2]:E1204 09:18:21.734000 25413 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7917124Z [rank2]:E1204 09:18:21.734000 25413 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7918253Z [rank2]:E1204 09:18:21.734000 25413 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7918727Z [rank2]:E1204 09:18:21.734000 25413 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7920301Z [rank2]:E1204 09:18:21.734000 25413 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 602865664 and is now 632225792. 
2025-12-04T09:19:34.7920647Z [rank2]:E1204 09:18:21.734000 25413 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7921605Z [rank2]:E1204 09:18:21.734000 25413 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7922720Z [rank2]:E1204 09:18:21.734000 25413 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T09:19:34.7923089Z [rank2]:E1204 09:18:21.734000 25413 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7923804Z [rank2]:E1204 09:18:21.734000 25413 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7924346Z [rank2]:E1204 09:18:21.734000 25413 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:19:34.7924810Z [rank3]:E1204 09:18:21.734000 25414 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7925334Z [rank3]:E1204 09:18:21.734000 25414 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7926436Z [rank3]:E1204 09:18:21.734000 25414 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7926944Z [rank3]:E1204 09:18:21.734000 25414 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7927938Z [rank3]:E1204 09:18:21.734000 25414 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7928329Z [rank3]:E1204 09:18:21.734000 25414 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7929292Z [rank3]:E1204 09:18:21.734000 25414 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7929784Z [rank3]:E1204 09:18:21.734000 25414 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7930742Z [rank3]:E1204 09:18:21.734000 25414 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7931231Z [rank3]:E1204 09:18:21.734000 25414 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7932186Z [rank3]:E1204 09:18:21.734000 25414 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7932703Z [rank3]:E1204 09:18:21.734000 25414 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7933765Z [rank3]:E1204 09:18:21.734000 25414 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7934222Z [rank3]:E1204 09:18:21.734000 25414 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7935744Z [rank3]:E1204 09:18:21.734000 25414 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 487522304 and is now 632225792. 2025-12-04T09:19:34.7936088Z [rank3]:E1204 09:18:21.734000 25414 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7936956Z [rank3]:E1204 09:18:21.734000 25414 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7938069Z [rank3]:E1204 09:18:21.734000 25414 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T09:19:34.7938435Z [rank3]:E1204 09:18:21.734000 25414 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7939147Z [rank3]:E1204 09:18:21.734000 25414 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7939694Z [rank3]:E1204 09:18:21.734000 25414 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.7939806Z dist init r=3, world=4 2025-12-04T09:19:34.7939903Z dist init r=0, world=4 2025-12-04T09:19:34.7940071Z dist init r=1, world=4 2025-12-04T09:19:34.7940167Z dist init r=2, world=4 2025-12-04T09:19:34.7940266Z FAILED [8.4909s] [ 50%] 2025-12-04T09:19:34.7940271Z 2025-12-04T09:19:34.7940432Z =================================== FAILURES =================================== 2025-12-04T09:19:34.7940727Z ________ TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda _________ 2025-12-04T09:19:34.7940847Z Traceback (most recent call last): 2025-12-04T09:19:34.7941399Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:19:34.7941510Z self._join_processes(fn) 2025-12-04T09:19:34.7942100Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:19:34.7942243Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:19:34.7942847Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:19:34.7942966Z raise RuntimeError(error) 2025-12-04T09:19:34.7943199Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:19:34.7943323Z Traceback (most recent call last): 2025-12-04T09:19:34.7943864Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7943976Z getattr(self, test_name)() 2025-12-04T09:19:34.7944512Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7944597Z fn() 2025-12-04T09:19:34.7945157Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7945265Z method(*args, **kwargs) 2025-12-04T09:19:34.7945774Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7945885Z method(*args, **kwargs) 2025-12-04T09:19:34.7946389Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7946485Z with policy(): 2025-12-04T09:19:34.7947000Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7947107Z raise RuntimeError(msg) 2025-12-04T09:19:34.7948278Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 607059968 and is now 632225792. 2025-12-04T09:19:34.7948294Z 2025-12-04T09:19:34.7948618Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7949337Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T09:19:34.7949342Z 2025-12-04T09:19:34.7949587Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7949591Z 2025-12-04T09:19:34.7949596Z 2025-12-04T09:19:34.7949789Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:19:34.7950025Z Process 1 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:19:34.7950779Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-7178045a44a28781.xml - 2025-12-04T09:19:34.7950932Z =========================== short test summary info ============================ 2025-12-04T09:19:34.7951665Z FAILED [8.4909s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy0_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:19:34.7951836Z Traceback (most recent call last): 2025-12-04T09:19:34.7952332Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7952429Z getattr(self, test_name)() 2025-12-04T09:19:34.7952905Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7952990Z fn() 2025-12-04T09:19:34.7953440Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7953532Z method(*args, **kwargs) 2025-12-04T09:19:34.7953985Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7954081Z method(*args, **kwargs) 2025-12-04T09:19:34.7954535Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7954618Z with policy(): 2025-12-04T09:19:34.7955064Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7955163Z raise RuntimeError(msg) 2025-12-04T09:19:34.7956191Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 607059968 and is now 632225792. 2025-12-04T09:19:34.7956197Z 2025-12-04T09:19:34.7956395Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7957033Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T09:19:34.7957038Z 2025-12-04T09:19:34.7957271Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7957435Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:19:34.7957593Z ======================= 1 failed, 6 deselected in 8.51s ======================== 2025-12-04T09:19:34.7957688Z Got exit code 1 2025-12-04T09:19:34.7957779Z Retrying single test... 
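Separately, the UserWarning repeated by torch/distributed/fsdp/_init_utils.py in each session ("FSDP got the argument `device_id` cuda ... which does not have an explicit index") states its own fix: call torch.cuda.set_device() before FSDP initialization or pass an indexed device as device_id. A minimal sketch of that pattern follows; the toy model and the rank/world_size arguments are assumptions for illustration and are not taken from the failing test.

    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_model(rank: int, world_size: int) -> FSDP:
        # Assumes MASTER_ADDR/MASTER_PORT are provided by the launcher.
        dist.init_process_group("nccl", rank=rank, world_size=world_size)
        torch.cuda.set_device(rank)           # make the current device explicit, as the warning suggests
        model = nn.Linear(16, 16).cuda(rank)  # toy model for illustration
        # Passing an indexed device (an int rank or torch.device(f"cuda:{rank}"))
        # instead of the bare "cuda" string avoids the warning.
        return FSDP(model, device_id=torch.device(f"cuda:{rank}"))

Per the warning text, either the explicit set_device call or the indexed device_id on its own is enough to silence it.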
2025-12-04T09:19:34.7958382Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-cdb7b80b8b392fad.xml 2025-12-04T09:19:34.7958534Z ============================= test session starts ============================== 2025-12-04T09:19:34.7958843Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:19:34.7958939Z cachedir: .pytest_cache 2025-12-04T09:19:34.7959403Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:19:34.7959507Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:19:34.7959609Z configfile: pytest.ini 2025-12-04T09:19:34.7960078Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:19:34.7960260Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:19:34.7960921Z stepcurrent: skipping 6 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy0_cuda 2025-12-04T09:19:34.7961017Z Running 1 items in this shard 2025-12-04T09:19:34.7961021Z 2025-12-04T09:19:34.7961934Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy0_cuda I1204 09:18:28.114000 25688 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 25740 2025-12-04T09:19:34.7962378Z I1204 09:18:28.114000 25688 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 25741 2025-12-04T09:19:34.7962866Z I1204 09:18:28.115000 25688 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 25742 2025-12-04T09:19:34.7963311Z I1204 09:18:28.116000 25688 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 25743 2025-12-04T09:19:34.7964837Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7964993Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7966512Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7966662Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7968173Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:19:34.7968369Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7969876Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.7970017Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.7970426Z [rank0]:E1204 09:18:34.897000 25740 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7970899Z [rank0]:E1204 09:18:34.897000 25740 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7971787Z [rank0]:E1204 09:18:34.897000 25740 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7972237Z [rank0]:E1204 09:18:34.897000 25740 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7973130Z [rank0]:E1204 09:18:34.897000 25740 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7973481Z [rank0]:E1204 09:18:34.897000 25740 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7974331Z [rank0]:E1204 09:18:34.897000 25740 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7974768Z [rank0]:E1204 09:18:34.897000 25740 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7975670Z [rank0]:E1204 09:18:34.897000 25740 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7976109Z [rank0]:E1204 09:18:34.897000 25740 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7977243Z [rank0]:E1204 09:18:34.897000 25740 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7977698Z [rank0]:E1204 09:18:34.897000 25740 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7978657Z [rank0]:E1204 09:18:34.897000 25740 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7979151Z [rank0]:E1204 09:18:34.897000 25740 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7980781Z [rank0]:E1204 09:18:34.897000 25740 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 0. CUDA driver allocated memory was 714014720 and is now 741277696. 2025-12-04T09:19:34.7981146Z [rank0]:E1204 09:18:34.897000 25740 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7981805Z [rank0]:E1204 09:18:34.897000 25740 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7982976Z [rank0]:E1204 09:18:34.897000 25740 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T09:19:34.7983349Z [rank0]:E1204 09:18:34.897000 25740 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7984067Z [rank0]:E1204 09:18:34.897000 25740 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7984610Z [rank0]:E1204 09:18:34.897000 25740 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.7985068Z [rank2]:E1204 09:18:34.897000 25742 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7985595Z [rank2]:E1204 09:18:34.897000 25742 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.7986607Z [rank2]:E1204 09:18:34.897000 25742 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.7987115Z [rank2]:E1204 09:18:34.897000 25742 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.7988105Z [rank2]:E1204 09:18:34.897000 25742 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.7988502Z [rank2]:E1204 09:18:34.897000 25742 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.7989505Z [rank2]:E1204 09:18:34.897000 25742 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7989942Z [rank2]:E1204 09:18:34.897000 25742 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7990836Z [rank2]:E1204 09:18:34.897000 25742 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.7991272Z [rank2]:E1204 09:18:34.897000 25742 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.7992120Z [rank2]:E1204 09:18:34.897000 25742 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.7992524Z 
[rank2]:E1204 09:18:34.897000 25742 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.7993383Z [rank2]:E1204 09:18:34.897000 25742 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.7993812Z [rank2]:E1204 09:18:34.897000 25742 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.7995249Z [rank2]:E1204 09:18:34.897000 25742 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 607059968 and is now 632225792. 2025-12-04T09:19:34.7995571Z [rank2]:E1204 09:18:34.897000 25742 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7996223Z [rank2]:E1204 09:18:34.897000 25742 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.7997377Z [rank2]:E1204 09:18:34.897000 25742 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T09:19:34.7997720Z [rank2]:E1204 09:18:34.897000 25742 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.7998390Z [rank2]:E1204 09:18:34.897000 25742 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.7998900Z [rank2]:E1204 09:18:34.897000 25742 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:19:34.7999328Z [rank1]:E1204 09:18:34.897000 25741 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.7999825Z [rank1]:E1204 09:18:34.897000 25741 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.8000779Z [rank1]:E1204 09:18:34.897000 25741 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8001251Z [rank1]:E1204 09:18:34.897000 25741 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.8002191Z [rank1]:E1204 09:18:34.897000 25741 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8002559Z [rank1]:E1204 09:18:34.897000 25741 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.8003517Z [rank1]:E1204 09:18:34.897000 25741 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8003984Z [rank1]:E1204 09:18:34.897000 25741 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T09:19:34.8004884Z [rank1]:E1204 09:18:34.897000 25741 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8005341Z [rank1]:E1204 09:18:34.897000 25741 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8006242Z [rank1]:E1204 09:18:34.897000 25741 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8006663Z [rank1]:E1204 09:18:34.897000 25741 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.8007578Z [rank1]:E1204 09:18:34.897000 25741 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8008039Z [rank1]:E1204 09:18:34.897000 25741 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.8009560Z [rank1]:E1204 09:18:34.897000 25741 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 602865664 and is now 632225792. 2025-12-04T09:19:34.8009954Z [rank1]:E1204 09:18:34.897000 25741 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8010579Z [rank1]:E1204 09:18:34.897000 25741 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8011686Z [rank1]:E1204 09:18:34.897000 25741 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T09:19:34.8012013Z [rank1]:E1204 09:18:34.897000 25741 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8012640Z [rank1]:E1204 09:18:34.897000 25741 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8013122Z [rank1]:E1204 09:18:34.897000 25741 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.8013529Z [rank3]:E1204 09:18:34.898000 25743 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.8014003Z [rank3]:E1204 09:18:34.898000 25743 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.8014897Z [rank3]:E1204 09:18:34.898000 25743 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8015342Z [rank3]:E1204 09:18:34.898000 25743 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.8016271Z [rank3]:E1204 09:18:34.898000 25743 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8016803Z [rank3]:E1204 09:18:34.898000 25743 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.8017817Z [rank3]:E1204 09:18:34.898000 25743 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8018305Z [rank3]:E1204 09:18:34.898000 25743 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8019261Z [rank3]:E1204 09:18:34.898000 25743 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8019756Z [rank3]:E1204 09:18:34.898000 25743 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8020722Z [rank3]:E1204 09:18:34.898000 25743 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8021344Z [rank3]:E1204 09:18:34.898000 25743 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.8022311Z [rank3]:E1204 09:18:34.898000 25743 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8022798Z [rank3]:E1204 09:18:34.898000 25743 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.8024409Z [rank3]:E1204 09:18:34.898000 25743 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 498008064 and is now 632225792. 
2025-12-04T09:19:34.8024880Z [rank3]:E1204 09:18:34.898000 25743 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8025545Z [rank3]:E1204 09:18:34.898000 25743 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8026650Z [rank3]:E1204 09:18:34.898000 25743 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T09:19:34.8027015Z [rank3]:E1204 09:18:34.898000 25743 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8027731Z [rank3]:E1204 09:18:34.898000 25743 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8028278Z [rank3]:E1204 09:18:34.898000 25743 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.8028386Z dist init r=0, world=4 2025-12-04T09:19:34.8028486Z dist init r=2, world=4 2025-12-04T09:19:34.8028580Z dist init r=1, world=4 2025-12-04T09:19:34.8028682Z dist init r=3, world=4 2025-12-04T09:19:34.8028777Z FAILED [8.5045s] [100%] 2025-12-04T09:19:34.8028783Z 2025-12-04T09:19:34.8028935Z =================================== FAILURES =================================== 2025-12-04T09:19:34.8029229Z ________ TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda _________ 2025-12-04T09:19:34.8029348Z Traceback (most recent call last): 2025-12-04T09:19:34.8029905Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:19:34.8030023Z self._join_processes(fn) 2025-12-04T09:19:34.8030611Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:19:34.8030829Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:19:34.8031434Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:19:34.8031553Z raise RuntimeError(error) 2025-12-04T09:19:34.8031784Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:19:34.8031903Z Traceback (most recent call last): 2025-12-04T09:19:34.8032445Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8032555Z getattr(self, test_name)() 2025-12-04T09:19:34.8033178Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8033269Z fn() 2025-12-04T09:19:34.8033749Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8033860Z method(*args, **kwargs) 2025-12-04T09:19:34.8034332Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8034428Z method(*args, **kwargs) 2025-12-04T09:19:34.8034908Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8034998Z with policy(): 2025-12-04T09:19:34.8035485Z 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8035583Z raise RuntimeError(msg) 2025-12-04T09:19:34.8036727Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 602865664 and is now 632225792. 2025-12-04T09:19:34.8036733Z 2025-12-04T09:19:34.8036946Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8037565Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T09:19:34.8037569Z 2025-12-04T09:19:34.8037821Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8037826Z 2025-12-04T09:19:34.8037830Z 2025-12-04T09:19:34.8038038Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:19:34.8038283Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:19:34.8039105Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-cdb7b80b8b392fad.xml - 2025-12-04T09:19:34.8039265Z =========================== short test summary info ============================ 2025-12-04T09:19:34.8040051Z FAILED [8.5045s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy0_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:19:34.8040163Z Traceback (most recent call last): 2025-12-04T09:19:34.8040682Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8040791Z getattr(self, test_name)() 2025-12-04T09:19:34.8041295Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8041486Z fn() 2025-12-04T09:19:34.8041935Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8042029Z method(*args, **kwargs) 2025-12-04T09:19:34.8042536Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8042630Z method(*args, **kwargs) 2025-12-04T09:19:34.8043079Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8043173Z with policy(): 2025-12-04T09:19:34.8043627Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8043731Z raise RuntimeError(msg) 2025-12-04T09:19:34.8044767Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 602865664 and is now 632225792. 
2025-12-04T09:19:34.8044776Z 2025-12-04T09:19:34.8044965Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8045560Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T09:19:34.8045565Z 2025-12-04T09:19:34.8045798Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8045964Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:19:34.8046122Z ======================= 1 failed, 7 deselected in 8.53s ======================== 2025-12-04T09:19:34.8046206Z Got exit code 1 2025-12-04T09:19:34.8046307Z Retrying single test... 2025-12-04T09:19:34.8046910Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-9595731043617943.xml 2025-12-04T09:19:34.8047178Z ============================= test session starts ============================== 2025-12-04T09:19:34.8047486Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:19:34.8047578Z cachedir: .pytest_cache 2025-12-04T09:19:34.8048042Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:19:34.8048151Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:19:34.8048243Z configfile: pytest.ini 2025-12-04T09:19:34.8048723Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:19:34.8048905Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:19:34.8049564Z stepcurrent: skipping 6 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy0_cuda 2025-12-04T09:19:34.8049665Z Running 1 items in this shard 2025-12-04T09:19:34.8049670Z 2025-12-04T09:19:34.8050588Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy0_cuda I1204 09:18:41.314000 26017 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 26069 2025-12-04T09:19:34.8051040Z I1204 09:18:41.315000 26017 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 26070 2025-12-04T09:19:34.8051479Z I1204 09:18:41.316000 26017 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 26071 2025-12-04T09:19:34.8051925Z I1204 09:18:41.316000 26017 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 26072 2025-12-04T09:19:34.8053462Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.8053617Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.8055180Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.8055330Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.8057128Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.8057338Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.8059047Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.8059209Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.8059674Z [rank0]:E1204 09:18:48.088000 26069 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.8060208Z [rank0]:E1204 09:18:48.088000 26069 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.8061281Z [rank0]:E1204 09:18:48.088000 26069 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8061788Z [rank0]:E1204 09:18:48.088000 26069 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.8062773Z [rank0]:E1204 09:18:48.088000 26069 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8063170Z [rank0]:E1204 09:18:48.088000 26069 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.8064126Z [rank0]:E1204 09:18:48.088000 26069 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8064624Z [rank0]:E1204 09:18:48.088000 26069 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8065586Z [rank0]:E1204 09:18:48.088000 26069 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8066077Z [rank0]:E1204 09:18:48.088000 26069 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8067036Z [rank0]:E1204 09:18:48.088000 26069 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8067476Z [rank0]:E1204 09:18:48.088000 26069 
site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.8068504Z [rank0]:E1204 09:18:48.088000 26069 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8069097Z [rank0]:E1204 09:18:48.088000 26069 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.8070528Z [rank0]:E1204 09:18:48.088000 26069 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 0. CUDA driver allocated memory was 714014720 and is now 741277696. 2025-12-04T09:19:34.8070850Z [rank0]:E1204 09:18:48.088000 26069 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8071612Z [rank0]:E1204 09:18:48.088000 26069 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8072656Z [rank0]:E1204 09:18:48.088000 26069 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T09:19:34.8072994Z [rank0]:E1204 09:18:48.088000 26069 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8073672Z [rank0]:E1204 09:18:48.088000 26069 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8074185Z [rank0]:E1204 09:18:48.088000 26069 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.8074665Z [rank2]:E1204 09:18:48.089000 26071 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.8075163Z [rank2]:E1204 09:18:48.089000 26071 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.8076120Z [rank2]:E1204 09:18:48.089000 26071 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8076596Z [rank2]:E1204 09:18:48.089000 26071 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.8077521Z [rank2]:E1204 09:18:48.089000 26071 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8077897Z [rank2]:E1204 09:18:48.089000 26071 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.8078804Z [rank2]:E1204 09:18:48.089000 26071 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8079265Z [rank2]:E1204 09:18:48.089000 26071 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T09:19:34.8080164Z [rank2]:E1204 09:18:48.089000 26071 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8080624Z [rank2]:E1204 09:18:48.089000 26071 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8081521Z [rank2]:E1204 09:18:48.089000 26071 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8081940Z [rank2]:E1204 09:18:48.089000 26071 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.8082900Z [rank2]:E1204 09:18:48.089000 26071 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8083478Z [rank2]:E1204 09:18:48.089000 26071 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.8084913Z [rank2]:E1204 09:18:48.089000 26071 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 2. CUDA driver allocated memory was 607059968 and is now 632225792. 2025-12-04T09:19:34.8085236Z [rank2]:E1204 09:18:48.089000 26071 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8085826Z [rank2]:E1204 09:18:48.089000 26071 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8086808Z [rank2]:E1204 09:18:48.089000 26071 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T09:19:34.8087124Z [rank2]:E1204 09:18:48.089000 26071 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8087757Z [rank2]:E1204 09:18:48.089000 26071 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8088295Z [rank2]:E1204 09:18:48.089000 26071 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:19:34.8088703Z [rank3]:E1204 09:18:48.089000 26072 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.8089171Z [rank3]:E1204 09:18:48.089000 26072 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.8090064Z [rank3]:E1204 09:18:48.089000 26072 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8090511Z [rank3]:E1204 09:18:48.089000 26072 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.8091384Z [rank3]:E1204 09:18:48.089000 26072 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8091742Z [rank3]:E1204 09:18:48.089000 26072 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.8092592Z [rank3]:E1204 09:18:48.089000 26072 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8093025Z [rank3]:E1204 09:18:48.089000 26072 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8093878Z [rank3]:E1204 09:18:48.089000 26072 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8094304Z [rank3]:E1204 09:18:48.089000 26072 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8095221Z [rank3]:E1204 09:18:48.089000 26072 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8095613Z [rank3]:E1204 09:18:48.089000 26072 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.8096547Z [rank3]:E1204 09:18:48.089000 26072 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8097201Z [rank3]:E1204 09:18:48.089000 26072 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.8098816Z [rank3]:E1204 09:18:48.089000 26072 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 487522304 and is now 632225792. 
2025-12-04T09:19:34.8099188Z [rank3]:E1204 09:18:48.089000 26072 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8099850Z [rank3]:E1204 09:18:48.089000 26072 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8100960Z [rank3]:E1204 09:18:48.089000 26072 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T09:19:34.8101324Z [rank3]:E1204 09:18:48.089000 26072 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8102111Z [rank3]:E1204 09:18:48.089000 26072 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8102655Z [rank3]:E1204 09:18:48.089000 26072 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.8103115Z [rank1]:E1204 09:18:48.089000 26070 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.8103644Z [rank1]:E1204 09:18:48.089000 26070 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.8104646Z [rank1]:E1204 09:18:48.089000 26070 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8105160Z [rank1]:E1204 09:18:48.089000 26070 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.8106156Z [rank1]:E1204 09:18:48.089000 26070 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8106561Z [rank1]:E1204 09:18:48.089000 26070 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.8107517Z [rank1]:E1204 09:18:48.089000 26070 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8108010Z [rank1]:E1204 09:18:48.089000 26070 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8109055Z [rank1]:E1204 09:18:48.089000 26070 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8109490Z [rank1]:E1204 09:18:48.089000 26070 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8110391Z [rank1]:E1204 09:18:48.089000 26070 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8110782Z [rank1]:E1204 09:18:48.089000 26070 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.8111646Z [rank1]:E1204 09:18:48.089000 26070 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8112083Z [rank1]:E1204 09:18:48.089000 26070 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.8113521Z [rank1]:E1204 09:18:48.089000 26070 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 1. CUDA driver allocated memory was 602865664 and is now 632225792. 2025-12-04T09:19:34.8113841Z [rank1]:E1204 09:18:48.089000 26070 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8114425Z [rank1]:E1204 09:18:48.089000 26070 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8115403Z [rank1]:E1204 09:18:48.089000 26070 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T09:19:34.8115772Z [rank1]:E1204 09:18:48.089000 26070 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8116416Z [rank1]:E1204 09:18:48.089000 26070 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8116898Z [rank1]:E1204 09:18:48.089000 26070 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.8116994Z dist init r=2, world=4 2025-12-04T09:19:34.8117084Z dist init r=0, world=4 2025-12-04T09:19:34.8117171Z dist init r=1, world=4 2025-12-04T09:19:34.8117262Z dist init r=3, world=4 2025-12-04T09:19:34.8117344Z FAILED [8.6665s] [100%] 2025-12-04T09:19:34.8117349Z 2025-12-04T09:19:34.8117478Z =================================== FAILURES =================================== 2025-12-04T09:19:34.8117748Z ________ TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda _________ 2025-12-04T09:19:34.8117855Z Traceback (most recent call last): 2025-12-04T09:19:34.8118344Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:19:34.8118448Z self._join_processes(fn) 2025-12-04T09:19:34.8118963Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:19:34.8119094Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:19:34.8119628Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:19:34.8119727Z raise RuntimeError(error) 2025-12-04T09:19:34.8119938Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:19:34.8120047Z Traceback (most recent call last): 2025-12-04T09:19:34.8120533Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8120633Z getattr(self, test_name)() 2025-12-04T09:19:34.8121482Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8121582Z fn() 2025-12-04T09:19:34.8122087Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8122189Z method(*args, **kwargs) 2025-12-04T09:19:34.8122699Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8122801Z method(*args, **kwargs) 2025-12-04T09:19:34.8123318Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8123412Z with policy(): 2025-12-04T09:19:34.8123921Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8124035Z raise RuntimeError(msg) 2025-12-04T09:19:34.8125201Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 0. CUDA driver allocated memory was 714014720 and is now 741277696. 2025-12-04T09:19:34.8125207Z 2025-12-04T09:19:34.8125426Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8126079Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda 2025-12-04T09:19:34.8126085Z 2025-12-04T09:19:34.8126345Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8126362Z 2025-12-04T09:19:34.8126523Z Process 3 exited with error code 10 and exception: 2025-12-04T09:19:34.8126738Z Traceback (most recent call last): 2025-12-04T09:19:34.8127291Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8127404Z getattr(self, test_name)() 2025-12-04T09:19:34.8127944Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8128041Z fn() 2025-12-04T09:19:34.8128547Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8128661Z method(*args, **kwargs) 2025-12-04T09:19:34.8129163Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8129266Z method(*args, **kwargs) 2025-12-04T09:19:34.8129772Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8129874Z with policy(): 2025-12-04T09:19:34.8130382Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8130500Z raise RuntimeError(msg) 2025-12-04T09:19:34.8131669Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 487522304 and is now 632225792. 
2025-12-04T09:19:34.8131675Z 
2025-12-04T09:19:34.8131894Z To execute this test, run the following from the base repo dir:
2025-12-04T09:19:34.8132547Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda
2025-12-04T09:19:34.8132553Z 
2025-12-04T09:19:34.8132814Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:19:34.8132829Z 
2025-12-04T09:19:34.8132834Z 
2025-12-04T09:19:34.8133050Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:19:34.8133314Z Process 0 terminated with exit code 10, terminating remaining processes.
2025-12-04T09:19:34.8134272Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-9595731043617943.xml -
2025-12-04T09:19:34.8134427Z =========================== short test summary info ============================
2025-12-04T09:19:34.8135170Z FAILED [8.6665s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy0_cuda - RuntimeError: Process 0 exited with error code 10 and exception:
2025-12-04T09:19:34.8135276Z Traceback (most recent call last):
2025-12-04T09:19:34.8135759Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:19:34.8135868Z getattr(self, test_name)()
2025-12-04T09:19:34.8136406Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:19:34.8136485Z fn()
2025-12-04T09:19:34.8137157Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:19:34.8137261Z method(*args, **kwargs)
2025-12-04T09:19:34.8137772Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:19:34.8137874Z method(*args, **kwargs)
2025-12-04T09:19:34.8138378Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:19:34.8138484Z with policy():
2025-12-04T09:19:34.8138992Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:19:34.8139162Z raise RuntimeError(msg)
2025-12-04T09:19:34.8140327Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 0. CUDA driver allocated memory was 714014720 and is now 741277696.
2025-12-04T09:19:34.8140333Z 
2025-12-04T09:19:34.8140544Z To execute this test, run the following from the base repo dir:
2025-12-04T09:19:34.8141215Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda
2025-12-04T09:19:34.8141220Z 
2025-12-04T09:19:34.8141482Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:19:34.8141488Z 
2025-12-04T09:19:34.8141656Z Process 3 exited with error code 10 and exception:
2025-12-04T09:19:34.8141780Z Traceback (most recent call last):
2025-12-04T09:19:34.8142324Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:19:34.8142448Z getattr(self, test_name)()
2025-12-04T09:19:34.8142988Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:19:34.8143087Z fn()
2025-12-04T09:19:34.8143591Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:19:34.8143693Z method(*args, **kwargs)
2025-12-04T09:19:34.8144203Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:19:34.8144307Z method(*args, **kwargs)
2025-12-04T09:19:34.8144804Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:19:34.8144913Z with policy():
2025-12-04T09:19:34.8145423Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:19:34.8145542Z raise RuntimeError(msg)
2025-12-04T09:19:34.8146769Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda! Caching allocator allocated memory was 512 and is now reported as 3584 on device 3. CUDA driver allocated memory was 487522304 and is now 632225792.
2025-12-04T09:19:34.8146776Z 
2025-12-04T09:19:34.8146989Z To execute this test, run the following from the base repo dir:
2025-12-04T09:19:34.8147651Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy0_cuda
2025-12-04T09:19:34.8147656Z 
2025-12-04T09:19:34.8147913Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:19:34.8148099Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
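[Editor's note] The repeated "CUDA driver API confirmed a leak" errors above come from the test-time leak checker that this job enables with PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1: it snapshots the caching-allocator bytes and the driver-level memory usage before the test body and compares them again afterwards. The sketch below is a minimal, hypothetical illustration of that before/after bracketing using public torch.cuda APIs; it is not the actual checker implementation in common_utils.py, and the threshold-free comparison is simplified.

    import torch

    def check_for_cuda_leak(fn, device=0):
        # Snapshot caching-allocator bytes and driver-level usage before the test body.
        torch.cuda.synchronize(device)
        alloc_before = torch.cuda.memory_allocated(device)
        free_before, total = torch.cuda.mem_get_info(device)
        driver_before = total - free_before

        fn()  # run the test body

        # Release cached blocks so only genuinely live allocations remain, then re-measure.
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_after = torch.cuda.memory_allocated(device)
        free_after, _ = torch.cuda.mem_get_info(device)
        driver_after = total - free_after

        # Flag a leak only when both the allocator and the driver report growth,
        # mirroring the shape of the message seen in the log above.
        if alloc_after > alloc_before and driver_after > driver_before:
            raise RuntimeError(
                f"possible CUDA leak: allocator {alloc_before} -> {alloc_after} bytes, "
                f"driver {driver_before} -> {driver_after} bytes on device {device}"
            )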
2025-12-04T09:19:34.8148277Z ======================= 1 failed, 7 deselected in 8.69s ========================
2025-12-04T09:19:34.8148374Z Got exit code 1
2025-12-04T09:19:34.8149164Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy0_cuda
2025-12-04T09:19:34.8149519Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T09:19:34.8150126Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-f8bd87b046fcc0d3.xml
2025-12-04T09:19:34.8150269Z ============================= test session starts ==============================
2025-12-04T09:19:34.8150576Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T09:19:34.8150683Z cachedir: .pytest_cache
2025-12-04T09:19:34.8151135Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T09:19:34.8151293Z rootdir: /var/lib/jenkins/workspace
2025-12-04T09:19:34.8151393Z configfile: pytest.ini
2025-12-04T09:19:34.8151871Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T09:19:34.8152059Z collecting ... collected 8 items / 7 deselected / 1 selected
2025-12-04T09:19:34.8152180Z stepcurrent: skipping 7 already run items.
2025-12-04T09:19:34.8152279Z Running 1 items in this shard
2025-12-04T09:19:34.8152283Z 
2025-12-04T09:19:34.8153214Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy1_cuda I1204 09:18:54.494000 26346 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 26398
2025-12-04T09:19:34.8153651Z I1204 09:18:54.495000 26346 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 26399
2025-12-04T09:19:34.8154098Z I1204 09:18:54.496000 26346 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 26400
2025-12-04T09:19:34.8154536Z I1204 09:18:54.497000 26346 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 26401
2025-12-04T09:19:34.8156057Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
2025-12-04T09:19:34.8156210Z device_from_device_id = _get_device_from_device_id(
2025-12-04T09:19:34.8157725Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
2025-12-04T09:19:34.8157932Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.8159443Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.8159595Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.8161109Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.8161264Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.8161671Z [rank0]:E1204 09:19:01.223000 26398 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.8162141Z [rank0]:E1204 09:19:01.223000 26398 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.8163036Z [rank0]:E1204 09:19:01.223000 26398 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8163486Z [rank0]:E1204 09:19:01.223000 26398 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.8164431Z [rank0]:E1204 09:19:01.223000 26398 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8164787Z [rank0]:E1204 09:19:01.223000 26398 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.8165649Z [rank0]:E1204 09:19:01.223000 26398 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8166083Z [rank0]:E1204 09:19:01.223000 26398 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8166933Z [rank0]:E1204 09:19:01.223000 26398 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8167376Z [rank0]:E1204 09:19:01.223000 26398 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8168226Z [rank0]:E1204 09:19:01.223000 26398 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8168635Z [rank0]:E1204 09:19:01.223000 26398 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.8169491Z [rank0]:E1204 09:19:01.223000 26398 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8169940Z [rank0]:E1204 09:19:01.223000 26398 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.8171424Z [rank0]:E1204 09:19:01.223000 26398 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 0. CUDA driver allocated memory was 714014720 and is now 741277696. 2025-12-04T09:19:34.8171754Z [rank0]:E1204 09:19:01.223000 26398 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8172348Z [rank0]:E1204 09:19:01.223000 26398 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8173332Z [rank0]:E1204 09:19:01.223000 26398 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T09:19:34.8173664Z [rank0]:E1204 09:19:01.223000 26398 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8174300Z [rank0]:E1204 09:19:01.223000 26398 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8174790Z [rank0]:E1204 09:19:01.223000 26398 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.8175190Z [rank2]:E1204 09:19:01.224000 26400 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.8175658Z [rank2]:E1204 09:19:01.224000 26400 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.8176795Z [rank2]:E1204 09:19:01.224000 26400 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8177646Z [rank2]:E1204 09:19:01.224000 26400 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.8178645Z [rank2]:E1204 09:19:01.224000 26400 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8179046Z [rank2]:E1204 09:19:01.224000 26400 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.8180023Z [rank2]:E1204 09:19:01.224000 26400 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8180512Z [rank2]:E1204 09:19:01.224000 26400 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8181477Z [rank2]:E1204 09:19:01.224000 26400 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8181976Z [rank2]:E1204 09:19:01.224000 26400 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8182932Z [rank2]:E1204 09:19:01.224000 26400 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8183387Z [rank2]:E1204 09:19:01.224000 26400 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.8184346Z [rank2]:E1204 09:19:01.224000 26400 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8184848Z [rank2]:E1204 09:19:01.224000 26400 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.8186517Z [rank2]:E1204 09:19:01.224000 26400 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 2. CUDA driver allocated memory was 602865664 and is now 632225792. 2025-12-04T09:19:34.8186880Z [rank2]:E1204 09:19:01.224000 26400 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8187551Z [rank2]:E1204 09:19:01.224000 26400 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8188651Z [rank2]:E1204 09:19:01.224000 26400 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T09:19:34.8189123Z [rank2]:E1204 09:19:01.224000 26400 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8189757Z [rank2]:E1204 09:19:01.224000 26400 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8190245Z [rank2]:E1204 09:19:01.224000 26400 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:19:34.8190643Z [rank1]:E1204 09:19:01.225000 26399 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.8191111Z [rank1]:E1204 09:19:01.225000 26399 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.8192074Z [rank1]:E1204 09:19:01.225000 26399 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8192525Z [rank1]:E1204 09:19:01.225000 26399 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.8193411Z [rank1]:E1204 09:19:01.225000 26399 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", 
line 772, in wrapper 2025-12-04T09:19:34.8193758Z [rank1]:E1204 09:19:01.225000 26399 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.8194619Z [rank1]:E1204 09:19:01.225000 26399 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8195055Z [rank1]:E1204 09:19:01.225000 26399 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8195905Z [rank1]:E1204 09:19:01.225000 26399 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8196343Z [rank1]:E1204 09:19:01.225000 26399 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8197189Z [rank1]:E1204 09:19:01.225000 26399 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8197593Z [rank1]:E1204 09:19:01.225000 26399 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.8198446Z [rank1]:E1204 09:19:01.225000 26399 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8198944Z [rank1]:E1204 09:19:01.225000 26399 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.8200380Z [rank1]:E1204 09:19:01.225000 26399 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 1. CUDA driver allocated memory was 607059968 and is now 632225792. 
2025-12-04T09:19:34.8200701Z [rank1]:E1204 09:19:01.225000 26399 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8201295Z [rank1]:E1204 09:19:01.225000 26399 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8202472Z [rank1]:E1204 09:19:01.225000 26399 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T09:19:34.8202817Z [rank1]:E1204 09:19:01.225000 26399 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8203487Z [rank1]:E1204 09:19:01.225000 26399 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8204005Z [rank1]:E1204 09:19:01.225000 26399 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.8204427Z [rank3]:E1204 09:19:01.225000 26401 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.8204974Z [rank3]:E1204 09:19:01.225000 26401 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.8205928Z [rank3]:E1204 09:19:01.225000 26401 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8206403Z [rank3]:E1204 09:19:01.225000 26401 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.8207343Z [rank3]:E1204 09:19:01.225000 26401 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8207710Z [rank3]:E1204 09:19:01.225000 26401 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.8208625Z [rank3]:E1204 09:19:01.225000 26401 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8209087Z [rank3]:E1204 09:19:01.225000 26401 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8209991Z [rank3]:E1204 09:19:01.225000 26401 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8210634Z [rank3]:E1204 09:19:01.225000 26401 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8211558Z [rank3]:E1204 09:19:01.225000 26401 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8212004Z [rank3]:E1204 09:19:01.225000 26401 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.8212990Z [rank3]:E1204 09:19:01.225000 26401 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8213477Z [rank3]:E1204 09:19:01.225000 26401 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.8215042Z [rank3]:E1204 09:19:01.225000 26401 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 3. CUDA driver allocated memory was 583991296 and is now 632225792. 2025-12-04T09:19:34.8215392Z [rank3]:E1204 09:19:01.225000 26401 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8216041Z [rank3]:E1204 09:19:01.225000 26401 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8217382Z [rank3]:E1204 09:19:01.225000 26401 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T09:19:34.8217751Z [rank3]:E1204 09:19:01.225000 26401 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8218463Z [rank3]:E1204 09:19:01.225000 26401 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8219016Z [rank3]:E1204 09:19:01.225000 26401 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.8219184Z dist init r=1, world=4 2025-12-04T09:19:34.8219284Z dist init r=3, world=4 2025-12-04T09:19:34.8219393Z dist init r=2, world=4 2025-12-04T09:19:34.8219497Z dist init r=0, world=4 2025-12-04T09:19:34.8219593Z FAILED [8.3923s] [100%] 2025-12-04T09:19:34.8219599Z 2025-12-04T09:19:34.8219755Z =================================== FAILURES =================================== 2025-12-04T09:19:34.8220053Z ________ TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda _________ 2025-12-04T09:19:34.8220175Z Traceback (most recent call last): 2025-12-04T09:19:34.8220731Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:19:34.8221014Z self._join_processes(fn) 2025-12-04T09:19:34.8221612Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:19:34.8221764Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:19:34.8222374Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:19:34.8222503Z raise RuntimeError(error) 2025-12-04T09:19:34.8222742Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:19:34.8222875Z Traceback (most recent call last): 2025-12-04T09:19:34.8223416Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8223530Z getattr(self, test_name)() 2025-12-04T09:19:34.8224071Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8224162Z fn() 2025-12-04T09:19:34.8224669Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8224792Z method(*args, **kwargs) 2025-12-04T09:19:34.8225297Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8225515Z method(*args, **kwargs) 2025-12-04T09:19:34.8226025Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8226121Z with policy(): 2025-12-04T09:19:34.8226642Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8226753Z raise RuntimeError(msg) 2025-12-04T09:19:34.8227912Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 0. CUDA driver allocated memory was 714014720 and is now 741277696. 2025-12-04T09:19:34.8227931Z 2025-12-04T09:19:34.8228150Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8228813Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T09:19:34.8228819Z 2025-12-04T09:19:34.8229102Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8229108Z 2025-12-04T09:19:34.8229112Z 2025-12-04T09:19:34.8229336Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:19:34.8229607Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:19:34.8230475Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-f8bd87b046fcc0d3.xml -
2025-12-04T09:19:34.8230646Z =========================== short test summary info ============================
2025-12-04T09:19:34.8231543Z FAILED [8.3923s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy1_cuda - RuntimeError: Process 0 exited with error code 10 and exception:
2025-12-04T09:19:34.8231664Z Traceback (most recent call last):
2025-12-04T09:19:34.8232234Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:19:34.8232351Z getattr(self, test_name)()
2025-12-04T09:19:34.8232987Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:19:34.8233082Z fn()
2025-12-04T09:19:34.8239824Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:19:34.8239978Z method(*args, **kwargs)
2025-12-04T09:19:34.8240609Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:19:34.8240726Z method(*args, **kwargs)
2025-12-04T09:19:34.8241208Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:19:34.8241303Z with policy():
2025-12-04T09:19:34.8241793Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:19:34.8241894Z raise RuntimeError(msg)
2025-12-04T09:19:34.8242989Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 0. CUDA driver allocated memory was 714014720 and is now 741277696.
2025-12-04T09:19:34.8243001Z 
2025-12-04T09:19:34.8243201Z To execute this test, run the following from the base repo dir:
2025-12-04T09:19:34.8243823Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda
2025-12-04T09:19:34.8243834Z 
2025-12-04T09:19:34.8244089Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:19:34.8244371Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T09:19:34.8244544Z ======================= 1 failed, 7 deselected in 8.41s ========================
2025-12-04T09:19:34.8244635Z Got exit code 1
2025-12-04T09:19:34.8244730Z Retrying single test...
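[Editor's note] The harness behaviour visible here (run the shard, and on a failure rerun just the failing test before either declaring it "FAILED CONSISTENTLY" or treating it as flaky, then continue because continue-through-error is set) can be approximated as below. This is a hypothetical sketch, not the actual PyTorch run_test.py logic; the helper name, retry count, and the way the node id is passed are illustrative only.

    import subprocess

    def rerun_failed_test(test_id: str, retries: int = 1) -> str:
        # Re-invoke pytest on just the failing node id to separate flaky from consistent failures.
        for _ in range(retries):
            result = subprocess.run(["python", "-m", "pytest", "-x", test_id])
            if result.returncode == 0:
                return "FLAKY"  # passed in isolation -> treat as flaky
        return "FAILED CONSISTENTLY"  # still failing -> report and continue with the rest

    status = rerun_failed_test(
        "test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy1_cuda"
    )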
2025-12-04T09:19:34.8245373Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-68dc7893385d1617.xml
2025-12-04T09:19:34.8245524Z ============================= test session starts ==============================
2025-12-04T09:19:34.8245852Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T09:19:34.8245961Z cachedir: .pytest_cache
2025-12-04T09:19:34.8246445Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T09:19:34.8246567Z rootdir: /var/lib/jenkins/workspace
2025-12-04T09:19:34.8246665Z configfile: pytest.ini
2025-12-04T09:19:34.8247166Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T09:19:34.8247367Z collecting ... collected 8 items / 7 deselected / 1 selected
2025-12-04T09:19:34.8248064Z stepcurrent: skipping 7 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy1_cuda
2025-12-04T09:19:34.8248168Z Running 1 items in this shard
2025-12-04T09:19:34.8248174Z 
2025-12-04T09:19:34.8249143Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy1_cuda I1204 09:19:07.584000 26675 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 26727
2025-12-04T09:19:34.8249691Z I1204 09:19:07.585000 26675 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 26728
2025-12-04T09:19:34.8250162Z I1204 09:19:07.586000 26675 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 26729
2025-12-04T09:19:34.8250724Z I1204 09:19:07.586000 26675 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 26730
2025-12-04T09:19:34.8252255Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
2025-12-04T09:19:34.8252402Z device_from_device_id = _get_device_from_device_id(
2025-12-04T09:19:34.8253935Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
2025-12-04T09:19:34.8254087Z device_from_device_id = _get_device_from_device_id(
2025-12-04T09:19:34.8255588Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument.
2025-12-04T09:19:34.8255736Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.8257754Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.8257923Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.8258381Z [rank1]:E1204 09:19:14.321000 26728 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.8258917Z [rank1]:E1204 09:19:14.321000 26728 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.8259919Z [rank1]:E1204 09:19:14.321000 26728 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8260431Z [rank1]:E1204 09:19:14.321000 26728 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.8261427Z [rank1]:E1204 09:19:14.321000 26728 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8261822Z [rank1]:E1204 09:19:14.321000 26728 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.8262788Z [rank1]:E1204 09:19:14.321000 26728 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8263273Z [rank1]:E1204 09:19:14.321000 26728 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8264299Z [rank1]:E1204 09:19:14.321000 26728 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8264789Z [rank1]:E1204 09:19:14.321000 26728 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8265746Z [rank1]:E1204 09:19:14.321000 26728 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8266193Z [rank1]:E1204 09:19:14.321000 26728 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.8267155Z [rank1]:E1204 09:19:14.321000 26728 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8267652Z [rank1]:E1204 09:19:14.321000 26728 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.8269334Z [rank1]:E1204 09:19:14.321000 26728 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 1. CUDA driver allocated memory was 607059968 and is now 632225792. 2025-12-04T09:19:34.8269661Z [rank1]:E1204 09:19:14.321000 26728 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8270242Z [rank1]:E1204 09:19:14.321000 26728 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8271225Z [rank1]:E1204 09:19:14.321000 26728 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T09:19:34.8271607Z [rank1]:E1204 09:19:14.321000 26728 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8272243Z [rank1]:E1204 09:19:14.321000 26728 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8272729Z [rank1]:E1204 09:19:14.321000 26728 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.8273125Z [rank3]:E1204 09:19:14.322000 26730 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.8273598Z [rank3]:E1204 09:19:14.322000 26730 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.8274490Z [rank3]:E1204 09:19:14.322000 26730 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8274941Z [rank3]:E1204 09:19:14.322000 26730 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.8275823Z [rank3]:E1204 09:19:14.322000 26730 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8276170Z [rank3]:E1204 09:19:14.322000 26730 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.8277023Z [rank3]:E1204 09:19:14.322000 26730 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8277503Z [rank3]:E1204 09:19:14.322000 26730 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8278367Z [rank3]:E1204 09:19:14.322000 26730 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8278796Z [rank3]:E1204 09:19:14.322000 26730 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8279645Z [rank3]:E1204 09:19:14.322000 26730 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8280044Z 
[rank3]:E1204 09:19:14.322000 26730 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.8280897Z [rank3]:E1204 09:19:14.322000 26730 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8281337Z [rank3]:E1204 09:19:14.322000 26730 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.8282775Z [rank3]:E1204 09:19:14.322000 26730 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 3. CUDA driver allocated memory was 487522304 and is now 632225792. 2025-12-04T09:19:34.8283106Z [rank3]:E1204 09:19:14.322000 26730 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8283695Z [rank3]:E1204 09:19:14.322000 26730 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8284727Z [rank3]:E1204 09:19:14.322000 26730 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T09:19:34.8285056Z [rank3]:E1204 09:19:14.322000 26730 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8285688Z [rank3]:E1204 09:19:14.322000 26730 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8286175Z [rank3]:E1204 09:19:14.322000 26730 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.8286573Z [rank2]:E1204 09:19:14.322000 26729 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.8287044Z [rank2]:E1204 09:19:14.322000 26729 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.8287939Z [rank2]:E1204 09:19:14.322000 26729 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8288389Z [rank2]:E1204 09:19:14.322000 26729 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.8289264Z [rank2]:E1204 09:19:14.322000 26729 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8289613Z [rank2]:E1204 09:19:14.322000 26729 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.8290523Z [rank2]:E1204 09:19:14.322000 26729 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8290954Z [rank2]:E1204 09:19:14.322000 26729 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T09:19:34.8291801Z [rank2]:E1204 09:19:14.322000 26729 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8292235Z [rank2]:E1204 09:19:14.322000 26729 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8293082Z [rank2]:E1204 09:19:14.322000 26729 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8293487Z [rank2]:E1204 09:19:14.322000 26729 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.8294349Z [rank2]:E1204 09:19:14.322000 26729 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8294787Z [rank2]:E1204 09:19:14.322000 26729 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.8296278Z [rank2]:E1204 09:19:14.322000 26729 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 2. CUDA driver allocated memory was 604962816 and is now 632225792. 2025-12-04T09:19:34.8296780Z [rank2]:E1204 09:19:14.322000 26729 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8297517Z [rank2]:E1204 09:19:14.322000 26729 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8298625Z [rank2]:E1204 09:19:14.322000 26729 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T09:19:34.8298992Z [rank2]:E1204 09:19:14.322000 26729 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8299704Z [rank2]:E1204 09:19:14.322000 26729 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8300257Z [rank2]:E1204 09:19:14.322000 26729 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:19:34.8300712Z [rank0]:E1204 09:19:14.322000 26727 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.8301246Z [rank0]:E1204 09:19:14.322000 26727 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.8302251Z [rank0]:E1204 09:19:14.322000 26727 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8302755Z [rank0]:E1204 09:19:14.322000 26727 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.8303748Z [rank0]:E1204 09:19:14.322000 26727 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8304198Z [rank0]:E1204 09:19:14.322000 26727 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.8305173Z [rank0]:E1204 09:19:14.322000 26727 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8305661Z [rank0]:E1204 09:19:14.322000 26727 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8306615Z [rank0]:E1204 09:19:14.322000 26727 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8307102Z [rank0]:E1204 09:19:14.322000 26727 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8308059Z [rank0]:E1204 09:19:14.322000 26727 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8308620Z [rank0]:E1204 09:19:14.322000 26727 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.8309606Z [rank0]:E1204 09:19:14.322000 26727 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8310044Z [rank0]:E1204 09:19:14.322000 26727 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.8311467Z [rank0]:E1204 09:19:14.322000 26727 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 0. CUDA driver allocated memory was 714014720 and is now 741277696. 
2025-12-04T09:19:34.8311850Z [rank0]:E1204 09:19:14.322000 26727 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8312433Z [rank0]:E1204 09:19:14.322000 26727 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8313415Z [rank0]:E1204 09:19:14.322000 26727 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T09:19:34.8313743Z [rank0]:E1204 09:19:14.322000 26727 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8314379Z [rank0]:E1204 09:19:14.322000 26727 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8314871Z [rank0]:E1204 09:19:14.322000 26727 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.8314965Z dist init r=1, world=4 2025-12-04T09:19:34.8315054Z dist init r=3, world=4 2025-12-04T09:19:34.8315150Z dist init r=0, world=4 2025-12-04T09:19:34.8315235Z dist init r=2, world=4 2025-12-04T09:19:34.8315319Z FAILED [8.3993s] [100%] 2025-12-04T09:19:34.8315324Z 2025-12-04T09:19:34.8315458Z =================================== FAILURES =================================== 2025-12-04T09:19:34.8315721Z ________ TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda _________ 2025-12-04T09:19:34.8315835Z Traceback (most recent call last): 2025-12-04T09:19:34.8316315Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:19:34.8316463Z self._join_processes(fn) 2025-12-04T09:19:34.8316984Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:19:34.8317113Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:19:34.8317653Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:19:34.8317751Z raise RuntimeError(error) 2025-12-04T09:19:34.8317957Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:19:34.8318067Z Traceback (most recent call last): 2025-12-04T09:19:34.8318542Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8318639Z getattr(self, test_name)() 2025-12-04T09:19:34.8319112Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8319192Z fn() 2025-12-04T09:19:34.8319644Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8319737Z method(*args, **kwargs) 2025-12-04T09:19:34.8320182Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8320277Z method(*args, **kwargs) 2025-12-04T09:19:34.8321026Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8321120Z with policy(): 2025-12-04T09:19:34.8321786Z 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8321891Z raise RuntimeError(msg) 2025-12-04T09:19:34.8323067Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 2. CUDA driver allocated memory was 604962816 and is now 632225792. 2025-12-04T09:19:34.8323079Z 2025-12-04T09:19:34.8323397Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8324062Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T09:19:34.8324068Z 2025-12-04T09:19:34.8324330Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8324335Z 2025-12-04T09:19:34.8324340Z 2025-12-04T09:19:34.8324557Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:19:34.8324823Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:19:34.8325679Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-68dc7893385d1617.xml - 2025-12-04T09:19:34.8325858Z =========================== short test summary info ============================ 2025-12-04T09:19:34.8326688Z FAILED [8.3993s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy1_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:19:34.8326806Z Traceback (most recent call last): 2025-12-04T09:19:34.8327363Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8327470Z getattr(self, test_name)() 2025-12-04T09:19:34.8328011Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8328098Z fn() 2025-12-04T09:19:34.8328601Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8328788Z method(*args, **kwargs) 2025-12-04T09:19:34.8329293Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8329402Z method(*args, **kwargs) 2025-12-04T09:19:34.8329904Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8329997Z with policy(): 2025-12-04T09:19:34.8330509Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8330614Z raise RuntimeError(msg) 2025-12-04T09:19:34.8331775Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 2. CUDA driver allocated memory was 604962816 and is now 632225792. 
2025-12-04T09:19:34.8331792Z 2025-12-04T09:19:34.8332003Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8332667Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T09:19:34.8332672Z 2025-12-04T09:19:34.8332942Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8333120Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:19:34.8333294Z ======================= 1 failed, 7 deselected in 8.42s ======================== 2025-12-04T09:19:34.8333394Z Got exit code 1 2025-12-04T09:19:34.8333494Z Retrying single test... 2025-12-04T09:19:34.8334261Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-14f8a536ecccf07e.xml 2025-12-04T09:19:34.8334513Z ============================= test session starts ============================== 2025-12-04T09:19:34.8334823Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:19:34.8334924Z cachedir: .pytest_cache 2025-12-04T09:19:34.8335430Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:19:34.8335545Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:19:34.8335639Z configfile: pytest.ini 2025-12-04T09:19:34.8336113Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:19:34.8336375Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:19:34.8337251Z stepcurrent: skipping 7 already run items. Running only test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy1_cuda 2025-12-04T09:19:34.8337360Z Running 1 items in this shard 2025-12-04T09:19:34.8337370Z 2025-12-04T09:19:34.8338403Z distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy1_cuda I1204 09:19:20.674000 27004 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 27056 2025-12-04T09:19:34.8338905Z I1204 09:19:20.675000 27004 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 27057 2025-12-04T09:19:34.8339408Z I1204 09:19:20.675000 27004 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 27058 2025-12-04T09:19:34.8339895Z I1204 09:19:20.676000 27004 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 27059 2025-12-04T09:19:34.8341622Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.8341853Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.8343554Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.8343725Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.8345423Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.8345594Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.8347298Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:19:34.8347463Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:19:34.8347920Z [rank1]:E1204 09:19:27.430000 27057 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.8348460Z [rank1]:E1204 09:19:27.430000 27057 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.8349500Z [rank1]:E1204 09:19:27.430000 27057 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8350011Z [rank1]:E1204 09:19:27.430000 27057 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.8350894Z [rank1]:E1204 09:19:27.430000 27057 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8351245Z [rank1]:E1204 09:19:27.430000 27057 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.8352102Z [rank1]:E1204 09:19:27.430000 27057 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8352537Z [rank1]:E1204 09:19:27.430000 27057 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8353394Z [rank1]:E1204 09:19:27.430000 27057 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8353824Z [rank1]:E1204 09:19:27.430000 27057 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8354669Z [rank1]:E1204 09:19:27.430000 27057 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8355071Z [rank1]:E1204 09:19:27.430000 27057 
site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.8355977Z [rank1]:E1204 09:19:27.430000 27057 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8356419Z [rank1]:E1204 09:19:27.430000 27057 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.8357853Z [rank1]:E1204 09:19:27.430000 27057 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 1. CUDA driver allocated memory was 607059968 and is now 632225792. 2025-12-04T09:19:34.8358182Z [rank1]:E1204 09:19:27.430000 27057 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8358769Z [rank1]:E1204 09:19:27.430000 27057 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8359748Z [rank1]:E1204 09:19:27.430000 27057 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T09:19:34.8360077Z [rank1]:E1204 09:19:27.430000 27057 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8360711Z [rank1]:E1204 09:19:27.430000 27057 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8361203Z [rank1]:E1204 09:19:27.430000 27057 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:19:34.8361606Z [rank0]:E1204 09:19:27.430000 27056 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.8362090Z [rank0]:E1204 09:19:27.430000 27056 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.8363024Z [rank0]:E1204 09:19:27.430000 27056 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8363476Z [rank0]:E1204 09:19:27.430000 27056 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.8364357Z [rank0]:E1204 09:19:27.430000 27056 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8364705Z [rank0]:E1204 09:19:27.430000 27056 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.8365563Z [rank0]:E1204 09:19:27.430000 27056 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8365997Z [rank0]:E1204 09:19:27.430000 27056 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T09:19:34.8366854Z [rank0]:E1204 09:19:27.430000 27056 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8367287Z [rank0]:E1204 09:19:27.430000 27056 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8368131Z [rank0]:E1204 09:19:27.430000 27056 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8368581Z [rank0]:E1204 09:19:27.430000 27056 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.8369436Z [rank0]:E1204 09:19:27.430000 27056 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8369877Z [rank0]:E1204 09:19:27.430000 27056 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.8371305Z [rank0]:E1204 09:19:27.430000 27056 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 0. CUDA driver allocated memory was 714014720 and is now 741277696. 2025-12-04T09:19:34.8371635Z [rank0]:E1204 09:19:27.430000 27056 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8372222Z [rank0]:E1204 09:19:27.430000 27056 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8373204Z [rank0]:E1204 09:19:27.430000 27056 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T09:19:34.8373528Z [rank0]:E1204 09:19:27.430000 27056 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8374155Z [rank0]:E1204 09:19:27.430000 27056 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8374638Z [rank0]:E1204 09:19:27.430000 27056 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:19:34.8375040Z [rank2]:E1204 09:19:27.430000 27058 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.8375555Z [rank2]:E1204 09:19:27.430000 27058 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.8376516Z [rank2]:E1204 09:19:27.430000 27058 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8377172Z [rank2]:E1204 09:19:27.430000 27058 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.8378173Z [rank2]:E1204 09:19:27.430000 27058 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8378569Z [rank2]:E1204 09:19:27.430000 27058 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.8379535Z [rank2]:E1204 09:19:27.430000 27058 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8380019Z [rank2]:E1204 09:19:27.430000 27058 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8380978Z [rank2]:E1204 09:19:27.430000 27058 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8381472Z [rank2]:E1204 09:19:27.430000 27058 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8382496Z [rank2]:E1204 09:19:27.430000 27058 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8382948Z [rank2]:E1204 09:19:27.430000 27058 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.8383907Z [rank2]:E1204 09:19:27.430000 27058 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8384401Z [rank2]:E1204 09:19:27.430000 27058 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.8386017Z [rank2]:E1204 09:19:27.430000 27058 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 2. CUDA driver allocated memory was 604962816 and is now 632225792. 
2025-12-04T09:19:34.8386388Z [rank2]:E1204 09:19:27.430000 27058 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8387045Z [rank2]:E1204 09:19:27.430000 27058 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8388155Z [rank2]:E1204 09:19:27.430000 27058 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T09:19:34.8388625Z [rank2]:E1204 09:19:27.430000 27058 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8389393Z [rank2]:E1204 09:19:27.430000 27058 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8389882Z [rank2]:E1204 09:19:27.430000 27058 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:19:34.8390329Z [rank3]:E1204 09:19:27.431000 27059 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:19:34.8390796Z [rank3]:E1204 09:19:27.431000 27059 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:19:34.8391691Z [rank3]:E1204 09:19:27.431000 27059 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8392138Z [rank3]:E1204 09:19:27.431000 27059 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:19:34.8393019Z [rank3]:E1204 09:19:27.431000 27059 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8393373Z [rank3]:E1204 09:19:27.431000 27059 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:19:34.8394234Z [rank3]:E1204 09:19:27.431000 27059 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8394663Z [rank3]:E1204 09:19:27.431000 27059 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8395508Z [rank3]:E1204 09:19:27.431000 27059 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8396007Z [rank3]:E1204 09:19:27.431000 27059 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:19:34.8396854Z [rank3]:E1204 09:19:27.431000 27059 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8397252Z [rank3]:E1204 09:19:27.431000 27059 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:19:34.8398102Z [rank3]:E1204 09:19:27.431000 27059 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8398540Z [rank3]:E1204 09:19:27.431000 27059 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:19:34.8399968Z [rank3]:E1204 09:19:27.431000 27059 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 3. CUDA driver allocated memory was 498008064 and is now 632225792. 2025-12-04T09:19:34.8400293Z [rank3]:E1204 09:19:27.431000 27059 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8400877Z [rank3]:E1204 09:19:27.431000 27059 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8401865Z [rank3]:E1204 09:19:27.431000 27059 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T09:19:34.8402191Z [rank3]:E1204 09:19:27.431000 27059 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:19:34.8402866Z [rank3]:E1204 09:19:27.431000 27059 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8403351Z [rank3]:E1204 09:19:27.431000 27059 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:19:34.8403439Z dist init r=1, world=4 2025-12-04T09:19:34.8403526Z dist init r=3, world=4 2025-12-04T09:19:34.8403619Z dist init r=0, world=4 2025-12-04T09:19:34.8403702Z dist init r=2, world=4 2025-12-04T09:19:34.8403787Z FAILED [8.5741s] [100%] 2025-12-04T09:19:34.8403791Z 2025-12-04T09:19:34.8403926Z =================================== FAILURES =================================== 2025-12-04T09:19:34.8404188Z ________ TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda _________ 2025-12-04T09:19:34.8404307Z Traceback (most recent call last): 2025-12-04T09:19:34.8404786Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:19:34.8404883Z self._join_processes(fn) 2025-12-04T09:19:34.8405411Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:19:34.8405534Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:19:34.8406080Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:19:34.8406178Z raise RuntimeError(error) 2025-12-04T09:19:34.8406384Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:19:34.8406496Z Traceback (most recent call last): 2025-12-04T09:19:34.8406972Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8407117Z getattr(self, test_name)() 2025-12-04T09:19:34.8407595Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8407673Z fn() 2025-12-04T09:19:34.8408133Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8408224Z method(*args, **kwargs) 2025-12-04T09:19:34.8408670Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8408765Z method(*args, **kwargs) 2025-12-04T09:19:34.8409214Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8409298Z with policy(): 2025-12-04T09:19:34.8409750Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8409846Z raise RuntimeError(msg) 2025-12-04T09:19:34.8410881Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 3. CUDA driver allocated memory was 498008064 and is now 632225792. 2025-12-04T09:19:34.8410888Z 2025-12-04T09:19:34.8411077Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8411660Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T09:19:34.8411672Z 2025-12-04T09:19:34.8411912Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8411917Z 2025-12-04T09:19:34.8411921Z 2025-12-04T09:19:34.8412110Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:19:34.8412341Z Process 3 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:19:34.8413155Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-14f8a536ecccf07e.xml - 2025-12-04T09:19:34.8413304Z =========================== short test summary info ============================ 2025-12-04T09:19:34.8414034Z FAILED [8.5741s] distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy1_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:19:34.8414139Z Traceback (most recent call last): 2025-12-04T09:19:34.8414626Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:19:34.8414727Z getattr(self, test_name)() 2025-12-04T09:19:34.8415200Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:19:34.8415284Z fn() 2025-12-04T09:19:34.8415728Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8415823Z method(*args, **kwargs) 2025-12-04T09:19:34.8416361Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:19:34.8416457Z method(*args, **kwargs) 2025-12-04T09:19:34.8417104Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:19:34.8417203Z with policy(): 2025-12-04T09:19:34.8417707Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:19:34.8417820Z raise RuntimeError(msg) 2025-12-04T09:19:34.8418984Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda! Caching allocator allocated memory was 512 and is now reported as 4096 on device 3. CUDA driver allocated memory was 498008064 and is now 632225792. 2025-12-04T09:19:34.8419050Z 2025-12-04T09:19:34.8419267Z To execute this test, run the following from the base repo dir: 2025-12-04T09:19:34.8419926Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_exec_order.py TestFSDPExecOrderCUDA.test_train_eval_sharding_strategy1_cuda 2025-12-04T09:19:34.8419931Z 2025-12-04T09:19:34.8420197Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:19:34.8420378Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T09:19:34.8420550Z ======================= 1 failed, 7 deselected in 8.60s ======================== 2025-12-04T09:19:34.8420645Z Got exit code 1 2025-12-04T09:19:34.8421430Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy1_cuda 2025-12-04T09:19:34.8421838Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:19:34.8422525Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-77e61ff77a3b19cd.xml 2025-12-04T09:19:34.8422683Z ============================= test session starts ============================== 2025-12-04T09:19:34.8423027Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:19:34.8423138Z cachedir: .pytest_cache 2025-12-04T09:19:34.8423649Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:19:34.8423767Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:19:34.8423876Z configfile: pytest.ini 2025-12-04T09:19:34.8424408Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:19:34.8424620Z collecting ... collected 8 items / 8 deselected / 0 selected 2025-12-04T09:19:34.8424758Z stepcurrent: skipping 8 already run items. 2025-12-04T09:19:34.8424867Z Running 0 items in this shard 2025-12-04T09:19:34.8424976Z 2025-12-04T09:19:34.8425838Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-77e61ff77a3b19cd.xml - 2025-12-04T09:19:34.8426002Z ============================ 8 deselected in 0.01s ============================= 2025-12-04T09:19:34.8431417Z The following tests failed consistently: ['test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy0_cuda', 'test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_first_iter_order_sharding_strategy1_cuda', 'test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_1_cuda', 'test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy0_iters_before_path_change_3_cuda', 'test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_1_cuda', 'test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_invalid_later_iter_order_sharding_strategy1_iters_before_path_change_3_cuda', 'test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy0_cuda', 'test/distributed/fsdp/test_fsdp_exec_order.py::TestFSDPExecOrderCUDA::test_train_eval_sharding_strategy1_cuda'] 2025-12-04T09:19:34.8431429Z 2025-12-04T09:19:34.8432100Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_exec_order 1/1 (test/test-reports/distributed.fsdp.test_fsdp_exec_order_1.1_a2a67ccbd845e856_.log) 2025-12-04T09:19:34.8432177Z 2025-12-04T09:19:34.8432577Z Finished distributed/fsdp/test_fsdp_exec_order 1/1 ... 
[2025-12-04 09:19:34.544881][1606.152791418], took 5.35min 2025-12-04T09:19:34.8433516Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-93c7f0a0a61745d5.xml 2025-12-04T09:19:34.8434347Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-50fd36707db41f77.xml 2025-12-04T09:19:34.8435144Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-434f2a168fab2502.xml 2025-12-04T09:19:34.8435953Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-810575b51f00acc3.xml 2025-12-04T09:19:34.8436756Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-acd65444fa26961a.xml 2025-12-04T09:19:34.8437567Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-d7f6d912312cc834.xml 2025-12-04T09:19:34.8438369Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-d3fa58c4cf34965f.xml 2025-12-04T09:19:34.8439164Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-d5b8ecd9108f02ac.xml 2025-12-04T09:19:34.8537798Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-578e4c4077b7a803.xml 2025-12-04T09:19:34.8819300Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-14d4a314808f55fe.xml 2025-12-04T09:19:34.9117196Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-72b90a4f7545df10.xml 2025-12-04T09:19:34.9424132Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-cc094df1219cfd82.xml 2025-12-04T09:19:34.9827006Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-94627d53ab92538d.xml 2025-12-04T09:19:35.0161066Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-f49c40cee39994b2.xml 2025-12-04T09:19:35.0479997Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-a8869f6ed51873ac.xml 2025-12-04T09:19:35.0819768Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-90a4ba7c1fd04d10.xml 2025-12-04T09:19:35.1139087Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-ccaa5b3b6bf09af7.xml 2025-12-04T09:19:35.1450489Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-ca39f8152ef39349.xml 2025-12-04T09:19:35.1739862Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-7178045a44a28781.xml 2025-12-04T09:19:35.2071063Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-cdb7b80b8b392fad.xml 2025-12-04T09:19:35.2351611Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-9595731043617943.xml 2025-12-04T09:19:35.2609331Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-f8bd87b046fcc0d3.xml 2025-12-04T09:19:35.2922771Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-68dc7893385d1617.xml 2025-12-04T09:19:35.3238800Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-14f8a536ecccf07e.xml 2025-12-04T09:19:35.3578851Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-77e61ff77a3b19cd.xml 2025-12-04T09:19:35.6738512Z Uploading logs for 57116084904 to S3 2025-12-04T09:19:35.7157544Z Uploading artifacts took 0.33 seconds 2025-12-04T09:19:35.7158121Z distributed/fsdp/test_fsdp_exec_order 1/1 failed! 2025-12-04T09:19:35.7163553Z Running distributed/fsdp/test_hsdp_dtensor_state_dict 1/1 ... [2025-12-04 09:19:35.715768][1607.323683885] 2025-12-04T09:19:35.7164207Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T09:19:35.7165505Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_hsdp_dtensor_state_dict.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 09:19:35.716087] 2025-12-04T09:25:19.6288802Z 2025-12-04T09:25:19.6290028Z PRINTING LOG FILE of distributed/fsdp/test_hsdp_dtensor_state_dict 1/1 (test/test-reports/distributed.fsdp.test_hsdp_dtensor_state_dict_1.1_8591eb8b13b136e6_.log) 2025-12-04T09:25:19.6291619Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-a78dec0d79621f36.xml 2025-12-04T09:25:19.6292871Z ============================= test session starts ============================== 2025-12-04T09:25:19.6293531Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:25:19.6294111Z cachedir: .pytest_cache 2025-12-04T09:25:19.6294818Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:25:19.6295588Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:25:19.6295921Z configfile: pytest.ini 2025-12-04T09:25:19.6296936Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:25:19.6297750Z collecting ... collected 8 items 2025-12-04T09:25:19.6298171Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T09:25:19.6304793Z Running 8 items in this shard: test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda, test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda, test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda, test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda, test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda, test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda, test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_hsdp_init_with_device_mesh_cuda, test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_root_module_is_not_FSDP_cuda 2025-12-04T09:25:19.6311624Z 2025-12-04T09:25:19.6312942Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda I1204 09:19:39.134000 27390 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 27442 2025-12-04T09:25:19.6314874Z I1204 09:19:39.135000 27390 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 27443 2025-12-04T09:25:19.6316034Z I1204 09:19:39.136000 27390 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 27444 2025-12-04T09:25:19.6317172Z I1204 09:19:39.136000 27390 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 27445 2025-12-04T09:25:19.6320252Z 
/var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.6323168Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.6325900Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.6328565Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.6331157Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.6333801Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.6336492Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.6339142Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.6343893Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T09:25:19.6349069Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.6353966Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:25:19.6358834Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.6363773Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:25:19.6368614Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.6373480Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
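Each of the stream-mismatch warnings above ends by naming an opt-out switch. A one-line sketch of using it, with the function name taken directly from the warning text; this is only appropriate when the mismatch is known to be intentional:

    import torch

    # Named in the UserWarning above; silences the AccumulateGrad
    # stream-mismatch warning for the rest of the process.
    torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False)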
2025-12-04T09:25:19.6382897Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.6383876Z E1204 09:19:46.611000 27442 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.6384961Z E1204 09:19:46.611000 27442 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.6386602Z E1204 09:19:46.611000 27442 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.6388201Z E1204 09:19:46.611000 27442 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.6389879Z E1204 09:19:46.611000 27442 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.6391343Z E1204 09:19:46.611000 27442 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.6392745Z E1204 09:19:46.611000 27442 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6394239Z E1204 09:19:46.611000 27442 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.6395738Z E1204 09:19:46.611000 27442 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6397230Z E1204 09:19:46.611000 27442 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.6398833Z E1204 09:19:46.611000 27442 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.6400285Z E1204 09:19:46.611000 27442 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.6401754Z E1204 09:19:46.611000 27442 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.6403254Z E1204 09:19:46.611000 27442 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.6405610Z E1204 09:19:46.611000 27442 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 640614400 and is now 722403328. 
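The leak report above compares two numbers per device: bytes held by the caching allocator and bytes the CUDA driver has allocated, each sampled before and after the test. A rough sketch of that kind of before/after comparison using public torch.cuda APIs, assuming a CUDA device is available; this is illustrative only, not the test harness's actual leak checker:

    import torch

    def driver_allocated(device: int) -> int:
        # Stand-in for the "CUDA driver allocated memory" figure in the error:
        # total device memory minus what the driver currently reports as free.
        free, total = torch.cuda.mem_get_info(device)
        return total - free

    dev = 0
    alloc_before = torch.cuda.memory_allocated(dev)   # caching allocator bytes
    driver_before = driver_allocated(dev)

    # ... run the workload under test here ...

    torch.cuda.synchronize(dev)
    leaked = torch.cuda.memory_allocated(dev) - alloc_before
    if leaked > 0 and driver_allocated(dev) > driver_before:
        raise RuntimeError(f"possible CUDA memory leak: {leaked} bytes still allocated")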
2025-12-04T09:25:19.6407830Z E1204 09:19:46.611000 27442 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.6408903Z E1204 09:19:46.611000 27442 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.6410989Z E1204 09:19:46.611000 27442 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.6412815Z E1204 09:19:46.611000 27442 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.6413962Z E1204 09:19:46.611000 27442 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.6415343Z E1204 09:19:46.611000 27442 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:25:19.6416483Z E1204 09:19:46.612000 27443 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.6417728Z E1204 09:19:46.612000 27443 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.6419358Z E1204 09:19:46.612000 27443 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.6421139Z E1204 09:19:46.612000 27443 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.6422736Z E1204 09:19:46.612000 27443 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.6424216Z E1204 09:19:46.612000 27443 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.6425675Z E1204 09:19:46.612000 27443 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6427216Z E1204 09:19:46.612000 27443 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.6428752Z E1204 09:19:46.612000 27443 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6430393Z E1204 09:19:46.612000 27443 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.6431949Z E1204 09:19:46.612000 27443 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.6433619Z E1204 09:19:46.612000 27443 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.6435042Z E1204 09:19:46.612000 27443 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.6436497Z E1204 09:19:46.612000 27443 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.6438784Z E1204 09:19:46.612000 27443 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:25:19.6440931Z E1204 09:19:46.612000 27443 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.6441985Z E1204 09:19:46.612000 27443 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.6444036Z E1204 09:19:46.612000 27443 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.6445710Z E1204 09:19:46.612000 27443 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.6446807Z E1204 09:19:46.612000 27443 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.6448010Z E1204 09:19:46.612000 27443 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:25:19.6448977Z E1204 09:19:46.613000 27444 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.6449937Z E1204 09:19:46.613000 27444 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.6451377Z E1204 09:19:46.613000 27444 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.6452801Z E1204 09:19:46.613000 27444 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.6454448Z E1204 09:19:46.613000 27444 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.6455839Z E1204 09:19:46.613000 27444 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.6457512Z E1204 09:19:46.613000 27444 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6459057Z E1204 09:19:46.613000 27444 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.6460660Z E1204 09:19:46.613000 27444 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6462201Z 
E1204 09:19:46.613000 27444 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.6463739Z E1204 09:19:46.613000 27444 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.6465226Z E1204 09:19:46.613000 27444 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.6466726Z E1204 09:19:46.613000 27444 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.6468270Z E1204 09:19:46.613000 27444 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.6470707Z E1204 09:19:46.613000 27444 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:25:19.6472728Z E1204 09:19:46.613000 27444 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.6473711Z E1204 09:19:46.613000 27444 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.6475647Z E1204 09:19:46.613000 27444 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.6477317Z E1204 09:19:46.613000 27444 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.6478361Z E1204 09:19:46.613000 27444 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.6479566Z E1204 09:19:46.613000 27444 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:25:19.6480536Z E1204 09:19:46.613000 27445 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.6481497Z E1204 09:19:46.613000 27445 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.6482955Z E1204 09:19:46.613000 27445 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.6484377Z E1204 09:19:46.613000 27445 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.6485787Z E1204 09:19:46.613000 27445 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.6487102Z E1204 09:19:46.613000 27445 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.6488399Z E1204 
09:19:46.613000 27445 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6489770Z E1204 09:19:46.613000 27445 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.6491176Z E1204 09:19:46.613000 27445 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6492547Z E1204 09:19:46.613000 27445 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.6493925Z E1204 09:19:46.613000 27445 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.6495264Z E1204 09:19:46.613000 27445 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.6496865Z E1204 09:19:46.613000 27445 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.6498416Z E1204 09:19:46.613000 27445 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.6500839Z E1204 09:19:46.613000 27445 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 302972928 and is now 613351424. 
2025-12-04T09:25:19.6503117Z E1204 09:19:46.613000 27445 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.6504276Z E1204 09:19:46.613000 27445 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.6506427Z E1204 09:19:46.613000 27445 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.6508300Z E1204 09:19:46.613000 27445 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.6509626Z E1204 09:19:46.613000 27445 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.6510830Z E1204 09:19:46.613000 27445 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:25:19.6511681Z FAILED [9.3991s] [ 12%] 2025-12-04T09:25:19.6511854Z 2025-12-04T09:25:19.6511998Z =================================== FAILURES =================================== 2025-12-04T09:25:19.6512740Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda _ 2025-12-04T09:25:19.6513457Z Traceback (most recent call last): 2025-12-04T09:25:19.6514207Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:25:19.6514953Z self._join_processes(fn) 2025-12-04T09:25:19.6515711Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:25:19.6516532Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:25:19.6517372Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:25:19.6518178Z raise RuntimeError(error) 2025-12-04T09:25:19.6518604Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:25:19.6519074Z Traceback (most recent call last): 2025-12-04T09:25:19.6519808Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.6520640Z getattr(self, test_name)() 2025-12-04T09:25:19.6521696Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.6522477Z fn() 2025-12-04T09:25:19.6523116Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6523876Z method(*args, **kwargs) 2025-12-04T09:25:19.6524594Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6525343Z method(*args, **kwargs) 2025-12-04T09:25:19.6526061Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.6526816Z with policy(): 2025-12-04T09:25:19.6527505Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.6528265Z raise RuntimeError(msg) 
2025-12-04T09:25:19.6529870Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 640614400 and is now 722403328. 2025-12-04T09:25:19.6531402Z 2025-12-04T09:25:19.6531620Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.6532920Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.6534153Z 2025-12-04T09:25:19.6534431Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.6534822Z 2025-12-04T09:25:19.6534826Z 2025-12-04T09:25:19.6535051Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:25:19.6535764Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:25:19.6537329Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-a78dec0d79621f36.xml - 2025-12-04T09:25:19.6538581Z =========================== short test summary info ============================ 2025-12-04T09:25:19.6539988Z FAILED [9.3991s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:25:19.6541347Z Traceback (most recent call last): 2025-12-04T09:25:19.6542146Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.6542962Z getattr(self, test_name)() 2025-12-04T09:25:19.6543709Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.6544484Z fn() 2025-12-04T09:25:19.6545137Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6545887Z method(*args, **kwargs) 2025-12-04T09:25:19.6546606Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6547368Z method(*args, **kwargs) 2025-12-04T09:25:19.6548082Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.6549030Z with policy(): 2025-12-04T09:25:19.6549677Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.6550488Z raise RuntimeError(msg) 2025-12-04T09:25:19.6551976Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 640614400 and is now 722403328. 
2025-12-04T09:25:19.6553402Z 2025-12-04T09:25:19.6553606Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.6554826Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.6555845Z 2025-12-04T09:25:19.6556095Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.6556653Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:25:19.6557099Z ============================== 1 failed in 9.42s =============================== 2025-12-04T09:25:19.6557480Z Got exit code 1 2025-12-04T09:25:19.6557736Z Retrying single test... 2025-12-04T09:25:19.6558620Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-9a14ac4718e66e44.xml 2025-12-04T09:25:19.6559620Z ============================= test session starts ============================== 2025-12-04T09:25:19.6560240Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:25:19.6560811Z cachedir: .pytest_cache 2025-12-04T09:25:19.6561469Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:25:19.6562235Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:25:19.6562572Z configfile: pytest.ini 2025-12-04T09:25:19.6563250Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:25:19.6564126Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:25:19.6565411Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.6566598Z Running 1 items in this shard 2025-12-04T09:25:19.6566796Z 2025-12-04T09:25:19.6568020Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda I1204 09:19:53.214000 27783 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 27835 2025-12-04T09:25:19.6569864Z I1204 09:19:53.215000 27783 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 27836 2025-12-04T09:25:19.6570878Z I1204 09:19:53.216000 27783 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 27837 2025-12-04T09:25:19.6571888Z I1204 09:19:53.216000 27783 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 27838 2025-12-04T09:25:19.6574607Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:25:19.6577258Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.6579928Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.6582567Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.6585184Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.6587814Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.6590333Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.6592677Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.6596882Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:25:19.6601385Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.6605858Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. 
This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:25:19.6610292Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.6614837Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:25:19.6619847Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.6625239Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T09:25:19.6630315Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.6631357Z E1204 09:20:00.774000 27838 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.6632452Z E1204 09:20:00.774000 27838 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.6634105Z E1204 09:20:00.774000 27838 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.6635515Z E1204 09:20:00.774000 27838 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.6636946Z E1204 09:20:00.774000 27838 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.6638270Z E1204 09:20:00.774000 27838 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.6639568Z E1204 09:20:00.774000 27838 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6640936Z E1204 09:20:00.774000 27838 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.6642299Z E1204 09:20:00.774000 27838 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6643668Z E1204 09:20:00.774000 27838 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.6645108Z E1204 09:20:00.774000 27838 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.6646442Z E1204 09:20:00.774000 27838 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.6647767Z E1204 09:20:00.774000 27838 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.6649137Z E1204 09:20:00.774000 27838 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.6651294Z E1204 09:20:00.774000 27838 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 170852352 and is now 617545728. 
2025-12-04T09:25:19.6653327Z E1204 09:20:00.774000 27838 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.6654324Z E1204 09:20:00.774000 27838 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.6656282Z E1204 09:20:00.774000 27838 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.6658297Z E1204 09:20:00.774000 27838 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.6659473Z E1204 09:20:00.774000 27838 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.6660868Z E1204 09:20:00.774000 27838 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:25:19.6661952Z E1204 09:20:00.774000 27837 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.6663016Z E1204 09:20:00.774000 27837 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.6664642Z E1204 09:20:00.774000 27837 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.6666246Z E1204 09:20:00.774000 27837 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.6667841Z E1204 09:20:00.774000 27837 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.6669351Z E1204 09:20:00.774000 27837 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.6670636Z E1204 09:20:00.774000 27837 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6672007Z E1204 09:20:00.774000 27837 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.6673376Z E1204 09:20:00.774000 27837 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6674789Z E1204 09:20:00.774000 27837 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.6676150Z E1204 09:20:00.774000 27837 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.6677477Z E1204 09:20:00.774000 27837 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.6678816Z E1204 09:20:00.774000 27837 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.6680192Z E1204 09:20:00.774000 27837 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.6682342Z E1204 09:20:00.774000 27837 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 527368192 and is now 613351424. 2025-12-04T09:25:19.6684352Z E1204 09:20:00.774000 27837 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.6685349Z E1204 09:20:00.774000 27837 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.6687252Z E1204 09:20:00.774000 27837 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.6688917Z E1204 09:20:00.774000 27837 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.6689962Z E1204 09:20:00.774000 27837 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.6691178Z E1204 09:20:00.774000 27837 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:25:19.6692144Z E1204 09:20:00.774000 27835 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.6693097Z E1204 09:20:00.774000 27835 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.6694544Z E1204 09:20:00.774000 27835 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.6695954Z E1204 09:20:00.774000 27835 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.6697714Z E1204 09:20:00.774000 27835 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.6699197Z E1204 09:20:00.774000 27835 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.6700653Z E1204 09:20:00.774000 27835 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6702198Z E1204 09:20:00.774000 27835 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.6703807Z E1204 09:20:00.774000 27835 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6705356Z 
E1204 09:20:00.774000 27835 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.6706899Z E1204 09:20:00.774000 27835 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.6708405Z E1204 09:20:00.774000 27835 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.6709893Z E1204 09:20:00.774000 27835 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.6711256Z E1204 09:20:00.774000 27835 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.6713411Z E1204 09:20:00.774000 27835 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 649003008 and is now 722403328. 2025-12-04T09:25:19.6715452Z E1204 09:20:00.774000 27835 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.6716442Z E1204 09:20:00.774000 27835 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.6718378Z E1204 09:20:00.774000 27835 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.6719992Z E1204 09:20:00.774000 27835 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.6721372Z E1204 09:20:00.774000 27835 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.6722734Z E1204 09:20:00.774000 27835 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:25:19.6723824Z E1204 09:20:00.774000 27836 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.6724895Z E1204 09:20:00.774000 27836 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.6726535Z E1204 09:20:00.774000 27836 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.6728134Z E1204 09:20:00.774000 27836 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.6729742Z E1204 09:20:00.774000 27836 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.6731223Z E1204 09:20:00.774000 27836 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.6732667Z E1204 
09:20:00.774000 27836 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6734335Z E1204 09:20:00.774000 27836 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.6735800Z E1204 09:20:00.774000 27836 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6737478Z E1204 09:20:00.774000 27836 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.6739022Z E1204 09:20:00.774000 27836 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.6740504Z E1204 09:20:00.774000 27836 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.6742019Z E1204 09:20:00.774000 27836 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.6743586Z E1204 09:20:00.774000 27836 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.6746016Z E1204 09:20:00.774000 27836 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 523173888 and is now 613351424. 
2025-12-04T09:25:19.6748309Z E1204 09:20:00.774000 27836 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.6749488Z E1204 09:20:00.774000 27836 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.6751651Z E1204 09:20:00.774000 27836 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.6753334Z E1204 09:20:00.774000 27836 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.6754378Z E1204 09:20:00.774000 27836 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.6755591Z E1204 09:20:00.774000 27836 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:25:19.6756263Z FAILED [9.5266s] [100%] 2025-12-04T09:25:19.6756441Z 2025-12-04T09:25:19.6756582Z =================================== FAILURES =================================== 2025-12-04T09:25:19.6757289Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda _ 2025-12-04T09:25:19.6757956Z Traceback (most recent call last): 2025-12-04T09:25:19.6758670Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:25:19.6759390Z self._join_processes(fn) 2025-12-04T09:25:19.6760113Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:25:19.6760884Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:25:19.6761683Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:25:19.6762467Z raise RuntimeError(error) 2025-12-04T09:25:19.6762863Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:25:19.6763307Z Traceback (most recent call last): 2025-12-04T09:25:19.6764010Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.6764780Z getattr(self, test_name)() 2025-12-04T09:25:19.6765451Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.6766140Z fn() 2025-12-04T09:25:19.6766720Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6767383Z method(*args, **kwargs) 2025-12-04T09:25:19.6768028Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6768705Z method(*args, **kwargs) 2025-12-04T09:25:19.6769340Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.6770004Z with policy(): 2025-12-04T09:25:19.6770614Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.6771304Z raise RuntimeError(msg) 
2025-12-04T09:25:19.6772726Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 527368192 and is now 613351424. 2025-12-04T09:25:19.6774063Z 2025-12-04T09:25:19.6774261Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.6775420Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.6776481Z 2025-12-04T09:25:19.6776906Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.6777310Z 2025-12-04T09:25:19.6777315Z 2025-12-04T09:25:19.6777556Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:25:19.6778234Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:25:19.6779570Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-9a14ac4718e66e44.xml - 2025-12-04T09:25:19.6780820Z =========================== short test summary info ============================ 2025-12-04T09:25:19.6782224Z FAILED [9.5266s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:25:19.6783571Z Traceback (most recent call last): 2025-12-04T09:25:19.6784357Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.6785161Z getattr(self, test_name)() 2025-12-04T09:25:19.6785923Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.6786689Z fn() 2025-12-04T09:25:19.6787338Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6788104Z method(*args, **kwargs) 2025-12-04T09:25:19.6788926Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6789593Z method(*args, **kwargs) 2025-12-04T09:25:19.6790231Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.6790902Z with policy(): 2025-12-04T09:25:19.6791501Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.6792191Z raise RuntimeError(msg) 2025-12-04T09:25:19.6793658Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 527368192 and is now 613351424. 
2025-12-04T09:25:19.6794999Z 2025-12-04T09:25:19.6795205Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.6796356Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.6797304Z 2025-12-04T09:25:19.6797540Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.6798068Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:25:19.6798524Z ======================= 1 failed, 7 deselected in 9.55s ======================== 2025-12-04T09:25:19.6798909Z Got exit code 1 2025-12-04T09:25:19.6799141Z Retrying single test... 2025-12-04T09:25:19.6799985Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-7d115d367e840460.xml 2025-12-04T09:25:19.6800930Z ============================= test session starts ============================== 2025-12-04T09:25:19.6801510Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:25:19.6802045Z cachedir: .pytest_cache 2025-12-04T09:25:19.6802679Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:25:19.6803409Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:25:19.6803715Z configfile: pytest.ini 2025-12-04T09:25:19.6804366Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:25:19.6805186Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:25:19.6806387Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.6807507Z Running 1 items in this shard 2025-12-04T09:25:19.6807705Z 2025-12-04T09:25:19.6808850Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda I1204 09:20:07.394000 28176 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 28228 2025-12-04T09:25:19.6810557Z I1204 09:20:07.395000 28176 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 28229 2025-12-04T09:25:19.6811574Z I1204 09:20:07.396000 28176 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 28230 2025-12-04T09:25:19.6812567Z I1204 09:20:07.396000 28176 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 28231 2025-12-04T09:25:19.6815283Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
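The FutureWarning above points to the newer torch.distributed.checkpoint.state_dict APIs (see the doc link it prints). A minimal migration sketch, assuming model and optim stand for an FSDP-wrapped module and its optimizer like the test's fixtures, and assuming StateDictOptions.cpu_offload corresponds to the offload_to_cpu flag in the test name:

    from torch.distributed.checkpoint.state_dict import (
        StateDictOptions, get_state_dict, set_state_dict,
    )

    def roundtrip_state_dict(model, optim, offload_to_cpu: bool = False):
        # model/optim are placeholders for the FSDP-wrapped module and optimizer
        opts = StateDictOptions(cpu_offload=offload_to_cpu)
        model_sd, optim_sd = get_state_dict(model, optim, options=opts)
        # ... a checkpointing step would save and reload model_sd / optim_sd here ...
        set_state_dict(model, optim,
                       model_state_dict=model_sd, optim_state_dict=optim_sd,
                       options=opts)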
2025-12-04T09:25:19.6818029Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.6821422Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.6824104Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.6826713Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.6829335Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.6831940Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.6834788Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.6839220Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:25:19.6843998Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.6848766Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. 
This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:25:19.6853476Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.6858632Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:25:19.6863649Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.6868677Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
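The UserWarning repeated above names its own off-switch for cases where the stream mismatch is intentional; a one-line sketch of using it:

    import torch

    # Disable the AccumulateGrad stream-mismatch warning, as the warning text suggests;
    # the alternative it describes is dropping references to the autograd graph or doing
    # DDP initialization on the same stream as later forwards.
    torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False)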
2025-12-04T09:25:19.6873277Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.6874172Z E1204 09:20:14.993000 28229 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.6875125Z E1204 09:20:14.993000 28229 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.6876570Z E1204 09:20:14.993000 28229 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.6878184Z E1204 09:20:14.993000 28229 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.6879691Z E1204 09:20:14.993000 28229 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.6881071Z E1204 09:20:14.993000 28229 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.6882444Z E1204 09:20:14.993000 28229 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6883899Z E1204 09:20:14.993000 28229 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.6885355Z E1204 09:20:14.993000 28229 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6886803Z E1204 09:20:14.993000 28229 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.6888289Z E1204 09:20:14.993000 28229 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.6889798Z E1204 09:20:14.993000 28229 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.6891145Z E1204 09:20:14.993000 28229 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.6892518Z E1204 09:20:14.993000 28229 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.6894675Z E1204 09:20:14.993000 28229 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 518979584 and is now 617545728. 
2025-12-04T09:25:19.6916387Z E1204 09:20:14.993000 28229 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.6917456Z E1204 09:20:14.993000 28229 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.6919482Z E1204 09:20:14.993000 28229 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.6921673Z E1204 09:20:14.993000 28229 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.6922867Z E1204 09:20:14.993000 28229 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.6924304Z E1204 09:20:14.993000 28229 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:25:19.6925370Z E1204 09:20:14.993000 28228 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.6926425Z E1204 09:20:14.993000 28228 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.6928048Z E1204 09:20:14.993000 28228 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.6929638Z E1204 09:20:14.993000 28228 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.6931250Z E1204 09:20:14.993000 28228 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.6932728Z E1204 09:20:14.993000 28228 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.6934325Z E1204 09:20:14.993000 28228 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6935695Z E1204 09:20:14.993000 28228 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.6937365Z E1204 09:20:14.993000 28228 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6939016Z E1204 09:20:14.993000 28228 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.6940566Z E1204 09:20:14.993000 28228 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.6942074Z E1204 09:20:14.993000 28228 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.6943577Z E1204 09:20:14.993000 28228 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.6945133Z E1204 09:20:14.993000 28228 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.6947569Z E1204 09:20:14.993000 28228 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 649003008 and is now 722403328. 2025-12-04T09:25:19.6950095Z E1204 09:20:14.993000 28228 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.6951187Z E1204 09:20:14.993000 28228 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.6953269Z E1204 09:20:14.993000 28228 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.6955086Z E1204 09:20:14.993000 28228 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.6956238Z E1204 09:20:14.993000 28228 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.6957609Z E1204 09:20:14.993000 28228 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:25:19.6958668Z E1204 09:20:14.994000 28230 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.6959700Z E1204 09:20:14.994000 28230 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.6961285Z E1204 09:20:14.994000 28230 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.6962845Z E1204 09:20:14.994000 28230 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.6964504Z E1204 09:20:14.994000 28230 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.6965970Z E1204 09:20:14.994000 28230 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.6967256Z E1204 09:20:14.994000 28230 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6968623Z E1204 09:20:14.994000 28230 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.6970001Z E1204 09:20:14.994000 28230 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.6971417Z 
E1204 09:20:14.994000 28230 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.6972961Z E1204 09:20:14.994000 28230 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.6974367Z E1204 09:20:14.994000 28230 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.6975791Z E1204 09:20:14.994000 28230 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.6977542Z E1204 09:20:14.994000 28230 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.6979979Z E1204 09:20:14.994000 28230 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:25:19.6982271Z E1204 09:20:14.994000 28230 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.6983381Z E1204 09:20:14.994000 28230 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.6985535Z E1204 09:20:14.994000 28230 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.6987425Z E1204 09:20:14.994000 28230 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.6988734Z E1204 09:20:14.994000 28230 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.6990050Z E1204 09:20:14.994000 28230 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:25:19.6991022Z E1204 09:20:14.996000 28231 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.6991982Z E1204 09:20:14.996000 28231 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.6993431Z E1204 09:20:14.996000 28231 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.6994855Z E1204 09:20:14.996000 28231 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.6996264Z E1204 09:20:14.996000 28231 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.6997580Z E1204 09:20:14.996000 28231 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.6998876Z E1204 
09:20:14.996000 28231 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7000243Z E1204 09:20:14.996000 28231 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7001696Z E1204 09:20:14.996000 28231 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7003067Z E1204 09:20:14.996000 28231 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7004438Z E1204 09:20:14.996000 28231 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.7005763Z E1204 09:20:14.996000 28231 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.7007107Z E1204 09:20:14.996000 28231 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.7008475Z E1204 09:20:14.996000 28231 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.7010632Z E1204 09:20:14.996000 28231 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 481230848 and is now 613351424. 
2025-12-04T09:25:19.7012657Z E1204 09:20:14.996000 28231 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7013654Z E1204 09:20:14.996000 28231 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.7015585Z E1204 09:20:14.996000 28231 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.7017600Z E1204 09:20:14.996000 28231 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7018780Z E1204 09:20:14.996000 28231 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.7020135Z E1204 09:20:14.996000 28231 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:25:19.7021081Z FAILED [9.5127s] [100%] 2025-12-04T09:25:19.7021263Z 2025-12-04T09:25:19.7021417Z =================================== FAILURES =================================== 2025-12-04T09:25:19.7022219Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda _ 2025-12-04T09:25:19.7022987Z Traceback (most recent call last): 2025-12-04T09:25:19.7023792Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:25:19.7024588Z self._join_processes(fn) 2025-12-04T09:25:19.7025399Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:25:19.7026280Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:25:19.7027170Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:25:19.7028026Z raise RuntimeError(error) 2025-12-04T09:25:19.7028485Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:25:19.7028992Z Traceback (most recent call last): 2025-12-04T09:25:19.7029772Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.7030582Z getattr(self, test_name)() 2025-12-04T09:25:19.7031453Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.7032244Z fn() 2025-12-04T09:25:19.7032990Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7033712Z method(*args, **kwargs) 2025-12-04T09:25:19.7034388Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7035096Z method(*args, **kwargs) 2025-12-04T09:25:19.7035783Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.7036502Z with policy(): 2025-12-04T09:25:19.7037153Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.7037871Z raise RuntimeError(msg) 
2025-12-04T09:25:19.7039377Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 518979584 and is now 617545728. 2025-12-04T09:25:19.7040817Z 2025-12-04T09:25:19.7041026Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.7042250Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.7043293Z 2025-12-04T09:25:19.7043550Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.7043943Z 2025-12-04T09:25:19.7043949Z 2025-12-04T09:25:19.7044166Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:25:19.7044768Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:25:19.7046081Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-7d115d367e840460.xml - 2025-12-04T09:25:19.7047249Z =========================== short test summary info ============================ 2025-12-04T09:25:19.7048582Z FAILED [9.5127s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:25:19.7049848Z Traceback (most recent call last): 2025-12-04T09:25:19.7050599Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.7051344Z getattr(self, test_name)() 2025-12-04T09:25:19.7052069Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.7052807Z fn() 2025-12-04T09:25:19.7053410Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7054130Z method(*args, **kwargs) 2025-12-04T09:25:19.7054895Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7055573Z method(*args, **kwargs) 2025-12-04T09:25:19.7056279Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.7057173Z with policy(): 2025-12-04T09:25:19.7057909Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.7058686Z raise RuntimeError(msg) 2025-12-04T09:25:19.7060345Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 518979584 and is now 617545728. 
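The repro line the log keeps printing is an environment-variable-driven invocation of the test file. A sketch of the same rerun driven from Python, assuming subprocess is acceptable and the working directory is the base repo dir as the log instructs:

    import os
    import subprocess

    env = dict(os.environ,
               PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1",   # enable the per-test leak check
               PYTORCH_PRINT_REPRO_ON_FAILURE="0")     # optional: silence the repro banner
    subprocess.run(
        ["python", "test/distributed/fsdp/test_hsdp_dtensor_state_dict.py",
         "TestHSDPWithDeviceMeshAndDTensorCUDA."
         "test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda"],
        env=env, check=False)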
2025-12-04T09:25:19.7061890Z 2025-12-04T09:25:19.7062109Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.7063412Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.7064485Z 2025-12-04T09:25:19.7064771Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.7065350Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:25:19.7065858Z ======================= 1 failed, 7 deselected in 9.53s ======================== 2025-12-04T09:25:19.7066289Z Got exit code 1 2025-12-04T09:25:19.7067321Z FAILED CONSISTENTLY: test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.7068796Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:25:19.7069963Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-724e16d7d24ec18b.xml 2025-12-04T09:25:19.7070908Z ============================= test session starts ============================== 2025-12-04T09:25:19.7071550Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:25:19.7072074Z cachedir: .pytest_cache 2025-12-04T09:25:19.7072709Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:25:19.7073411Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:25:19.7073747Z configfile: pytest.ini 2025-12-04T09:25:19.7074394Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:25:19.7075179Z collecting ... collected 8 items / 1 deselected / 7 selected 2025-12-04T09:25:19.7075612Z stepcurrent: skipping 1 already run items. 2025-12-04T09:25:19.7075946Z Running 7 items in this shard 2025-12-04T09:25:19.7076144Z 2025-12-04T09:25:19.7077292Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda I1204 09:20:21.564000 28569 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 28621 2025-12-04T09:25:19.7078987Z I1204 09:20:21.564000 28569 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 28622 2025-12-04T09:25:19.7080006Z I1204 09:20:21.565000 28569 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 28623 2025-12-04T09:25:19.7081002Z I1204 09:20:21.566000 28569 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 28624 2025-12-04T09:25:19.7083729Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. 
API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.7086071Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.7088442Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.7090797Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.7093101Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.7095441Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.7098140Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.7100777Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.7105513Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:25:19.7110504Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.7114980Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. 
This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:25:19.7119438Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.7124689Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:25:19.7129712Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.7134730Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T09:25:19.7139776Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.7140796Z E1204 09:20:29.005000 28622 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.7141867Z E1204 09:20:29.005000 28622 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.7143498Z E1204 09:20:29.005000 28622 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.7145097Z E1204 09:20:29.005000 28622 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.7146696Z E1204 09:20:29.005000 28622 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.7148159Z E1204 09:20:29.005000 28622 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.7149625Z E1204 09:20:29.005000 28622 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7150995Z E1204 09:20:29.005000 28622 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7152370Z E1204 09:20:29.005000 28622 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7153739Z E1204 09:20:29.005000 28622 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7155141Z E1204 09:20:29.005000 28622 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.7156474Z E1204 09:20:29.005000 28622 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.7157815Z E1204 09:20:29.005000 28622 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.7159195Z E1204 09:20:29.005000 28622 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.7161344Z E1204 09:20:29.005000 28622 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 518979584 and is now 617545728. 
2025-12-04T09:25:19.7163355Z E1204 09:20:29.005000 28622 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7164358Z E1204 09:20:29.005000 28622 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.7166256Z E1204 09:20:29.005000 28622 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.7167914Z E1204 09:20:29.005000 28622 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7168967Z E1204 09:20:29.005000 28622 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.7170184Z E1204 09:20:29.005000 28622 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:25:19.7171154Z E1204 09:20:29.005000 28624 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.7172108Z E1204 09:20:29.005000 28624 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.7173554Z E1204 09:20:29.005000 28624 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.7174964Z E1204 09:20:29.005000 28624 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.7176463Z E1204 09:20:29.005000 28624 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.7178106Z E1204 09:20:29.005000 28624 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.7179547Z E1204 09:20:29.005000 28624 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7181080Z E1204 09:20:29.005000 28624 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7182613Z E1204 09:20:29.005000 28624 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7184212Z E1204 09:20:29.005000 28624 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7185752Z E1204 09:20:29.005000 28624 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.7187225Z E1204 09:20:29.005000 28624 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.7188717Z E1204 09:20:29.005000 28624 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.7190192Z E1204 09:20:29.005000 28624 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.7192324Z E1204 09:20:29.005000 28624 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 531562496 and is now 617545728. 2025-12-04T09:25:19.7194336Z E1204 09:20:29.005000 28624 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7195312Z E1204 09:20:29.005000 28624 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.7197196Z E1204 09:20:29.005000 28624 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.7198845Z E1204 09:20:29.005000 28624 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7199911Z E1204 09:20:29.005000 28624 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.7201103Z E1204 09:20:29.005000 28624 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:25:19.7202056Z E1204 09:20:29.515000 28623 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.7203001Z E1204 09:20:29.515000 28623 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.7204442Z E1204 09:20:29.515000 28623 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.7205853Z E1204 09:20:29.515000 28623 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.7207253Z E1204 09:20:29.515000 28623 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.7208551Z E1204 09:20:29.515000 28623 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.7209827Z E1204 09:20:29.515000 28623 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7211191Z E1204 09:20:29.515000 28623 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7212601Z E1204 09:20:29.515000 28623 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7213952Z 
E1204 09:20:29.515000 28623 site-packages/torch/testing/_internal/common_distributed.py:935]     method(*args, **kwargs)
E1204 09:20:29.515000 28623 site-packages/torch/testing/_internal/common_distributed.py:935]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
E1204 09:20:29.515000 28623 site-packages/torch/testing/_internal/common_distributed.py:935]     with policy():
E1204 09:20:29.515000 28623 site-packages/torch/testing/_internal/common_distributed.py:935]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
E1204 09:20:29.515000 28623 site-packages/torch/testing/_internal/common_distributed.py:935]     raise RuntimeError(msg)
E1204 09:20:29.515000 28623 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 523173888 and is now 613351424.
E1204 09:20:29.515000 28623 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
E1204 09:20:29.515000 28623 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda
E1204 09:20:29.515000 28623 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
E1204 09:20:29.515000 28623 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10
E1204 09:20:29.516000 28621 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
[traceback frames and repro instructions for process 28621 identical to process 28623 above]
E1204 09:20:29.516000 28621 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 649003008 and is now 722403328.
E1204 09:20:29.516000 28621 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10
FAILED [9.0335s] [ 14%]

=================================== FAILURES ===================================
_ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda _
Traceback (most recent call last):
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper
    self._join_processes(fn)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes
    self._check_return_codes(fn, elapsed_time)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes
    raise RuntimeError(error)
RuntimeError: Process 1 exited with error code 10 and exception:
Traceback (most recent call last):
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
    getattr(self, test_name)()
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
    fn()
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
    method(*args, **kwargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
    method(*args, **kwargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
    with policy():
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
    raise RuntimeError(msg)
RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 518979584 and is now 617545728.

To execute this test, run the following from the base repo dir:
PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0

----------------------------- Captured stdout call -----------------------------
Process 1 terminated with exit code 10, terminating remaining processes.
- generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-724e16d7d24ec18b.xml -
=========================== short test summary info ============================
FAILED [9.0335s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda - RuntimeError: Process 1 exited with error code 10 and exception:
[traceback, leak message, and repro instructions repeated from the FAILURES block above]
!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
======================= 1 failed, 1 deselected in 9.06s ========================
Got exit code 1
Retrying single test...
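For triage: the failure above is a straight before/after comparison made by the harness around each test. A minimal sketch of that bookkeeping against the public torch.cuda API (the helper name is ours; the real checker in torch/testing/_internal/common_utils.py does additional driver-level cross-checking before declaring a leak):

import gc
import torch

def check_for_cuda_leak(fn, device: int = 0):
    """Run fn() and complain if CUDA memory did not return to baseline.

    Sketch only, mirroring the shape of the message in the log above;
    not the actual implementation used by the test suite.
    """
    torch.cuda.synchronize(device)
    alloc_before = torch.cuda.memory_allocated(device)    # caching allocator view
    free_before, total = torch.cuda.mem_get_info(device)  # driver view
    driver_before = total - free_before

    fn()

    gc.collect()
    torch.cuda.synchronize(device)
    alloc_after = torch.cuda.memory_allocated(device)
    free_after, _ = torch.cuda.mem_get_info(device)
    driver_after = total - free_after

    if alloc_after > alloc_before:
        raise RuntimeError(
            f"Caching allocator allocated memory was {alloc_before} "
            f"and is now reported as {alloc_after} on device {device}. "
            f"CUDA driver allocated memory was {driver_before} "
            f"and is now {driver_after}."
        )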
Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-1c81c8f34feb9c16.xml
============================= test session starts ==============================
platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
cachedir: .pytest_cache
hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
rootdir: /var/lib/jenkins/workspace
configfile: pytest.ini
plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
collecting ... collected 8 items / 7 deselected / 1 selected
stepcurrent: skipping 1 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda
Running 1 items in this shard

distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda I1204 09:20:35.634000 28962 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 29014
I1204 09:20:35.635000 28962 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 29015
I1204 09:20:35.636000 28962 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 29016
I1204 09:20:35.636000 28962 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 29017
/var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:243: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
  FSDP.set_state_dict_type(
[FutureWarning emitted once per rank; three further identical copies omitted]
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.)
  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
[UserWarning emitted once per rank; three further identical copies omitted]
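The FutureWarning above names its replacement APIs outright. A rough sketch of the migration it asks for, using the model-only variants of the torch.distributed.checkpoint.state_dict module it links to (illustrative only, not a patch for this test file; the warning itself cites get_state_dict()/set_state_dict(), which additionally handle optimizer state):

# Deprecated pattern flagged above:
#   FSDP.set_state_dict_type(model, StateDictType.SHARDED_STATE_DICT, ...)
#   state_dict = model.state_dict()
from torch.distributed.checkpoint.state_dict import (
    StateDictOptions,
    get_model_state_dict,
    set_model_state_dict,
)

def save_and_reload(model):
    # cpu_offload=True mirrors the offload_to_cpu=True flavor of this test.
    opts = StateDictOptions(full_state_dict=False, cpu_offload=True)
    state_dict = get_model_state_dict(model, options=opts)
    set_model_state_dict(model, state_dict, options=opts)
    return state_dict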
E1204 09:20:43.620000 29016 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception:
[traceback frames and repro instructions identical to the first run; repeated on every rank below]
E1204 09:20:43.620000 29016 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 531562496 and is now 615448576.
E1204 09:20:43.620000 29016 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10
E1204 09:20:43.620000 29014 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak [...] on device 0. CUDA driver allocated memory was 640614400 and is now 722403328.
E1204 09:20:43.620000 29014 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10
E1204 09:20:43.621000 29017 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak [...] on device 3. CUDA driver allocated memory was 531562496 and is now 613351424.
E1204 09:20:43.621000 29017 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10
E1204 09:20:43.622000 29015 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak [...] on device 1. CUDA driver allocated memory was 518979584 and is now 613351424.
E1204 09:20:43.622000 29015 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10
FAILED [9.9545s] [100%]

=================================== FAILURES ===================================
_ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda _
[main-process traceback identical to the first run]
RuntimeError: Process 1 exited with error code 10 and exception:
RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 518979584 and is now 613351424.

To execute this test, run the following from the base repo dir:
PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0

----------------------------- Captured stdout call -----------------------------
Process 1 terminated with exit code 10, terminating remaining processes.
- generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-1c81c8f34feb9c16.xml -
=========================== short test summary info ============================
FAILED [9.9545s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda - RuntimeError: Process 1 exited with error code 10 and exception:
[traceback, leak message, and repro instructions repeated from the FAILURES block above]
!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
======================= 1 failed, 7 deselected in 9.98s ========================
Got exit code 1
Retrying single test...
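Note the shape of the failure so far: on every rank and every attempt, exactly 13824 bytes are still live in the caching allocator after teardown, which points at a lingering reference rather than allocator noise. A hypothetical teardown-style assertion one could use when reproducing locally (function name ours; the process-group teardown matters because communicator state can also pin device memory):

import gc
import torch
import torch.distributed as dist

def assert_no_dangling_cuda_memory(device: int, baseline: int = 0):
    """After dropping all references, the caching allocator should
    report the pre-test baseline again. Sketch only."""
    if dist.is_available() and dist.is_initialized():
        dist.destroy_process_group()  # release communicator resources
    gc.collect()                      # drop unreachable tensors
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()          # return cached, unused blocks to the driver
    leaked = torch.cuda.memory_allocated(device) - baseline
    assert leaked == 0, f"{leaked} bytes still allocated on device {device}"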
Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-a326f09bb7c5e616.xml
============================= test session starts ==============================
platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
cachedir: .pytest_cache
hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
rootdir: /var/lib/jenkins/workspace
configfile: pytest.ini
plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
collecting ... collected 8 items / 7 deselected / 1 selected
stepcurrent: skipping 1 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda
Running 1 items in this shard

distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda I1204 09:20:50.194000 29355 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 29407
I1204 09:20:50.195000 29355 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 29408
I1204 09:20:50.195000 29355 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 29409
I1204 09:20:50.196000 29355 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 29410
[the same FutureWarning (test_hsdp_dtensor_state_dict.py:243) and AccumulateGrad UserWarning (torch/autograd/graph.py:865) are emitted once per rank, as in the previous run; omitted]
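The AccumulateGrad UserWarning repeated above spells out its own opt-out. If the stream mismatch were judged intentional, the switch it names (quoted verbatim from the warning; present in the torch build under test) could be flipped globally:

import torch

# Opt-out named in the warning text; only appropriate when the
# AccumulateGrad stream mismatch is known to be intentional.
torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False)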
2025-12-04T09:25:19.7624191Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.7625678Z E1204 09:20:57.597000 29407 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.7626750Z E1204 09:20:57.597000 29407 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.7628375Z E1204 09:20:57.597000 29407 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.7629961Z E1204 09:20:57.597000 29407 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.7631544Z E1204 09:20:57.597000 29407 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.7633202Z E1204 09:20:57.597000 29407 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.7634488Z E1204 09:20:57.597000 29407 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7635857Z E1204 09:20:57.597000 29407 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7637214Z E1204 09:20:57.597000 29407 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7638574Z E1204 09:20:57.597000 29407 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7640020Z E1204 09:20:57.597000 29407 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.7641350Z E1204 09:20:57.597000 29407 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.7642684Z E1204 09:20:57.597000 29407 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.7644044Z E1204 09:20:57.597000 29407 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.7646192Z E1204 09:20:57.597000 29407 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 640614400 and is now 734986240. 
2025-12-04T09:25:19.7648207Z E1204 09:20:57.597000 29407 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7649181Z E1204 09:20:57.597000 29407 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.7651075Z E1204 09:20:57.597000 29407 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.7652724Z E1204 09:20:57.597000 29407 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7653759Z E1204 09:20:57.597000 29407 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.7654966Z E1204 09:20:57.597000 29407 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:25:19.7655958Z E1204 09:20:57.603000 29410 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.7657287Z E1204 09:20:57.603000 29410 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.7658891Z E1204 09:20:57.603000 29410 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.7660480Z E1204 09:20:57.603000 29410 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.7662066Z E1204 09:20:57.603000 29410 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.7663539Z E1204 09:20:57.603000 29410 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.7664981Z E1204 09:20:57.603000 29410 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7666509Z E1204 09:20:57.603000 29410 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7668052Z E1204 09:20:57.603000 29410 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7669593Z E1204 09:20:57.603000 29410 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7671001Z E1204 09:20:57.603000 29410 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.7672313Z E1204 09:20:57.603000 29410 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.7673646Z E1204 09:20:57.603000 29410 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.7675015Z E1204 09:20:57.603000 29410 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.7677161Z E1204 09:20:57.603000 29410 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 531562496 and is now 615448576. 2025-12-04T09:25:19.7679170Z E1204 09:20:57.603000 29410 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7680146Z E1204 09:20:57.603000 29410 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.7682037Z E1204 09:20:57.603000 29410 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.7683682Z E1204 09:20:57.603000 29410 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7684720Z E1204 09:20:57.603000 29410 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.7685939Z E1204 09:20:57.603000 29410 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:25:19.7686882Z E1204 09:20:57.604000 29408 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.7687829Z E1204 09:20:57.604000 29408 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.7689262Z E1204 09:20:57.604000 29408 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.7690674Z E1204 09:20:57.604000 29408 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.7692073Z E1204 09:20:57.604000 29408 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.7693376Z E1204 09:20:57.604000 29408 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.7694653Z E1204 09:20:57.604000 29408 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7696008Z E1204 09:20:57.604000 29408 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7697705Z E1204 09:20:57.604000 29408 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7699283Z 
E1204 09:20:57.604000 29408 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7700826Z E1204 09:20:57.604000 29408 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.7702321Z E1204 09:20:57.604000 29408 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.7703823Z E1204 09:20:57.604000 29408 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.7705360Z E1204 09:20:57.604000 29408 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.7707758Z E1204 09:20:57.604000 29408 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:25:19.7710001Z E1204 09:20:57.604000 29408 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7710974Z E1204 09:20:57.604000 29408 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.7712858Z E1204 09:20:57.604000 29408 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.7714508Z E1204 09:20:57.604000 29408 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7715548Z E1204 09:20:57.604000 29408 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.7716734Z E1204 09:20:57.604000 29408 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:25:19.7717685Z E1204 09:20:57.605000 29409 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.7718626Z E1204 09:20:57.605000 29409 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.7720050Z E1204 09:20:57.605000 29409 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.7721824Z E1204 09:20:57.605000 29409 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.7723400Z E1204 09:20:57.605000 29409 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.7724855Z E1204 09:20:57.605000 29409 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.7726294Z E1204 
09:20:57.605000 29409 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7727813Z E1204 09:20:57.605000 29409 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7729433Z E1204 09:20:57.605000 29409 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7730952Z E1204 09:20:57.605000 29409 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7732475Z E1204 09:20:57.605000 29409 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.7734145Z E1204 09:20:57.605000 29409 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.7735749Z E1204 09:20:57.605000 29409 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.7737499Z E1204 09:20:57.605000 29409 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.7739918Z E1204 09:20:57.605000 29409 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 518979584 and is now 613351424. 
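The leak records above come from comparing per-device memory counters taken before and after the test body. A minimal, illustrative sketch of that before/after idea using public torch.cuda counters (not the actual CudaMemoryLeakCheck implementation in common_utils.py, which also consults the driver and retries) could look like this, assuming a CUDA-capable machine:

    import torch

    def assert_no_cuda_leak(fn):
        # Snapshot caching-allocator usage on every visible device, run the
        # workload, then snapshot again; any growth is reported in the same
        # terms the log records above use.
        torch.cuda.synchronize()
        torch.cuda.empty_cache()
        before = [torch.cuda.memory_allocated(d) for d in range(torch.cuda.device_count())]
        fn()
        torch.cuda.synchronize()
        torch.cuda.empty_cache()
        after = [torch.cuda.memory_allocated(d) for d in range(torch.cuda.device_count())]
        for dev, (b, a) in enumerate(zip(before, after)):
            if a > b:
                raise RuntimeError(
                    f"Caching allocator allocated memory was {b} and is now {a} on device {dev}"
                )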
2025-12-04T09:25:19.7742209Z E1204 09:20:57.605000 29409 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7743324Z E1204 09:20:57.605000 29409 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.7745496Z E1204 09:20:57.605000 29409 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.7747364Z E1204 09:20:57.605000 29409 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7748666Z E1204 09:20:57.605000 29409 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.7750071Z E1204 09:20:57.605000 29409 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:25:19.7750769Z FAILED [9.3034s] [100%] 2025-12-04T09:25:19.7750940Z 2025-12-04T09:25:19.7751079Z =================================== FAILURES =================================== 2025-12-04T09:25:19.7751809Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda _ 2025-12-04T09:25:19.7752498Z Traceback (most recent call last): 2025-12-04T09:25:19.7753225Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:25:19.7753964Z self._join_processes(fn) 2025-12-04T09:25:19.7754704Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:25:19.7755504Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:25:19.7756313Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:25:19.7757112Z raise RuntimeError(error) 2025-12-04T09:25:19.7757529Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:25:19.7757974Z Traceback (most recent call last): 2025-12-04T09:25:19.7758692Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.7759428Z getattr(self, test_name)() 2025-12-04T09:25:19.7760194Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.7760910Z fn() 2025-12-04T09:25:19.7761508Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7762207Z method(*args, **kwargs) 2025-12-04T09:25:19.7762861Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7763566Z method(*args, **kwargs) 2025-12-04T09:25:19.7764229Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.7764929Z with policy(): 2025-12-04T09:25:19.7765547Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.7766258Z raise RuntimeError(msg) 
2025-12-04T09:25:19.7767746Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 531562496 and is now 615448576. 2025-12-04T09:25:19.7769212Z 2025-12-04T09:25:19.7769411Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.7770532Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.7771475Z 2025-12-04T09:25:19.7771738Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.7772094Z 2025-12-04T09:25:19.7772098Z 2025-12-04T09:25:19.7772294Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:25:19.7772842Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:25:19.7774045Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-a326f09bb7c5e616.xml - 2025-12-04T09:25:19.7775133Z =========================== short test summary info ============================ 2025-12-04T09:25:19.7776452Z FAILED [9.3034s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:25:19.7777930Z Traceback (most recent call last): 2025-12-04T09:25:19.7778709Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.7779503Z getattr(self, test_name)() 2025-12-04T09:25:19.7780254Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.7781015Z fn() 2025-12-04T09:25:19.7781642Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7782402Z method(*args, **kwargs) 2025-12-04T09:25:19.7783105Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7783846Z method(*args, **kwargs) 2025-12-04T09:25:19.7784536Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.7785274Z with policy(): 2025-12-04T09:25:19.7785950Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.7786700Z raise RuntimeError(msg) 2025-12-04T09:25:19.7788329Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 531562496 and is now 615448576. 
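The traceback above ends in _join_processes/_check_return_codes: each rank runs in its own process, exits with code 10 when the leak check trips, and the parent turns any non-zero exit code into the RuntimeError shown. A rough stand-alone sketch of that join-and-check pattern, using plain multiprocessing rather than the torch test harness:

    import multiprocessing as mp

    def _worker(rank):
        # A failing rank exits with a non-zero code, like the
        # "exiting process N with exit code: 10" lines above.
        raise SystemExit(10)

    if __name__ == "__main__":
        procs = [mp.Process(target=_worker, args=(rank,)) for rank in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        for rank, p in enumerate(procs):
            if p.exitcode != 0:
                raise RuntimeError(f"Process {rank} exited with error code {p.exitcode}")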
2025-12-04T09:25:19.7789828Z 2025-12-04T09:25:19.7790019Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.7791154Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.7792095Z 2025-12-04T09:25:19.7792331Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.7792840Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:25:19.7793278Z ======================= 1 failed, 7 deselected in 9.32s ======================== 2025-12-04T09:25:19.7793639Z Got exit code 1 2025-12-04T09:25:19.7794526Z FAILED CONSISTENTLY: test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.7795759Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:25:19.7796910Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-d7096ae518bc839e.xml 2025-12-04T09:25:19.7797837Z ============================= test session starts ============================== 2025-12-04T09:25:19.7798403Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:25:19.7798953Z cachedir: .pytest_cache 2025-12-04T09:25:19.7799576Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:25:19.7800261Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:25:19.7800588Z configfile: pytest.ini 2025-12-04T09:25:19.7801217Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:25:19.7801399Z collecting ... collected 8 items / 2 deselected / 6 selected 2025-12-04T09:25:19.7801526Z stepcurrent: skipping 2 already run items. 2025-12-04T09:25:19.7801625Z Running 6 items in this shard 2025-12-04T09:25:19.7801631Z 2025-12-04T09:25:19.7802790Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda I1204 09:21:04.174000 29748 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 29800 2025-12-04T09:25:19.7803233Z I1204 09:21:04.174000 29748 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 29801 2025-12-04T09:25:19.7803678Z I1204 09:21:04.175000 29748 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 29802 2025-12-04T09:25:19.7804125Z I1204 09:21:04.176000 29748 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 29803 2025-12-04T09:25:19.7806260Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. 
API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.7806367Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.7808542Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.7808648Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.7810768Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.7810874Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.7812984Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.7813094Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.7814629Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.7814784Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.7816405Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.7816516Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.7818393Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:25:19.7818517Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.7820233Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.7820357Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.7820964Z E1204 09:21:11.660000 29800 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.7821475Z E1204 09:21:11.660000 29800 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.7822457Z E1204 09:21:11.660000 29800 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.7823050Z E1204 09:21:11.660000 29800 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.7824019Z E1204 09:21:11.660000 29800 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.7824405Z E1204 09:21:11.660000 29800 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.7825343Z E1204 09:21:11.660000 29800 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7825815Z E1204 09:21:11.660000 29800 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7826758Z E1204 09:21:11.660000 29800 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7827219Z E1204 09:21:11.660000 29800 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7828164Z E1204 09:21:11.660000 29800 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.7828584Z E1204 09:21:11.660000 29800 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.7829574Z E1204 09:21:11.660000 29800 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.7830044Z E1204 09:21:11.660000 29800 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.7831924Z E1204 09:21:11.660000 29800 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! 
Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 636420096 and is now 722403328. 2025-12-04T09:25:19.7832263Z E1204 09:21:11.660000 29800 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7832982Z E1204 09:21:11.660000 29800 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.7834279Z E1204 09:21:11.660000 29800 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.7834604Z E1204 09:21:11.660000 29800 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7835263Z E1204 09:21:11.660000 29800 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.7835752Z E1204 09:21:11.660000 29800 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:25:19.7836162Z E1204 09:21:11.660000 29802 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.7836637Z E1204 09:21:11.660000 29802 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.7837610Z E1204 09:21:11.660000 29802 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.7838078Z E1204 09:21:11.660000 29802 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.7838980Z E1204 09:21:11.660000 29802 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.7839333Z E1204 09:21:11.660000 29802 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.7840209Z E1204 09:21:11.660000 29802 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7840647Z E1204 09:21:11.660000 29802 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7841532Z E1204 09:21:11.660000 29802 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7841961Z E1204 09:21:11.660000 29802 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7842849Z E1204 09:21:11.660000 29802 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.7843273Z E1204 09:21:11.660000 29802 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.7844334Z E1204 09:21:11.660000 29802 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.7844826Z E1204 09:21:11.660000 29802 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.7846581Z E1204 09:21:11.660000 29802 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:25:19.7846920Z E1204 09:21:11.660000 29802 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7847533Z E1204 09:21:11.660000 29802 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.7848872Z E1204 09:21:11.660000 29802 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.7849197Z E1204 09:21:11.660000 29802 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7849875Z E1204 09:21:11.660000 29802 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.7850377Z E1204 09:21:11.660000 29802 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:25:19.7850788Z E1204 09:21:11.661000 29801 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.7851284Z E1204 09:21:11.661000 29801 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.7852274Z E1204 09:21:11.661000 29801 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.7852755Z E1204 09:21:11.661000 29801 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.7853686Z E1204 09:21:11.661000 29801 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.7854054Z E1204 09:21:11.661000 29801 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.7855044Z E1204 09:21:11.661000 29801 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7855477Z E1204 09:21:11.661000 29801 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7856438Z E1204 09:21:11.661000 29801 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7857058Z E1204 09:21:11.661000 29801 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7858009Z E1204 09:21:11.661000 29801 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.7858466Z E1204 09:21:11.661000 29801 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.7859416Z E1204 09:21:11.661000 29801 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.7859909Z E1204 09:21:11.661000 29801 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.7861714Z E1204 09:21:11.661000 29801 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 527368192 and is now 613351424. 2025-12-04T09:25:19.7862069Z E1204 09:21:11.661000 29801 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7862708Z E1204 09:21:11.661000 29801 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.7864085Z E1204 09:21:11.661000 29801 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.7864425Z E1204 09:21:11.661000 29801 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7865120Z E1204 09:21:11.661000 29801 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.7865642Z E1204 09:21:11.661000 29801 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:25:19.7866067Z E1204 09:21:11.662000 29803 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.7866638Z E1204 09:21:11.662000 29803 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.7867616Z E1204 09:21:11.662000 29803 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.7868106Z E1204 09:21:11.662000 29803 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.7869242Z E1204 09:21:11.662000 29803 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 
2025-12-04T09:25:19.7869582Z E1204 09:21:11.662000 29803 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.7870417Z E1204 09:21:11.662000 29803 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7870827Z E1204 09:21:11.662000 29803 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7871666Z E1204 09:21:11.662000 29803 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7872074Z E1204 09:21:11.662000 29803 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7872938Z E1204 09:21:11.662000 29803 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.7873310Z E1204 09:21:11.662000 29803 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.7874219Z E1204 09:21:11.662000 29803 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.7874639Z E1204 09:21:11.662000 29803 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.7876248Z E1204 09:21:11.662000 29803 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 502202368 and is now 613351424. 
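The repro command repeated throughout this log simply sets one environment variable before invoking the test file directly, and the accompanying note points at a second variable that silences the banner. A hedged sketch of that kind of gating, not the harness's exact parsing:

    import os

    # Leak checking is opt-in via the variable the repro command sets.
    MEM_LEAK_CHECK = os.environ.get("PYTORCH_TEST_CUDA_MEM_LEAK_CHECK", "0") == "1"
    # The repro banner can be silenced with PYTORCH_PRINT_REPRO_ON_FAILURE=0.
    PRINT_REPRO_ON_FAILURE = os.environ.get("PYTORCH_PRINT_REPRO_ON_FAILURE", "1") != "0"

    if PRINT_REPRO_ON_FAILURE:
        print(
            "To execute this test, run the following from the base repo dir:\n"
            "PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python "
            "test/distributed/fsdp/test_hsdp_dtensor_state_dict.py "
            "TestHSDPWithDeviceMeshAndDTensorCUDA."
            "test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda"
        )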
2025-12-04T09:25:19.7876563Z E1204 09:21:11.662000 29803 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7877130Z E1204 09:21:11.662000 29803 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.7878356Z E1204 09:21:11.662000 29803 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.7878656Z E1204 09:21:11.662000 29803 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7879280Z E1204 09:21:11.662000 29803 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.7879738Z E1204 09:21:11.662000 29803 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:25:19.7879831Z FAILED [9.3894s] [ 16%] 2025-12-04T09:25:19.7879885Z 2025-12-04T09:25:19.7880031Z =================================== FAILURES =================================== 2025-12-04T09:25:19.7880467Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda _ 2025-12-04T09:25:19.7880588Z Traceback (most recent call last): 2025-12-04T09:25:19.7881076Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:25:19.7881177Z self._join_processes(fn) 2025-12-04T09:25:19.7881707Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:25:19.7881836Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:25:19.7882378Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:25:19.7882490Z raise RuntimeError(error) 2025-12-04T09:25:19.7882704Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:25:19.7882818Z Traceback (most recent call last): 2025-12-04T09:25:19.7883302Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.7883402Z getattr(self, test_name)() 2025-12-04T09:25:19.7883891Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.7883973Z fn() 2025-12-04T09:25:19.7884428Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7884568Z method(*args, **kwargs) 2025-12-04T09:25:19.7885020Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7885120Z method(*args, **kwargs) 2025-12-04T09:25:19.7885570Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.7885684Z with policy(): 2025-12-04T09:25:19.7886145Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.7886241Z raise RuntimeError(msg) 
2025-12-04T09:25:19.7887482Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 502202368 and is now 613351424. 2025-12-04T09:25:19.7887491Z 2025-12-04T09:25:19.7887686Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.7888526Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.7888533Z 2025-12-04T09:25:19.7888777Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.7888782Z 2025-12-04T09:25:19.7888786Z 2025-12-04T09:25:19.7888983Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:25:19.7889225Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:25:19.7890055Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-d7096ae518bc839e.xml - 2025-12-04T09:25:19.7890221Z =========================== short test summary info ============================ 2025-12-04T09:25:19.7891236Z FAILED [9.3894s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:25:19.7891345Z Traceback (most recent call last): 2025-12-04T09:25:19.7891843Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.7891944Z getattr(self, test_name)() 2025-12-04T09:25:19.7892422Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.7892512Z fn() 2025-12-04T09:25:19.7892967Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7893077Z method(*args, **kwargs) 2025-12-04T09:25:19.7893525Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7893617Z method(*args, **kwargs) 2025-12-04T09:25:19.7894079Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.7894167Z with policy(): 2025-12-04T09:25:19.7894633Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.7894727Z raise RuntimeError(msg) 2025-12-04T09:25:19.7895955Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 502202368 and is now 613351424. 
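For scale, the deltas in the record above are small on the allocator side but sizable at the driver level; working them out from the reported numbers:

    # Numbers copied from the RuntimeError above (device 3, optim-load test).
    allocator_before, allocator_after = 0, 13_824
    driver_before, driver_after = 502_202_368, 613_351_424

    print(allocator_after - allocator_before)      # 13824 bytes still held by the caching allocator
    print((driver_after - driver_before) / 2**20)  # 106.0 MiB more memory held by the CUDA driver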
2025-12-04T09:25:19.7895986Z 2025-12-04T09:25:19.7896260Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.7897330Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.7897372Z 2025-12-04T09:25:19.7897653Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.7897833Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:25:19.7898012Z ======================= 1 failed, 2 deselected in 9.41s ======================== 2025-12-04T09:25:19.7898121Z Got exit code 1 2025-12-04T09:25:19.7898229Z Retrying single test... 2025-12-04T09:25:19.7899001Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-dbe06a751e4355d9.xml 2025-12-04T09:25:19.7899163Z ============================= test session starts ============================== 2025-12-04T09:25:19.7899516Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:25:19.7899631Z cachedir: .pytest_cache 2025-12-04T09:25:19.7900153Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:25:19.7900279Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:25:19.7900394Z configfile: pytest.ini 2025-12-04T09:25:19.7900935Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:25:19.7901148Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:25:19.7902171Z stepcurrent: skipping 2 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.7902286Z Running 1 items in this shard 2025-12-04T09:25:19.7902292Z 2025-12-04T09:25:19.7903651Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda I1204 09:21:18.194000 30141 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 30193 2025-12-04T09:25:19.7904160Z I1204 09:21:18.195000 30141 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 30194 2025-12-04T09:25:19.7904661Z I1204 09:21:18.195000 30141 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 30195 2025-12-04T09:25:19.7905152Z I1204 09:21:18.196000 30141 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 30196 2025-12-04T09:25:19.7907592Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:25:19.7907714Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.7910075Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.7910207Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.7912344Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.7912488Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.7914607Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.7914725Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.7916273Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.7916410Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.7917947Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.7918080Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.7919656Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.7919787Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.7921636Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. 
If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.7921770Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.7922225Z E1204 09:21:25.742000 30193 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.7922747Z E1204 09:21:25.742000 30193 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.7923751Z E1204 09:21:25.742000 30193 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.7924243Z E1204 09:21:25.742000 30193 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.7925235Z E1204 09:21:25.742000 30193 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.7925670Z E1204 09:21:25.742000 30193 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.7926615Z E1204 09:21:25.742000 30193 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7927135Z E1204 09:21:25.742000 30193 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7928068Z E1204 09:21:25.742000 30193 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7928546Z E1204 09:21:25.742000 30193 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7929482Z E1204 09:21:25.742000 30193 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.7929913Z E1204 09:21:25.742000 30193 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.7930871Z E1204 09:21:25.742000 30193 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.7931340Z E1204 09:21:25.742000 30193 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.7933180Z E1204 09:21:25.742000 30193 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 640614400 and is now 722403328. 
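Separately from the leak failures, every spawn in this file emits the FutureWarning shown above: FSDP.set_state_dict_type() is being deprecated in favor of the get_state_dict()/set_state_dict() helpers it names. A hedged migration sketch, assuming the signatures documented at the linked torch.distributed.checkpoint page and an already-wrapped FSDP model and optimizer:

    from torch.distributed.checkpoint.state_dict import get_state_dict, set_state_dict

    def roundtrip_state(model, optimizer):
        # Gather sharded model + optimizer state the way the warning recommends,
        # instead of wrapping state_dict() calls in FSDP.set_state_dict_type().
        model_sd, optim_sd = get_state_dict(model, optimizer)
        # ... persist / reload model_sd and optim_sd here ...
        set_state_dict(
            model,
            optimizer,
            model_state_dict=model_sd,
            optim_state_dict=optim_sd,
        )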
2025-12-04T09:25:19.7933631Z E1204 09:21:25.742000 30193 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7934287Z E1204 09:21:25.742000 30193 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.7935508Z E1204 09:21:25.742000 30193 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.7935824Z E1204 09:21:25.742000 30193 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7936496Z E1204 09:21:25.742000 30193 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.7937181Z E1204 09:21:25.742000 30193 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:25:19.7937626Z E1204 09:21:25.742000 30194 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.7938139Z E1204 09:21:25.742000 30194 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.7939132Z E1204 09:21:25.742000 30194 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.7939619Z E1204 09:21:25.742000 30194 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.7940584Z E1204 09:21:25.742000 30194 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.7941004Z E1204 09:21:25.742000 30194 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.7941946Z E1204 09:21:25.742000 30194 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7942450Z E1204 09:21:25.742000 30194 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7943386Z E1204 09:21:25.742000 30194 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7943859Z E1204 09:21:25.742000 30194 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7944792Z E1204 09:21:25.742000 30194 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.7945219Z E1204 09:21:25.742000 30194 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.7946176Z E1204 09:21:25.742000 30194 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.7946644Z E1204 09:21:25.742000 30194 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.7948571Z E1204 09:21:25.742000 30194 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:25:19.7949065Z E1204 09:21:25.742000 30194 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7949644Z E1204 09:21:25.742000 30194 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.7950858Z E1204 09:21:25.742000 30194 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.7951160Z E1204 09:21:25.742000 30194 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7951791Z E1204 09:21:25.742000 30194 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.7952258Z E1204 09:21:25.742000 30194 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:25:19.7952652Z E1204 09:21:25.743000 30195 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.7953106Z E1204 09:21:25.743000 30195 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.7953990Z E1204 09:21:25.743000 30195 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.7954595Z E1204 09:21:25.743000 30195 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.7955533Z E1204 09:21:25.743000 30195 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.7955902Z E1204 09:21:25.743000 30195 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.7956810Z E1204 09:21:25.743000 30195 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7957255Z E1204 09:21:25.743000 30195 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7958140Z E1204 09:21:25.743000 30195 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7958587Z 
E1204 09:21:25.743000 30195 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7959470Z E1204 09:21:25.743000 30195 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.7959870Z E1204 09:21:25.743000 30195 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.7960767Z E1204 09:21:25.743000 30195 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.7961208Z E1204 09:21:25.743000 30195 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.7962983Z E1204 09:21:25.743000 30195 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 518979584 and is now 613351424. 2025-12-04T09:25:19.7963308Z E1204 09:21:25.743000 30195 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7963914Z E1204 09:21:25.743000 30195 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.7965207Z E1204 09:21:25.743000 30195 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.7965527Z E1204 09:21:25.743000 30195 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7966299Z E1204 09:21:25.743000 30195 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.7966764Z E1204 09:21:25.743000 30195 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:25:19.7967157Z E1204 09:21:25.745000 30196 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.7967605Z E1204 09:21:25.745000 30196 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.7968476Z E1204 09:21:25.745000 30196 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.7968945Z E1204 09:21:25.745000 30196 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.7969808Z E1204 09:21:25.745000 30196 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.7970176Z E1204 09:21:25.745000 30196 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.7971007Z E1204 
09:21:25.745000 30196 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7971429Z E1204 09:21:25.745000 30196 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7972258Z E1204 09:21:25.745000 30196 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7972673Z E1204 09:21:25.745000 30196 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.7973518Z E1204 09:21:25.745000 30196 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.7973892Z E1204 09:21:25.745000 30196 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.7974744Z E1204 09:21:25.745000 30196 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.7975161Z E1204 09:21:25.745000 30196 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.7977129Z E1204 09:21:25.745000 30196 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 506396672 and is now 613351424. 
2025-12-04T09:25:19.7977479Z E1204 09:21:25.745000 30196 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7978115Z E1204 09:21:25.745000 30196 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.7979505Z E1204 09:21:25.745000 30196 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.7979849Z E1204 09:21:25.745000 30196 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.7980559Z E1204 09:21:25.745000 30196 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.7981082Z E1204 09:21:25.745000 30196 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:25:19.7981203Z FAILED [9.3701s] [100%] 2025-12-04T09:25:19.7981209Z 2025-12-04T09:25:19.7981359Z =================================== FAILURES =================================== 2025-12-04T09:25:19.7981853Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda _ 2025-12-04T09:25:19.7981988Z Traceback (most recent call last): 2025-12-04T09:25:19.7982568Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:25:19.7982697Z self._join_processes(fn) 2025-12-04T09:25:19.7983290Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:25:19.7983464Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:25:19.7984086Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:25:19.7984203Z raise RuntimeError(error) 2025-12-04T09:25:19.7984441Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:25:19.7984576Z Traceback (most recent call last): 2025-12-04T09:25:19.7985123Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.7985251Z getattr(self, test_name)() 2025-12-04T09:25:19.7985793Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.7985886Z fn() 2025-12-04T09:25:19.7986404Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7986515Z method(*args, **kwargs) 2025-12-04T09:25:19.7987022Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7987142Z method(*args, **kwargs) 2025-12-04T09:25:19.7987651Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.7987764Z with policy(): 2025-12-04T09:25:19.7988276Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.7988389Z raise RuntimeError(msg) 
2025-12-04T09:25:19.7989945Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 518979584 and is now 613351424. 2025-12-04T09:25:19.7989954Z 2025-12-04T09:25:19.7990150Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.7990999Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.7991004Z 2025-12-04T09:25:19.7991244Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.7991249Z 2025-12-04T09:25:19.7991253Z 2025-12-04T09:25:19.7991466Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:25:19.7991704Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:25:19.7992538Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-dbe06a751e4355d9.xml - 2025-12-04T09:25:19.7992706Z =========================== short test summary info ============================ 2025-12-04T09:25:19.7993677Z FAILED [9.3701s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:25:19.7993801Z Traceback (most recent call last): 2025-12-04T09:25:19.7994292Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.7994393Z getattr(self, test_name)() 2025-12-04T09:25:19.7994919Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.7995002Z fn() 2025-12-04T09:25:19.7995458Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7995598Z method(*args, **kwargs) 2025-12-04T09:25:19.7996049Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.7996156Z method(*args, **kwargs) 2025-12-04T09:25:19.7996607Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.7996697Z with policy(): 2025-12-04T09:25:19.7997166Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.7997266Z raise RuntimeError(msg) 2025-12-04T09:25:19.7998514Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 518979584 and is now 613351424. 
2025-12-04T09:25:19.7998521Z 2025-12-04T09:25:19.7998715Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.7999554Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.7999571Z 2025-12-04T09:25:19.7999810Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.7999975Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:25:19.8000150Z ======================= 1 failed, 7 deselected in 9.39s ======================== 2025-12-04T09:25:19.8000244Z Got exit code 1 2025-12-04T09:25:19.8000341Z Retrying single test... 2025-12-04T09:25:19.8001085Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-d7f21dedd43754e1.xml 2025-12-04T09:25:19.8001238Z ============================= test session starts ============================== 2025-12-04T09:25:19.8001566Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:25:19.8001668Z cachedir: .pytest_cache 2025-12-04T09:25:19.8002131Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:25:19.8002256Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:25:19.8002357Z configfile: pytest.ini 2025-12-04T09:25:19.8002839Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:25:19.8003044Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:25:19.8003963Z stepcurrent: skipping 2 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8004081Z Running 1 items in this shard 2025-12-04T09:25:19.8004086Z 2025-12-04T09:25:19.8005235Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda I1204 09:21:32.303000 30534 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 30586 2025-12-04T09:25:19.8005700Z I1204 09:21:32.304000 30534 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 30587 2025-12-04T09:25:19.8006142Z I1204 09:21:32.305000 30534 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 30588 2025-12-04T09:25:19.8006627Z I1204 09:21:32.306000 30534 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 30589 2025-12-04T09:25:19.8018391Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:25:19.8018680Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8021346Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8021479Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8023877Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8023998Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8026507Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8026630Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8028361Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8028486Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8030213Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8030337Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8032058Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8032181Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8034024Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. 
If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8034187Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8034611Z E1204 09:21:39.822000 30586 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8035113Z E1204 09:21:39.822000 30586 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8036060Z E1204 09:21:39.822000 30586 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8036541Z E1204 09:21:39.822000 30586 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8037481Z E1204 09:21:39.822000 30586 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8037851Z E1204 09:21:39.822000 30586 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8038753Z E1204 09:21:39.822000 30586 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8039197Z E1204 09:21:39.822000 30586 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8040110Z E1204 09:21:39.822000 30586 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8040556Z E1204 09:21:39.822000 30586 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8041521Z E1204 09:21:39.822000 30586 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8041934Z E1204 09:21:39.822000 30586 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8042955Z E1204 09:21:39.822000 30586 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8043389Z E1204 09:21:39.822000 30586 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8045095Z E1204 09:21:39.822000 30586 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 636420096 and is now 722403328. 
2025-12-04T09:25:19.8045421Z E1204 09:21:39.822000 30586 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8046015Z E1204 09:21:39.822000 30586 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8047306Z E1204 09:21:39.822000 30586 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8047650Z E1204 09:21:39.822000 30586 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8048313Z E1204 09:21:39.822000 30586 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8048948Z E1204 09:21:39.822000 30586 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:25:19.8049330Z E1204 09:21:39.823000 30588 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8049786Z E1204 09:21:39.823000 30588 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8050652Z E1204 09:21:39.823000 30588 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8051280Z E1204 09:21:39.823000 30588 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8052193Z E1204 09:21:39.823000 30588 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8052551Z E1204 09:21:39.823000 30588 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8053432Z E1204 09:21:39.823000 30588 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8053862Z E1204 09:21:39.823000 30588 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8054746Z E1204 09:21:39.823000 30588 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8055229Z E1204 09:21:39.823000 30588 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8056115Z E1204 09:21:39.823000 30588 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8056747Z E1204 09:21:39.823000 30588 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8057683Z E1204 09:21:39.823000 30588 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8058157Z E1204 09:21:39.823000 30588 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8059976Z E1204 09:21:39.823000 30588 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:25:19.8060327Z E1204 09:21:39.823000 30588 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8060963Z E1204 09:21:39.823000 30588 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8062335Z E1204 09:21:39.823000 30588 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8062715Z E1204 09:21:39.823000 30588 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8063441Z E1204 09:21:39.823000 30588 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8063960Z E1204 09:21:39.823000 30588 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:25:19.8064385Z E1204 09:21:39.824000 30587 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8064894Z E1204 09:21:39.824000 30587 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8065869Z E1204 09:21:39.824000 30587 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8066362Z E1204 09:21:39.824000 30587 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8067329Z E1204 09:21:39.824000 30587 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8067698Z E1204 09:21:39.824000 30587 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8068633Z E1204 09:21:39.824000 30587 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8069171Z E1204 09:21:39.824000 30587 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8070059Z E1204 09:21:39.824000 30587 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8070468Z 
E1204 09:21:39.824000 30587 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8071302Z E1204 09:21:39.824000 30587 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8071670Z E1204 09:21:39.824000 30587 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8072497Z E1204 09:21:39.824000 30587 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8072917Z E1204 09:21:39.824000 30587 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8074522Z E1204 09:21:39.824000 30587 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 527368192 and is now 613351424. 2025-12-04T09:25:19.8074831Z E1204 09:21:39.824000 30587 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8075393Z E1204 09:21:39.824000 30587 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8076644Z E1204 09:21:39.824000 30587 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8076994Z E1204 09:21:39.824000 30587 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8077605Z E1204 09:21:39.824000 30587 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8078073Z E1204 09:21:39.824000 30587 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:25:19.8078448Z E1204 09:21:39.825000 30589 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8078908Z E1204 09:21:39.825000 30589 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8079770Z E1204 09:21:39.825000 30589 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8080207Z E1204 09:21:39.825000 30589 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8081060Z E1204 09:21:39.825000 30589 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8081387Z E1204 09:21:39.825000 30589 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8082223Z E1204 
09:21:39.825000 30589 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8082630Z E1204 09:21:39.825000 30589 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8083515Z E1204 09:21:39.825000 30589 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8083928Z E1204 09:21:39.825000 30589 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8084762Z E1204 09:21:39.825000 30589 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8085135Z E1204 09:21:39.825000 30589 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8085961Z E1204 09:21:39.825000 30589 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8086385Z E1204 09:21:39.825000 30589 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8087986Z E1204 09:21:39.825000 30589 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 504299520 and is now 613351424. 
2025-12-04T09:25:19.8088289Z E1204 09:21:39.825000 30589 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8089232Z E1204 09:21:39.825000 30589 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8090456Z E1204 09:21:39.825000 30589 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8090781Z E1204 09:21:39.825000 30589 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8091396Z E1204 09:21:39.825000 30589 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8091864Z E1204 09:21:39.825000 30589 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:25:19.8091955Z FAILED [9.3767s] [100%] 2025-12-04T09:25:19.8091964Z 2025-12-04T09:25:19.8092105Z =================================== FAILURES =================================== 2025-12-04T09:25:19.8092544Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda _ 2025-12-04T09:25:19.8092657Z Traceback (most recent call last): 2025-12-04T09:25:19.8093156Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:25:19.8093260Z self._join_processes(fn) 2025-12-04T09:25:19.8093787Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:25:19.8093918Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:25:19.8094454Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:25:19.8094563Z raise RuntimeError(error) 2025-12-04T09:25:19.8094777Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:25:19.8094881Z Traceback (most recent call last): 2025-12-04T09:25:19.8095418Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8095518Z getattr(self, test_name)() 2025-12-04T09:25:19.8095998Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8096077Z fn() 2025-12-04T09:25:19.8096774Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8096888Z method(*args, **kwargs) 2025-12-04T09:25:19.8097397Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8097498Z method(*args, **kwargs) 2025-12-04T09:25:19.8098019Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8098116Z with policy(): 2025-12-04T09:25:19.8098640Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8098750Z raise RuntimeError(msg) 
2025-12-04T09:25:19.8100139Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:25:19.8100147Z 2025-12-04T09:25:19.8100374Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8101323Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8101435Z 2025-12-04T09:25:19.8101710Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8101717Z 2025-12-04T09:25:19.8101722Z 2025-12-04T09:25:19.8101951Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:25:19.8102253Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:25:19.8103193Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-d7f21dedd43754e1.xml - 2025-12-04T09:25:19.8103365Z =========================== short test summary info ============================ 2025-12-04T09:25:19.8104475Z FAILED [9.3767s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:25:19.8104598Z Traceback (most recent call last): 2025-12-04T09:25:19.8105158Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8105271Z getattr(self, test_name)() 2025-12-04T09:25:19.8105810Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8105906Z fn() 2025-12-04T09:25:19.8106417Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8106520Z method(*args, **kwargs) 2025-12-04T09:25:19.8107034Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8107138Z method(*args, **kwargs) 2025-12-04T09:25:19.8107650Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8107749Z with policy(): 2025-12-04T09:25:19.8108260Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8108432Z raise RuntimeError(msg) 2025-12-04T09:25:19.8109807Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 531562496 and is now 613351424. 
2025-12-04T09:25:19.8109813Z 2025-12-04T09:25:19.8110013Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8110843Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8110850Z 2025-12-04T09:25:19.8111096Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8111255Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:25:19.8111418Z ======================= 1 failed, 7 deselected in 9.40s ======================== 2025-12-04T09:25:19.8111521Z Got exit code 1 2025-12-04T09:25:19.8112283Z FAILED CONSISTENTLY: test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8112646Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:25:19.8113329Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-7dbc99509eb0f4ce.xml 2025-12-04T09:25:19.8113500Z ============================= test session starts ============================== 2025-12-04T09:25:19.8113821Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:25:19.8113919Z cachedir: .pytest_cache 2025-12-04T09:25:19.8114378Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:25:19.8114524Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:25:19.8114619Z configfile: pytest.ini 2025-12-04T09:25:19.8115107Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:25:19.8115292Z collecting ... collected 8 items / 3 deselected / 5 selected 2025-12-04T09:25:19.8115417Z stepcurrent: skipping 3 already run items. 2025-12-04T09:25:19.8115527Z Running 5 items in this shard 2025-12-04T09:25:19.8115531Z 2025-12-04T09:25:19.8116679Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda I1204 09:21:46.414000 30927 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 30979 2025-12-04T09:25:19.8117139Z I1204 09:21:46.415000 30927 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 30980 2025-12-04T09:25:19.8117579Z I1204 09:21:46.416000 30927 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 30981 2025-12-04T09:25:19.8118012Z I1204 09:21:46.417000 30927 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 30982 2025-12-04T09:25:19.8120150Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. 
API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8120254Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8122966Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8123086Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8125478Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8125593Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8127980Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8128087Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8129882Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8130050Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8131780Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8131917Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8133722Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:25:19.8133853Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8135460Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8135581Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8136000Z E1204 09:21:53.946000 30980 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8136540Z E1204 09:21:53.946000 30980 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8137755Z E1204 09:21:53.946000 30980 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8138244Z E1204 09:21:53.946000 30980 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8139211Z E1204 09:21:53.946000 30980 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8139589Z E1204 09:21:53.946000 30980 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8140527Z E1204 09:21:53.946000 30980 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8140999Z E1204 09:21:53.946000 30980 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8141933Z E1204 09:21:53.946000 30980 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8142391Z E1204 09:21:53.946000 30980 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8143331Z E1204 09:21:53.946000 30980 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8143777Z E1204 09:21:53.946000 30980 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8144729Z E1204 09:21:53.946000 30980 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8145222Z E1204 09:21:53.946000 30980 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8147037Z E1204 09:21:53.946000 30980 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! 
Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 523173888 and is now 617545728. 2025-12-04T09:25:19.8147375Z E1204 09:21:53.946000 30980 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8148016Z E1204 09:21:53.946000 30980 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8149512Z E1204 09:21:53.946000 30980 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8149812Z E1204 09:21:53.946000 30980 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8150431Z E1204 09:21:53.946000 30980 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8150890Z E1204 09:21:53.946000 30980 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:25:19.8151278Z E1204 09:21:53.947000 30979 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8151788Z E1204 09:21:53.947000 30979 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8152656Z E1204 09:21:53.947000 30979 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8153094Z E1204 09:21:53.947000 30979 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8153947Z E1204 09:21:53.947000 30979 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8154282Z E1204 09:21:53.947000 30979 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8155108Z E1204 09:21:53.947000 30979 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8155524Z E1204 09:21:53.947000 30979 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8156352Z E1204 09:21:53.947000 30979 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8156758Z E1204 09:21:53.947000 30979 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8157589Z E1204 09:21:53.947000 30979 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8157984Z E1204 09:21:53.947000 30979 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8158827Z E1204 09:21:53.947000 30979 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8159265Z E1204 09:21:53.947000 30979 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8160877Z E1204 09:21:53.947000 30979 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 649003008 and is now 722403328. 2025-12-04T09:25:19.8161180Z E1204 09:21:53.947000 30979 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8161747Z E1204 09:21:53.947000 30979 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8162960Z E1204 09:21:53.947000 30979 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8163256Z E1204 09:21:53.947000 30979 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8163877Z E1204 09:21:53.947000 30979 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8164340Z E1204 09:21:53.947000 30979 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:25:19.8164722Z E1204 09:21:53.947000 30981 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8165215Z E1204 09:21:53.947000 30981 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8166078Z E1204 09:21:53.947000 30981 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8166513Z E1204 09:21:53.947000 30981 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8167366Z E1204 09:21:53.947000 30981 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8167706Z E1204 09:21:53.947000 30981 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8168535Z E1204 09:21:53.947000 30981 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8168951Z E1204 09:21:53.947000 30981 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8169772Z E1204 09:21:53.947000 30981 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8170176Z E1204 09:21:53.947000 30981 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8171035Z E1204 09:21:53.947000 30981 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8171408Z E1204 09:21:53.947000 30981 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8172275Z E1204 09:21:53.947000 30981 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8172687Z E1204 09:21:53.947000 30981 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8174294Z E1204 09:21:53.947000 30981 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 518979584 and is now 613351424. 2025-12-04T09:25:19.8174598Z E1204 09:21:53.947000 30981 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8175163Z E1204 09:21:53.947000 30981 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8176438Z E1204 09:21:53.947000 30981 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8176928Z E1204 09:21:53.947000 30981 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8177623Z E1204 09:21:53.947000 30981 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8178142Z E1204 09:21:53.947000 30981 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:25:19.8178637Z E1204 09:21:53.949000 30982 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8179144Z E1204 09:21:53.949000 30982 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8180119Z E1204 09:21:53.949000 30982 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8180611Z E1204 09:21:53.949000 30982 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8181579Z E1204 09:21:53.949000 30982 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 
2025-12-04T09:25:19.8181956Z E1204 09:21:53.949000 30982 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8182889Z E1204 09:21:53.949000 30982 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8183352Z E1204 09:21:53.949000 30982 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8184291Z E1204 09:21:53.949000 30982 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8184776Z E1204 09:21:53.949000 30982 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8185718Z E1204 09:21:53.949000 30982 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8186167Z E1204 09:21:53.949000 30982 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8187112Z E1204 09:21:53.949000 30982 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8187578Z E1204 09:21:53.949000 30982 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8189422Z E1204 09:21:53.949000 30982 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 531562496 and is now 613351424. 
2025-12-04T09:25:19.8189740Z E1204 09:21:53.949000 30982 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8190297Z E1204 09:21:53.949000 30982 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8191517Z E1204 09:21:53.949000 30982 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8191811Z E1204 09:21:53.949000 30982 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8192436Z E1204 09:21:53.949000 30982 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8192942Z E1204 09:21:53.949000 30982 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:25:19.8193036Z FAILED [9.3888s] [ 20%] 2025-12-04T09:25:19.8193042Z 2025-12-04T09:25:19.8193181Z =================================== FAILURES =================================== 2025-12-04T09:25:19.8193613Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda _ 2025-12-04T09:25:19.8193734Z Traceback (most recent call last): 2025-12-04T09:25:19.8194218Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:25:19.8194322Z self._join_processes(fn) 2025-12-04T09:25:19.8194860Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:25:19.8194984Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:25:19.8195537Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:25:19.8195637Z raise RuntimeError(error) 2025-12-04T09:25:19.8195843Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:25:19.8195961Z Traceback (most recent call last): 2025-12-04T09:25:19.8196439Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8196537Z getattr(self, test_name)() 2025-12-04T09:25:19.8197016Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8197124Z fn() 2025-12-04T09:25:19.8197580Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8197673Z method(*args, **kwargs) 2025-12-04T09:25:19.8198124Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8198252Z method(*args, **kwargs) 2025-12-04T09:25:19.8198696Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8198780Z with policy(): 2025-12-04T09:25:19.8199239Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8199336Z raise RuntimeError(msg) 
2025-12-04T09:25:19.8200562Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 523173888 and is now 617545728. 2025-12-04T09:25:19.8200571Z 2025-12-04T09:25:19.8200762Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8201597Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8201609Z 2025-12-04T09:25:19.8201846Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8201851Z 2025-12-04T09:25:19.8201855Z 2025-12-04T09:25:19.8202053Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:25:19.8202290Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:25:19.8203122Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-7dbc99509eb0f4ce.xml - 2025-12-04T09:25:19.8203282Z =========================== short test summary info ============================ 2025-12-04T09:25:19.8204327Z FAILED [9.3888s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:25:19.8204439Z Traceback (most recent call last): 2025-12-04T09:25:19.8204936Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8205035Z getattr(self, test_name)() 2025-12-04T09:25:19.8205524Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8205603Z fn() 2025-12-04T09:25:19.8206057Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8206152Z method(*args, **kwargs) 2025-12-04T09:25:19.8206606Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8206699Z method(*args, **kwargs) 2025-12-04T09:25:19.8207153Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8207238Z with policy(): 2025-12-04T09:25:19.8207698Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8207791Z raise RuntimeError(msg) 2025-12-04T09:25:19.8209024Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 523173888 and is now 617545728. 
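The RuntimeError above is raised by the test harness's CUDA memory-leak check: it snapshots per-device memory before the test body runs and compares it afterwards, and here every rank reports the caching allocator growing from 0 to 13824 bytes. As a rough illustration only (this is not PyTorch's actual implementation in common_utils.py, and it assumes a CUDA-capable build of torch), the pattern is essentially a context manager like the following:

    import torch
    from contextlib import contextmanager

    @contextmanager
    def cuda_leak_check(device=0, tolerance_bytes=0):
        # Illustrative sketch only; the real check in common_utils.py also
        # compares driver-level counters and retries after empty_cache().
        torch.cuda.synchronize(device)
        before = torch.cuda.memory_allocated(device)
        try:
            yield
        finally:
            torch.cuda.synchronize(device)
            torch.cuda.empty_cache()
            after = torch.cuda.memory_allocated(device)
            if after - before > tolerance_bytes:
                raise RuntimeError(
                    f"possible CUDA leak on device {device}: "
                    f"{before} -> {after} bytes allocated"
                )

A leak is reported when the post-test allocation exceeds the pre-test snapshot, which is exactly the 0 -> 13824 byte delta quoted in the log.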
2025-12-04T09:25:19.8209062Z 2025-12-04T09:25:19.8209254Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8210085Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8210116Z 2025-12-04T09:25:19.8210356Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8210517Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:25:19.8210673Z ======================= 1 failed, 3 deselected in 9.41s ======================== 2025-12-04T09:25:19.8210762Z Got exit code 1 2025-12-04T09:25:19.8210854Z Retrying single test... 2025-12-04T09:25:19.8211532Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-5b4af92028672eb6.xml 2025-12-04T09:25:19.8211679Z ============================= test session starts ============================== 2025-12-04T09:25:19.8211994Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:25:19.8212096Z cachedir: .pytest_cache 2025-12-04T09:25:19.8212553Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:25:19.8212664Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:25:19.8212758Z configfile: pytest.ini 2025-12-04T09:25:19.8213229Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:25:19.8213421Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:25:19.8214317Z stepcurrent: skipping 3 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8214417Z Running 1 items in this shard 2025-12-04T09:25:19.8214421Z 2025-12-04T09:25:19.8215620Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda I1204 09:22:00.534000 31320 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 31372 2025-12-04T09:25:19.8216066Z I1204 09:22:00.534000 31320 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 31373 2025-12-04T09:25:19.8216747Z I1204 09:22:00.535000 31320 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 31374 2025-12-04T09:25:19.8217240Z I1204 09:22:00.536000 31320 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 31375 2025-12-04T09:25:19.8219651Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:25:19.8219770Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8222355Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8222539Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8224944Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8225093Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8227500Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8227626Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8229376Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8229512Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8231232Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8231367Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8233235Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8233360Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8234974Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. 
If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8235091Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8235508Z E1204 09:22:08.077000 31373 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8235989Z E1204 09:22:08.077000 31373 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8236909Z E1204 09:22:08.077000 31373 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8237365Z E1204 09:22:08.077000 31373 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8238272Z E1204 09:22:08.077000 31373 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8238655Z E1204 09:22:08.077000 31373 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8239531Z E1204 09:22:08.077000 31373 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8240092Z E1204 09:22:08.077000 31373 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8240921Z E1204 09:22:08.077000 31373 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8241328Z E1204 09:22:08.077000 31373 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8242164Z E1204 09:22:08.077000 31373 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8242538Z E1204 09:22:08.077000 31373 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8243378Z E1204 09:22:08.077000 31373 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8243791Z E1204 09:22:08.077000 31373 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8245393Z E1204 09:22:08.077000 31373 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 531562496 and is now 615448576. 
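Every failing rank prints the same reproduction command, to be run from the base repo dir with the leak check enabled. A small wrapper like the following (a hypothetical convenience script; the command and environment variables themselves are taken verbatim from the log) runs it with PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 and shows where PYTORCH_PRINT_REPRO_ON_FAILURE=0 would go to silence the repro banner:

    import os
    import subprocess

    # Enable the CUDA memory-leak check, as the log's repro instructions ask.
    env = dict(os.environ, PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1")
    # env["PYTORCH_PRINT_REPRO_ON_FAILURE"] = "0"  # uncomment to suppress the repro message

    # Run from the base repo dir.
    subprocess.run(
        [
            "python",
            "test/distributed/fsdp/test_hsdp_dtensor_state_dict.py",
            "TestHSDPWithDeviceMeshAndDTensorCUDA."
            "test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda",
        ],
        env=env,
        check=True,
    )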
2025-12-04T09:25:19.8245744Z E1204 09:22:08.077000 31373 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8246303Z E1204 09:22:08.077000 31373 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8247511Z E1204 09:22:08.077000 31373 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8247808Z E1204 09:22:08.077000 31373 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8248424Z E1204 09:22:08.077000 31373 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8248882Z E1204 09:22:08.077000 31373 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:25:19.8249269Z E1204 09:22:08.078000 31372 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8249711Z E1204 09:22:08.078000 31372 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8250571Z E1204 09:22:08.078000 31372 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8251002Z E1204 09:22:08.078000 31372 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8251881Z E1204 09:22:08.078000 31372 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8252215Z E1204 09:22:08.078000 31372 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8253083Z E1204 09:22:08.078000 31372 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8253490Z E1204 09:22:08.078000 31372 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8254325Z E1204 09:22:08.078000 31372 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8254735Z E1204 09:22:08.078000 31372 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8255572Z E1204 09:22:08.078000 31372 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8255944Z E1204 09:22:08.078000 31372 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8257072Z E1204 09:22:08.078000 31372 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8257537Z E1204 09:22:08.078000 31372 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8259413Z E1204 09:22:08.078000 31372 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 649003008 and is now 722403328. 2025-12-04T09:25:19.8259758Z E1204 09:22:08.078000 31372 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8260390Z E1204 09:22:08.078000 31372 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8261757Z E1204 09:22:08.078000 31372 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8262094Z E1204 09:22:08.078000 31372 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8262795Z E1204 09:22:08.078000 31372 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8263313Z E1204 09:22:08.078000 31372 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:25:19.8263738Z E1204 09:22:08.079000 31375 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8264245Z E1204 09:22:08.079000 31375 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8265218Z E1204 09:22:08.079000 31375 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8265734Z E1204 09:22:08.079000 31375 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8266702Z E1204 09:22:08.079000 31375 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8267114Z E1204 09:22:08.079000 31375 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8268047Z E1204 09:22:08.079000 31375 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8268617Z E1204 09:22:08.079000 31375 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8269584Z E1204 09:22:08.079000 31375 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8269995Z 
E1204 09:22:08.079000 31375 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8270829Z E1204 09:22:08.079000 31375 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8271199Z E1204 09:22:08.079000 31375 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8272034Z E1204 09:22:08.079000 31375 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8272441Z E1204 09:22:08.079000 31375 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8274092Z E1204 09:22:08.079000 31375 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:25:19.8274399Z E1204 09:22:08.079000 31375 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8274960Z E1204 09:22:08.079000 31375 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8276166Z E1204 09:22:08.079000 31375 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8276465Z E1204 09:22:08.079000 31375 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8277078Z E1204 09:22:08.079000 31375 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8277540Z E1204 09:22:08.079000 31375 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:25:19.8277920Z E1204 09:22:08.079000 31374 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8278370Z E1204 09:22:08.079000 31374 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8279234Z E1204 09:22:08.079000 31374 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8279691Z E1204 09:22:08.079000 31374 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8280546Z E1204 09:22:08.079000 31374 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8280900Z E1204 09:22:08.079000 31374 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8281729Z E1204 
09:22:08.079000 31374 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8282137Z E1204 09:22:08.079000 31374 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8282970Z E1204 09:22:08.079000 31374 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8283378Z E1204 09:22:08.079000 31374 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8284208Z E1204 09:22:08.079000 31374 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8284578Z E1204 09:22:08.079000 31374 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8285404Z E1204 09:22:08.079000 31374 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8285824Z E1204 09:22:08.079000 31374 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8287471Z E1204 09:22:08.079000 31374 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 531562496 and is now 613351424. 
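Each retry also re-emits the FutureWarning from test_hsdp_dtensor_state_dict.py:188 about FSDP.set_state_dict_type() being deprecated in favor of get_state_dict()/set_state_dict() from torch.distributed.checkpoint.state_dict. A minimal migration sketch, assuming a model and optimizer that in the real test would be FSDP-wrapped inside an initialized process group (a plain module is used here only to keep the snippet self-contained), could look like:

    import torch
    import torch.nn as nn
    from torch.distributed.checkpoint.state_dict import get_state_dict, set_state_dict

    # Stand-ins for the FSDP-wrapped model/optimizer used by the failing test.
    model = nn.Linear(4, 4)
    optimizer = torch.optim.Adam(model.parameters())

    # Extract checkpointable state with the non-deprecated APIs ...
    model_sd, optim_sd = get_state_dict(model, optimizer)
    # ... and load it back (normally after saving/restoring via torch.distributed.checkpoint).
    set_state_dict(model, optimizer, model_state_dict=model_sd, optim_state_dict=optim_sd)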
2025-12-04T09:25:19.8287776Z E1204 09:22:08.079000 31374 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8288333Z E1204 09:22:08.079000 31374 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8289538Z E1204 09:22:08.079000 31374 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8289840Z E1204 09:22:08.079000 31374 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8290458Z E1204 09:22:08.079000 31374 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8290917Z E1204 09:22:08.079000 31374 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:25:19.8291006Z FAILED [9.4044s] [100%] 2025-12-04T09:25:19.8291012Z 2025-12-04T09:25:19.8291145Z =================================== FAILURES =================================== 2025-12-04T09:25:19.8291575Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda _ 2025-12-04T09:25:19.8291708Z Traceback (most recent call last): 2025-12-04T09:25:19.8292211Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:25:19.8292317Z self._join_processes(fn) 2025-12-04T09:25:19.8292879Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:25:19.8293006Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:25:19.8293544Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:25:19.8293650Z raise RuntimeError(error) 2025-12-04T09:25:19.8293857Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:25:19.8293969Z Traceback (most recent call last): 2025-12-04T09:25:19.8294450Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8294551Z getattr(self, test_name)() 2025-12-04T09:25:19.8295030Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8295113Z fn() 2025-12-04T09:25:19.8295562Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8295664Z method(*args, **kwargs) 2025-12-04T09:25:19.8296111Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8296275Z method(*args, **kwargs) 2025-12-04T09:25:19.8296914Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8297011Z with policy(): 2025-12-04T09:25:19.8297537Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8297646Z raise RuntimeError(msg) 
2025-12-04T09:25:19.8299092Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 531562496 and is now 615448576. 2025-12-04T09:25:19.8299112Z 2025-12-04T09:25:19.8299331Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8300263Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8300269Z 2025-12-04T09:25:19.8300549Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8300556Z 2025-12-04T09:25:19.8300561Z 2025-12-04T09:25:19.8300778Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:25:19.8301052Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:25:19.8301990Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-5b4af92028672eb6.xml - 2025-12-04T09:25:19.8302159Z =========================== short test summary info ============================ 2025-12-04T09:25:19.8303250Z FAILED [9.4044s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:25:19.8303372Z Traceback (most recent call last): 2025-12-04T09:25:19.8303937Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8304095Z getattr(self, test_name)() 2025-12-04T09:25:19.8304632Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8304734Z fn() 2025-12-04T09:25:19.8305277Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8305388Z method(*args, **kwargs) 2025-12-04T09:25:19.8305896Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8305997Z method(*args, **kwargs) 2025-12-04T09:25:19.8306515Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8306610Z with policy(): 2025-12-04T09:25:19.8307122Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8307241Z raise RuntimeError(msg) 2025-12-04T09:25:19.8308732Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 531562496 and is now 615448576. 
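The parent-side traceback (_join_processes -> _check_return_codes) shows how the harness turns a child's non-zero exit status into the "Process 1 exited with error code 10" RuntimeError above: each rank runs in its own process, and after joining, any exit code other than 0 fails the test. A stripped-down sketch of that pattern (hypothetical names; exit code 10 simply mirrors the code seen in this log) is:

    import multiprocessing as mp
    import sys

    def _worker(rank):
        # In the real harness each rank runs the test body; here we just
        # simulate the failure path by exiting with code 10.
        sys.exit(10)

    if __name__ == "__main__":
        procs = [mp.Process(target=_worker, args=(r,)) for r in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        for rank, p in enumerate(procs):
            if p.exitcode != 0:
                raise RuntimeError(f"Process {rank} exited with error code {p.exitcode}")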
2025-12-04T09:25:19.8308740Z 2025-12-04T09:25:19.8308959Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8309870Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8309875Z 2025-12-04T09:25:19.8310140Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8310317Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:25:19.8310489Z ======================= 1 failed, 7 deselected in 9.43s ======================== 2025-12-04T09:25:19.8310588Z Got exit code 1 2025-12-04T09:25:19.8310685Z Retrying single test... 2025-12-04T09:25:19.8311525Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-c67b11ef8bde4252.xml 2025-12-04T09:25:19.8311695Z ============================= test session starts ============================== 2025-12-04T09:25:19.8312031Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:25:19.8312143Z cachedir: .pytest_cache 2025-12-04T09:25:19.8312642Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:25:19.8312759Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:25:19.8312871Z configfile: pytest.ini 2025-12-04T09:25:19.8313387Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:25:19.8313596Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:25:19.8314580Z stepcurrent: skipping 3 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8314689Z Running 1 items in this shard 2025-12-04T09:25:19.8314694Z 2025-12-04T09:25:19.8315944Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda I1204 09:22:14.674000 31713 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 31765 2025-12-04T09:25:19.8316427Z I1204 09:22:14.674000 31713 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 31766 2025-12-04T09:25:19.8316941Z I1204 09:22:14.675000 31713 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 31767 2025-12-04T09:25:19.8317420Z I1204 09:22:14.676000 31713 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 31768 2025-12-04T09:25:19.8319880Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:25:19.8319985Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8322625Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8322750Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8325131Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8325251Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8327730Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:188: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8327849Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8329584Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8329718Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8331441Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8331577Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8333386Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8333551Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8335214Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. 
If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8335374Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8335812Z E1204 09:22:22.250000 31765 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8336392Z E1204 09:22:22.250000 31765 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8337548Z E1204 09:22:22.250000 31765 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8338040Z E1204 09:22:22.250000 31765 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8339016Z E1204 09:22:22.250000 31765 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8339409Z E1204 09:22:22.250000 31765 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8340346Z E1204 09:22:22.250000 31765 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8340830Z E1204 09:22:22.250000 31765 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8341771Z E1204 09:22:22.250000 31765 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8342310Z E1204 09:22:22.250000 31765 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8343247Z E1204 09:22:22.250000 31765 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8343669Z E1204 09:22:22.250000 31765 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8344621Z E1204 09:22:22.250000 31765 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8345090Z E1204 09:22:22.250000 31765 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8346920Z E1204 09:22:22.250000 31765 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 640614400 and is now 722403328. 
2025-12-04T09:25:19.8347267Z E1204 09:22:22.250000 31765 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8347919Z E1204 09:22:22.250000 31765 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8349424Z E1204 09:22:22.250000 31765 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8349772Z E1204 09:22:22.250000 31765 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8350411Z E1204 09:22:22.250000 31765 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8350874Z E1204 09:22:22.250000 31765 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:25:19.8351270Z E1204 09:22:22.250000 31767 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8351720Z E1204 09:22:22.250000 31767 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8352605Z E1204 09:22:22.250000 31767 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8353050Z E1204 09:22:22.250000 31767 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8353912Z E1204 09:22:22.250000 31767 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8354261Z E1204 09:22:22.250000 31767 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8355094Z E1204 09:22:22.250000 31767 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8355524Z E1204 09:22:22.250000 31767 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8356420Z E1204 09:22:22.250000 31767 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8356848Z E1204 09:22:22.250000 31767 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8357675Z E1204 09:22:22.250000 31767 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8358050Z E1204 09:22:22.250000 31767 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8358894Z E1204 09:22:22.250000 31767 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8359314Z E1204 09:22:22.250000 31767 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8360937Z E1204 09:22:22.250000 31767 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 518979584 and is now 615448576. 2025-12-04T09:25:19.8361243Z E1204 09:22:22.250000 31767 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8361821Z E1204 09:22:22.250000 31767 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8363065Z E1204 09:22:22.250000 31767 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8363394Z E1204 09:22:22.250000 31767 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8364026Z E1204 09:22:22.250000 31767 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8364488Z E1204 09:22:22.250000 31767 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:25:19.8364883Z E1204 09:22:22.252000 31766 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8365337Z E1204 09:22:22.252000 31766 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8366226Z E1204 09:22:22.252000 31766 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8366661Z E1204 09:22:22.252000 31766 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8367524Z E1204 09:22:22.252000 31766 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8367870Z E1204 09:22:22.252000 31766 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8368699Z E1204 09:22:22.252000 31766 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8369122Z E1204 09:22:22.252000 31766 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8369998Z E1204 09:22:22.252000 31766 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8370410Z 
E1204 09:22:22.252000 31766 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8371249Z E1204 09:22:22.252000 31766 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8371623Z E1204 09:22:22.252000 31766 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8372467Z E1204 09:22:22.252000 31766 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8372884Z E1204 09:22:22.252000 31766 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8374496Z E1204 09:22:22.252000 31766 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:25:19.8374797Z E1204 09:22:22.252000 31766 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8375400Z E1204 09:22:22.252000 31766 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8376874Z E1204 09:22:22.252000 31766 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8377253Z E1204 09:22:22.252000 31766 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8377965Z E1204 09:22:22.252000 31766 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8378491Z E1204 09:22:22.252000 31766 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:25:19.8378930Z E1204 09:22:22.253000 31768 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8379439Z E1204 09:22:22.253000 31768 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8380426Z E1204 09:22:22.253000 31768 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8380925Z E1204 09:22:22.253000 31768 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8381888Z E1204 09:22:22.253000 31768 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8382273Z E1204 09:22:22.253000 31768 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8383219Z E1204 
09:22:22.253000 31768 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8383747Z E1204 09:22:22.253000 31768 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8384686Z E1204 09:22:22.253000 31768 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8385147Z E1204 09:22:22.253000 31768 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8386094Z E1204 09:22:22.253000 31768 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8386518Z E1204 09:22:22.253000 31768 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8387473Z E1204 09:22:22.253000 31768 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8387945Z E1204 09:22:22.253000 31768 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8389760Z E1204 09:22:22.253000 31768 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 531562496 and is now 613351424. 
2025-12-04T09:25:19.8390093Z E1204 09:22:22.253000 31768 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8390657Z E1204 09:22:22.253000 31768 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8391880Z E1204 09:22:22.253000 31768 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8392210Z E1204 09:22:22.253000 31768 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8392843Z E1204 09:22:22.253000 31768 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8393306Z E1204 09:22:22.253000 31768 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:25:19.8393417Z FAILED [9.4964s] [100%] 2025-12-04T09:25:19.8393423Z 2025-12-04T09:25:19.8393557Z =================================== FAILURES =================================== 2025-12-04T09:25:19.8393993Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda _ 2025-12-04T09:25:19.8394119Z Traceback (most recent call last): 2025-12-04T09:25:19.8394613Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:25:19.8394716Z self._join_processes(fn) 2025-12-04T09:25:19.8395252Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:25:19.8395384Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:25:19.8395938Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:25:19.8396046Z raise RuntimeError(error) 2025-12-04T09:25:19.8396259Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:25:19.8396381Z Traceback (most recent call last): 2025-12-04T09:25:19.8396912Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8397030Z getattr(self, test_name)() 2025-12-04T09:25:19.8397508Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8397594Z fn() 2025-12-04T09:25:19.8398059Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8398156Z method(*args, **kwargs) 2025-12-04T09:25:19.8398609Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8398720Z method(*args, **kwargs) 2025-12-04T09:25:19.8399175Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8399278Z with policy(): 2025-12-04T09:25:19.8399739Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8399842Z raise RuntimeError(msg) 
2025-12-04T09:25:19.8401086Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 640614400 and is now 722403328. 2025-12-04T09:25:19.8401092Z 2025-12-04T09:25:19.8401289Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8402168Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8402173Z 2025-12-04T09:25:19.8402416Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8402448Z 2025-12-04T09:25:19.8402451Z 2025-12-04T09:25:19.8402663Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:25:19.8402898Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:25:19.8403735Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-c67b11ef8bde4252.xml - 2025-12-04T09:25:19.8403903Z =========================== short test summary info ============================ 2025-12-04T09:25:19.8404875Z FAILED [9.4964s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:25:19.8405001Z Traceback (most recent call last): 2025-12-04T09:25:19.8405497Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8405601Z getattr(self, test_name)() 2025-12-04T09:25:19.8406094Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8406177Z fn() 2025-12-04T09:25:19.8406792Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8406909Z method(*args, **kwargs) 2025-12-04T09:25:19.8407387Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8407503Z method(*args, **kwargs) 2025-12-04T09:25:19.8407978Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8408071Z with policy(): 2025-12-04T09:25:19.8408638Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8408747Z raise RuntimeError(msg) 2025-12-04T09:25:19.8410053Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 640614400 and is now 722403328. 
2025-12-04T09:25:19.8410070Z 2025-12-04T09:25:19.8410276Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8411158Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8411166Z 2025-12-04T09:25:19.8411429Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8411605Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:25:19.8411795Z ======================= 1 failed, 7 deselected in 9.52s ======================== 2025-12-04T09:25:19.8411891Z Got exit code 1 2025-12-04T09:25:19.8412695Z FAILED CONSISTENTLY: test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8413097Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:25:19.8413812Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-c057f5798619892b.xml 2025-12-04T09:25:19.8414008Z ============================= test session starts ============================== 2025-12-04T09:25:19.8414338Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:25:19.8414451Z cachedir: .pytest_cache 2025-12-04T09:25:19.8414979Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:25:19.8415098Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:25:19.8415201Z configfile: pytest.ini 2025-12-04T09:25:19.8415718Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:25:19.8415914Z collecting ... collected 8 items / 4 deselected / 4 selected 2025-12-04T09:25:19.8416063Z stepcurrent: skipping 4 already run items. 2025-12-04T09:25:19.8416248Z Running 4 items in this shard 2025-12-04T09:25:19.8416257Z 2025-12-04T09:25:19.8417739Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda I1204 09:22:28.804000 32106 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 32158 2025-12-04T09:25:19.8418256Z I1204 09:22:28.804000 32106 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 32159 2025-12-04T09:25:19.8418753Z I1204 09:22:28.805000 32106 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 32160 2025-12-04T09:25:19.8419261Z I1204 09:22:28.806000 32106 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 32161 2025-12-04T09:25:19.8421953Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. 
API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8422095Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8424492Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8424622Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8427033Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8427164Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8429553Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8429711Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8431472Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8431645Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8433525Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8433645Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8435187Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:25:19.8435305Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8436844Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8436961Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8439470Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8439580Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8441701Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8441817Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8443935Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8444051Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8446174Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8446322Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8447862Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:25:19.8448034Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.8449565Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8449729Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.8451248Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8451396Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.8452920Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8453066Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.8453852Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8453960Z local_shape = tensor.shape 2025-12-04T09:25:19.8454688Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8454785Z local_shape = tensor.shape 2025-12-04T09:25:19.8455498Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8455606Z local_shape = tensor.shape 2025-12-04T09:25:19.8456386Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8456500Z local_shape = tensor.shape 2025-12-04T09:25:19.8457457Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8457552Z tensor.shape, 2025-12-04T09:25:19.8458358Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8458452Z tensor.shape, 2025-12-04T09:25:19.8459259Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 
2025-12-04T09:25:19.8459387Z tensor.shape, 2025-12-04T09:25:19.8460185Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8460286Z tensor.dtype, 2025-12-04T09:25:19.8461127Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8461219Z tensor.dtype, 2025-12-04T09:25:19.8462023Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8462114Z tensor.shape, 2025-12-04T09:25:19.8462926Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8463024Z tensor.dtype, 2025-12-04T09:25:19.8463821Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8463923Z tensor.dtype, 2025-12-04T09:25:19.8464356Z E1204 09:22:36.817000 32158 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8464868Z E1204 09:22:36.817000 32158 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8465844Z E1204 09:22:36.817000 32158 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8466324Z E1204 09:22:36.817000 32158 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8467302Z E1204 09:22:36.817000 32158 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8467727Z E1204 09:22:36.817000 32158 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8468668Z E1204 09:22:36.817000 32158 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8469191Z E1204 09:22:36.817000 32158 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8470027Z E1204 09:22:36.817000 32158 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8470434Z E1204 09:22:36.817000 32158 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8471259Z E1204 09:22:36.817000 32158 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8471632Z E1204 09:22:36.817000 32158 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T09:25:19.8472456Z E1204 09:22:36.817000 32158 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8472873Z E1204 09:22:36.817000 32158 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8474509Z E1204 09:22:36.817000 32158 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 0. CUDA driver allocated memory was 640614400 and is now 732889088. 2025-12-04T09:25:19.8474869Z E1204 09:22:36.817000 32158 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8475429Z E1204 09:22:36.817000 32158 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8476668Z E1204 09:22:36.817000 32158 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8476974Z E1204 09:22:36.817000 32158 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8477582Z E1204 09:22:36.817000 32158 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8478048Z E1204 09:22:36.817000 32158 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:25:19.8478423Z E1204 09:22:36.818000 32159 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8478872Z E1204 09:22:36.818000 32159 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8479740Z E1204 09:22:36.818000 32159 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8480163Z E1204 09:22:36.818000 32159 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8481071Z E1204 09:22:36.818000 32159 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8481401Z E1204 09:22:36.818000 32159 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8482231Z E1204 09:22:36.818000 32159 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8482637Z E1204 09:22:36.818000 32159 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8483463Z E1204 09:22:36.818000 32159 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8483875Z E1204 09:22:36.818000 32159 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8484697Z E1204 09:22:36.818000 32159 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8485075Z E1204 09:22:36.818000 32159 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8485903Z E1204 09:22:36.818000 32159 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8486320Z E1204 09:22:36.818000 32159 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8487992Z E1204 09:22:36.818000 32159 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 1. CUDA driver allocated memory was 531562496 and is now 619642880. 2025-12-04T09:25:19.8488324Z E1204 09:22:36.818000 32159 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8488882Z E1204 09:22:36.818000 32159 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8490115Z E1204 09:22:36.818000 32159 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8490417Z E1204 09:22:36.818000 32159 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8491033Z E1204 09:22:36.818000 32159 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8491497Z E1204 09:22:36.818000 32159 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:25:19.8491867Z E1204 09:22:36.830000 32160 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8492308Z E1204 09:22:36.830000 32160 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8493178Z E1204 09:22:36.830000 32160 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8493602Z E1204 09:22:36.830000 32160 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8494507Z E1204 09:22:36.830000 32160 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 
2025-12-04T09:25:19.8494834Z E1204 09:22:36.830000 32160 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8495664Z E1204 09:22:36.830000 32160 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8496072Z E1204 09:22:36.830000 32160 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8497226Z E1204 09:22:36.830000 32160 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8497699Z E1204 09:22:36.830000 32160 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8498626Z E1204 09:22:36.830000 32160 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8499049Z E1204 09:22:36.830000 32160 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8499982Z E1204 09:22:36.830000 32160 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8500485Z E1204 09:22:36.830000 32160 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8502328Z E1204 09:22:36.830000 32160 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 2. CUDA driver allocated memory was 531562496 and is now 615448576. 
2025-12-04T09:25:19.8502692Z E1204 09:22:36.830000 32160 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8503328Z E1204 09:22:36.830000 32160 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8504722Z E1204 09:22:36.830000 32160 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8505071Z E1204 09:22:36.830000 32160 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8505761Z E1204 09:22:36.830000 32160 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8506286Z E1204 09:22:36.830000 32160 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:25:19.8506705Z E1204 09:22:36.831000 32161 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8507202Z E1204 09:22:36.831000 32161 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8508184Z E1204 09:22:36.831000 32161 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8508714Z E1204 09:22:36.831000 32161 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8509682Z E1204 09:22:36.831000 32161 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8510008Z E1204 09:22:36.831000 32161 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8510838Z E1204 09:22:36.831000 32161 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8511243Z E1204 09:22:36.831000 32161 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8512072Z E1204 09:22:36.831000 32161 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8512481Z E1204 09:22:36.831000 32161 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8513306Z E1204 09:22:36.831000 32161 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8513684Z E1204 09:22:36.831000 32161 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8514552Z E1204 09:22:36.831000 32161 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8514964Z E1204 09:22:36.831000 32161 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8516638Z E1204 09:22:36.831000 32161 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 3. CUDA driver allocated memory was 481230848 and is now 615448576. 2025-12-04T09:25:19.8516938Z E1204 09:22:36.831000 32161 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8517501Z E1204 09:22:36.831000 32161 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8518745Z E1204 09:22:36.831000 32161 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8519048Z E1204 09:22:36.831000 32161 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8519655Z E1204 09:22:36.831000 32161 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8520116Z E1204 09:22:36.831000 32161 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:25:19.8520203Z FAILED [9.8827s] [ 25%] 2025-12-04T09:25:19.8520209Z 2025-12-04T09:25:19.8520341Z =================================== FAILURES =================================== 2025-12-04T09:25:19.8520933Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda _ 2025-12-04T09:25:19.8521041Z Traceback (most recent call last): 2025-12-04T09:25:19.8521834Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:25:19.8521956Z self._join_processes(fn) 2025-12-04T09:25:19.8522544Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:25:19.8522694Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:25:19.8523300Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:25:19.8523413Z raise RuntimeError(error) 2025-12-04T09:25:19.8523650Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:25:19.8523769Z Traceback (most recent call last): 2025-12-04T09:25:19.8524312Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8524426Z getattr(self, test_name)() 2025-12-04T09:25:19.8524961Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8525052Z fn() 2025-12-04T09:25:19.8525557Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8525661Z method(*args, **kwargs) 2025-12-04T09:25:19.8526177Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8526278Z method(*args, **kwargs) 2025-12-04T09:25:19.8526788Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8526920Z with policy(): 2025-12-04T09:25:19.8527427Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8527543Z raise RuntimeError(msg) 2025-12-04T09:25:19.8528996Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 0. CUDA driver allocated memory was 640614400 and is now 732889088. 2025-12-04T09:25:19.8529003Z 2025-12-04T09:25:19.8529228Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8530199Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8530208Z 2025-12-04T09:25:19.8530475Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8530487Z 2025-12-04T09:25:19.8530491Z 2025-12-04T09:25:19.8530712Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:25:19.8530972Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:25:19.8531906Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-c057f5798619892b.xml - 2025-12-04T09:25:19.8532072Z =========================== short test summary info ============================ 2025-12-04T09:25:19.8533206Z FAILED [9.8827s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:25:19.8533327Z Traceback (most recent call last): 2025-12-04T09:25:19.8533942Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8534043Z getattr(self, test_name)() 2025-12-04T09:25:19.8534622Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8534700Z fn() 2025-12-04T09:25:19.8535150Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8535239Z method(*args, **kwargs) 2025-12-04T09:25:19.8535696Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8535785Z method(*args, **kwargs) 2025-12-04T09:25:19.8536303Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8536399Z with policy(): 2025-12-04T09:25:19.8537053Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8537159Z raise RuntimeError(msg) 2025-12-04T09:25:19.8538601Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 0. CUDA driver allocated memory was 640614400 and is now 732889088. 2025-12-04T09:25:19.8538609Z 2025-12-04T09:25:19.8538821Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8539796Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8539837Z 2025-12-04T09:25:19.8540101Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8540285Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:25:19.8540464Z ======================= 1 failed, 4 deselected in 9.90s ======================== 2025-12-04T09:25:19.8540587Z Got exit code 1 2025-12-04T09:25:19.8540695Z Retrying single test... 
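Both failures above come from PyTorch's CUDA memory-leak check (enabled here via PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1): it records per-device allocator usage before the test body and raises if usage is still higher afterwards, which is why the errors quote "was 0 and is now reported as 13824/27136". The snippet below is only a minimal re-creation of that before/after comparison for local debugging; it is not the actual leak-check policy in torch/testing/_internal/common_utils.py, and the CudaLeakProbe name plus the reliance on torch.cuda.memory_allocated()/torch.cuda.mem_get_info() are illustrative assumptions.

import torch

class CudaLeakProbe:
    """Illustrative sketch of a before/after CUDA memory comparison.

    Not the real leak-check policy used by the PyTorch test harness; it only
    mirrors the idea behind PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1.
    """

    def __enter__(self):
        ndev = torch.cuda.device_count()
        for d in range(ndev):
            torch.cuda.synchronize(d)
        torch.cuda.empty_cache()
        # Caching-allocator bytes and driver-level usage (total - free) per device.
        self.alloc_before = [torch.cuda.memory_allocated(d) for d in range(ndev)]
        self.driver_before = [
            total - free
            for free, total in (torch.cuda.mem_get_info(d) for d in range(ndev))
        ]
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            return False  # never mask the test's own exception
        for d in range(torch.cuda.device_count()):
            torch.cuda.synchronize(d)
        torch.cuda.empty_cache()
        for d in range(torch.cuda.device_count()):
            alloc_after = torch.cuda.memory_allocated(d)
            free, total = torch.cuda.mem_get_info(d)
            if alloc_after > self.alloc_before[d] and (total - free) > self.driver_before[d]:
                raise RuntimeError(
                    f"possible CUDA leak on device {d}: caching allocator went from "
                    f"{self.alloc_before[d]} to {alloc_after} bytes"
                )
        return False

# Usage while reproducing locally (hypothetical test body):
#     with CudaLeakProbe():
#         run_state_dict_roundtrip()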
2025-12-04T09:25:19.8541454Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-aae1a2ba6806c0ef.xml 2025-12-04T09:25:19.8541622Z ============================= test session starts ============================== 2025-12-04T09:25:19.8541969Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:25:19.8542070Z cachedir: .pytest_cache 2025-12-04T09:25:19.8542591Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:25:19.8542713Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:25:19.8542817Z configfile: pytest.ini 2025-12-04T09:25:19.8543361Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:25:19.8543565Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:25:19.8544620Z stepcurrent: skipping 4 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8544728Z Running 1 items in this shard 2025-12-04T09:25:19.8544733Z 2025-12-04T09:25:19.8546057Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda I1204 09:22:43.434000 32559 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 32611 2025-12-04T09:25:19.8546569Z I1204 09:22:43.435000 32559 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 32612 2025-12-04T09:25:19.8547114Z I1204 09:22:43.435000 32559 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 32613 2025-12-04T09:25:19.8547616Z I1204 09:22:43.436000 32559 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 32614 2025-12-04T09:25:19.8550056Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8550167Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8552448Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8552552Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8554665Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. 
Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8554794Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8556901Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8557023Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8558568Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8558684Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8560226Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8560336Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8561866Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8561977Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8563567Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8563678Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8565812Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:25:19.8565915Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8568027Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8568128Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8570229Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8570381Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8572497Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8572599Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8574121Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8574274Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.8575779Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8575929Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.8577839Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8578005Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.8579706Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. 
If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8579863Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.8580675Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8580783Z local_shape = tensor.shape 2025-12-04T09:25:19.8581601Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8581712Z local_shape = tensor.shape 2025-12-04T09:25:19.8582509Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8582621Z local_shape = tensor.shape 2025-12-04T09:25:19.8583426Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8583569Z local_shape = tensor.shape 2025-12-04T09:25:19.8584366Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8584465Z tensor.shape, 2025-12-04T09:25:19.8585309Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8585403Z tensor.shape, 2025-12-04T09:25:19.8586210Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8586307Z tensor.shape, 2025-12-04T09:25:19.8587107Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8587212Z tensor.dtype, 2025-12-04T09:25:19.8588013Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8588110Z tensor.shape, 2025-12-04T09:25:19.8589021Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8589108Z tensor.dtype, 2025-12-04T09:25:19.8589883Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8589970Z tensor.dtype, 2025-12-04T09:25:19.8590747Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 
2025-12-04T09:25:19.8590844Z tensor.dtype, 2025-12-04T09:25:19.8591260Z E1204 09:22:51.010000 32611 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8591809Z E1204 09:22:51.010000 32611 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8592754Z E1204 09:22:51.010000 32611 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8593221Z E1204 09:22:51.010000 32611 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8594160Z E1204 09:22:51.010000 32611 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8594520Z E1204 09:22:51.010000 32611 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8595433Z E1204 09:22:51.010000 32611 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8595876Z E1204 09:22:51.010000 32611 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8596786Z E1204 09:22:51.010000 32611 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8597228Z E1204 09:22:51.010000 32611 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8598346Z E1204 09:22:51.010000 32611 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8598749Z E1204 09:22:51.010000 32611 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8599582Z E1204 09:22:51.010000 32611 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8600029Z E1204 09:22:51.010000 32611 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8601666Z E1204 09:22:51.010000 32611 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 0. CUDA driver allocated memory was 644808704 and is now 724500480. 
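The RuntimeError above comes from the harness's CUDA memory leak check: it records caching-allocator and driver-level memory usage before the test body runs and flags any growth that is still present once the test finishes. A minimal sketch of the same idea, assuming a hypothetical run_test_body callable (this is not the harness's own implementation):

    import gc
    import torch

    def check_cuda_leak(run_test_body, device=0):
        # Snapshot caching-allocator usage and driver-level usage before the test.
        torch.cuda.synchronize(device)
        alloc_before = torch.cuda.memory_allocated(device)
        free_before, total = torch.cuda.mem_get_info(device)
        driver_before = total - free_before

        run_test_body()

        # Drop Python references and cached blocks so only genuine leaks remain.
        gc.collect()
        torch.cuda.empty_cache()
        torch.cuda.synchronize(device)
        alloc_after = torch.cuda.memory_allocated(device)
        free_after, _ = torch.cuda.mem_get_info(device)
        driver_after = total - free_after

        if alloc_after > alloc_before or driver_after > driver_before:
            raise RuntimeError(
                f"possible CUDA leak on device {device}: "
                f"allocator {alloc_before} -> {alloc_after} bytes, "
                f"driver {driver_before} -> {driver_after} bytes"
            )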
2025-12-04T09:25:19.8601975Z E1204 09:22:51.010000 32611 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8602536Z E1204 09:22:51.010000 32611 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8603773Z E1204 09:22:51.010000 32611 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8604081Z E1204 09:22:51.010000 32611 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8604689Z E1204 09:22:51.010000 32611 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8605154Z E1204 09:22:51.010000 32611 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:25:19.8605579Z E1204 09:22:51.010000 32614 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8606210Z E1204 09:22:51.010000 32614 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8607127Z E1204 09:22:51.010000 32614 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8607575Z E1204 09:22:51.010000 32614 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8608485Z E1204 09:22:51.010000 32614 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8608829Z E1204 09:22:51.010000 32614 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8609711Z E1204 09:22:51.010000 32614 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8610142Z E1204 09:22:51.010000 32614 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8611022Z E1204 09:22:51.010000 32614 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8611447Z E1204 09:22:51.010000 32614 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8612346Z E1204 09:22:51.010000 32614 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8612747Z E1204 09:22:51.010000 32614 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8613653Z E1204 09:22:51.010000 32614 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8614090Z E1204 09:22:51.010000 32614 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8615831Z E1204 09:22:51.010000 32614 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 3. CUDA driver allocated memory was 166658048 and is now 619642880. 2025-12-04T09:25:19.8616142Z E1204 09:22:51.010000 32614 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8616951Z E1204 09:22:51.010000 32614 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8618345Z E1204 09:22:51.010000 32614 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8618687Z E1204 09:22:51.010000 32614 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8619380Z E1204 09:22:51.010000 32614 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8619978Z E1204 09:22:51.010000 32614 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:25:19.8620410Z E1204 09:22:51.012000 32612 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8621099Z E1204 09:22:51.012000 32612 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8622089Z E1204 09:22:51.012000 32612 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8622570Z E1204 09:22:51.012000 32612 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8623545Z E1204 09:22:51.012000 32612 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8623916Z E1204 09:22:51.012000 32612 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8624864Z E1204 09:22:51.012000 32612 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8625320Z E1204 09:22:51.012000 32612 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8626253Z E1204 09:22:51.012000 32612 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T09:25:19.8626777Z E1204 09:22:51.012000 32612 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8627711Z E1204 09:22:51.012000 32612 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8628178Z E1204 09:22:51.012000 32612 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8629107Z E1204 09:22:51.012000 32612 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8629578Z E1204 09:22:51.012000 32612 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8631430Z E1204 09:22:51.012000 32612 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 1. CUDA driver allocated memory was 531562496 and is now 615448576. 2025-12-04T09:25:19.8631775Z E1204 09:22:51.012000 32612 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8632405Z E1204 09:22:51.012000 32612 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8633890Z E1204 09:22:51.012000 32612 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8634193Z E1204 09:22:51.012000 32612 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8634867Z E1204 09:22:51.012000 32612 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8635340Z E1204 09:22:51.012000 32612 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:25:19.8635715Z E1204 09:22:51.013000 32613 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8636160Z E1204 09:22:51.013000 32613 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8637033Z E1204 09:22:51.013000 32613 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8637465Z E1204 09:22:51.013000 32613 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8638529Z E1204 09:22:51.013000 32613 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8638874Z E1204 09:22:51.013000 32613 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 
2025-12-04T09:25:19.8639757Z E1204 09:22:51.013000 32613 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8640186Z E1204 09:22:51.013000 32613 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8641062Z E1204 09:22:51.013000 32613 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8641530Z E1204 09:22:51.013000 32613 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8642430Z E1204 09:22:51.013000 32613 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8642826Z E1204 09:22:51.013000 32613 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8643703Z E1204 09:22:51.013000 32613 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8644143Z E1204 09:22:51.013000 32613 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8645882Z E1204 09:22:51.013000 32613 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 2. CUDA driver allocated memory was 527368192 and is now 615448576. 
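Each failing rank prints the same repro recipe: PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 enables the leak check for a standalone run, and PYTORCH_PRINT_REPRO_ON_FAILURE=0 suppresses the repro banner. A sketch of invoking that exact command from Python, assuming it is run from the base repo dir:

    import os
    import subprocess

    env = dict(os.environ)
    env["PYTORCH_TEST_CUDA_MEM_LEAK_CHECK"] = "1"    # enable the per-test leak check
    # env["PYTORCH_PRINT_REPRO_ON_FAILURE"] = "0"    # uncomment to silence the repro banner

    subprocess.run(
        [
            "python",
            "test/distributed/fsdp/test_hsdp_dtensor_state_dict.py",
            "TestHSDPWithDeviceMeshAndDTensorCUDA."
            "test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda",
        ],
        env=env,
        check=True,
    )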
2025-12-04T09:25:19.8646197Z E1204 09:22:51.013000 32613 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8646799Z E1204 09:22:51.013000 32613 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8648292Z E1204 09:22:51.013000 32613 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8648624Z E1204 09:22:51.013000 32613 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8649347Z E1204 09:22:51.013000 32613 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8649963Z E1204 09:22:51.013000 32613 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:25:19.8650057Z FAILED [9.5086s] [100%] 2025-12-04T09:25:19.8650062Z 2025-12-04T09:25:19.8650202Z =================================== FAILURES =================================== 2025-12-04T09:25:19.8650696Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda _ 2025-12-04T09:25:19.8650810Z Traceback (most recent call last): 2025-12-04T09:25:19.8651333Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:25:19.8651435Z self._join_processes(fn) 2025-12-04T09:25:19.8651990Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:25:19.8652131Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:25:19.8652703Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:25:19.8652807Z raise RuntimeError(error) 2025-12-04T09:25:19.8653035Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:25:19.8653145Z Traceback (most recent call last): 2025-12-04T09:25:19.8653655Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8653785Z getattr(self, test_name)() 2025-12-04T09:25:19.8654284Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8654371Z fn() 2025-12-04T09:25:19.8654853Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8654977Z method(*args, **kwargs) 2025-12-04T09:25:19.8655458Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8655552Z method(*args, **kwargs) 2025-12-04T09:25:19.8656030Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8656119Z with policy(): 2025-12-04T09:25:19.8656834Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8656953Z raise 
RuntimeError(msg) 2025-12-04T09:25:19.8658379Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 1. CUDA driver allocated memory was 531562496 and is now 615448576. 2025-12-04T09:25:19.8658388Z 2025-12-04T09:25:19.8658607Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8659574Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8659580Z 2025-12-04T09:25:19.8659845Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8659859Z 2025-12-04T09:25:19.8659863Z 2025-12-04T09:25:19.8660083Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:25:19.8660345Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:25:19.8661351Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-aae1a2ba6806c0ef.xml - 2025-12-04T09:25:19.8661525Z =========================== short test summary info ============================ 2025-12-04T09:25:19.8662659Z FAILED [9.5086s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:25:19.8662779Z Traceback (most recent call last): 2025-12-04T09:25:19.8663330Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8663446Z getattr(self, test_name)() 2025-12-04T09:25:19.8663983Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8664071Z fn() 2025-12-04T09:25:19.8664586Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8664692Z method(*args, **kwargs) 2025-12-04T09:25:19.8665198Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8665300Z method(*args, **kwargs) 2025-12-04T09:25:19.8665799Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8665899Z with policy(): 2025-12-04T09:25:19.8666411Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8666553Z raise RuntimeError(msg) 2025-12-04T09:25:19.8667989Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 1. CUDA driver allocated memory was 531562496 and is now 615448576. 
2025-12-04T09:25:19.8668037Z 2025-12-04T09:25:19.8668256Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8669398Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8669403Z 2025-12-04T09:25:19.8669642Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8669808Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:25:19.8669968Z ======================= 1 failed, 7 deselected in 9.53s ======================== 2025-12-04T09:25:19.8670053Z Got exit code 1 2025-12-04T09:25:19.8670150Z Retrying single test... 2025-12-04T09:25:19.8670827Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-c34ce2d8050066e8.xml 2025-12-04T09:25:19.8670973Z ============================= test session starts ============================== 2025-12-04T09:25:19.8671281Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:25:19.8671379Z cachedir: .pytest_cache 2025-12-04T09:25:19.8671842Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:25:19.8671952Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:25:19.8672044Z configfile: pytest.ini 2025-12-04T09:25:19.8672527Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:25:19.8672712Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:25:19.8673692Z stepcurrent: skipping 4 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8673797Z Running 1 items in this shard 2025-12-04T09:25:19.8673801Z 2025-12-04T09:25:19.8674971Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda I1204 09:22:57.624000 33012 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 33064 2025-12-04T09:25:19.8675423Z I1204 09:22:57.625000 33012 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 33065 2025-12-04T09:25:19.8675863Z I1204 09:22:57.626000 33012 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 33066 2025-12-04T09:25:19.8676306Z I1204 09:22:57.626000 33012 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 33067 2025-12-04T09:25:19.8678448Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
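The FutureWarning repeated throughout both runs points to the replacement checkpoint API in torch.distributed.checkpoint.state_dict. A minimal migration sketch, assuming model is already wrapped in FSDP and optimizer is its optimizer; the StateDictOptions values shown are illustrative, not taken from this test:

    from torch.distributed.checkpoint.state_dict import (
        StateDictOptions,
        get_state_dict,
        set_state_dict,
    )

    def checkpoint_roundtrip(model, optimizer):
        # Request sharded (non-full) state dicts without calling FSDP.set_state_dict_type().
        options = StateDictOptions(full_state_dict=False, cpu_offload=False)
        model_sd, optim_sd = get_state_dict(model, optimizer, options=options)

        # Restoring goes through the matching setter with the same options.
        set_state_dict(
            model,
            optimizer,
            model_state_dict=model_sd,
            optim_state_dict=optim_sd,
            options=options,
        )
        return model_sd, optim_sd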
2025-12-04T09:25:19.8678559Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8680699Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8680858Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8682976Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8683088Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8685212Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8685319Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8686853Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8686966Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8688565Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8688679Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8690203Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8690312Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8691839Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. 
If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8691948Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8694083Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8694209Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8696612Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8696921Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8699313Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8699431Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8701803Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8701918Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8703620Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8703848Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.8705547Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:25:19.8705714Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.8707413Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8707577Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.8709360Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8709499Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.8710223Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8710348Z local_shape = tensor.shape 2025-12-04T09:25:19.8711065Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8711165Z local_shape = tensor.shape 2025-12-04T09:25:19.8711901Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8712001Z local_shape = tensor.shape 2025-12-04T09:25:19.8712712Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8712804Z tensor.shape, 2025-12-04T09:25:19.8713515Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8713599Z tensor.shape, 2025-12-04T09:25:19.8714316Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8714401Z tensor.shape, 2025-12-04T09:25:19.8715122Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8715202Z tensor.dtype, 2025-12-04T09:25:19.8715908Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8715996Z tensor.dtype, 2025-12-04T09:25:19.8716703Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 
2025-12-04T09:25:19.8716798Z tensor.dtype, 2025-12-04T09:25:19.8717515Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8717659Z local_shape = tensor.shape 2025-12-04T09:25:19.8718391Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8718474Z tensor.shape, 2025-12-04T09:25:19.8719183Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8719274Z tensor.dtype, 2025-12-04T09:25:19.8719657Z E1204 09:23:05.672000 33066 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8720116Z E1204 09:23:05.672000 33066 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8721117Z E1204 09:23:05.672000 33066 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8721770Z E1204 09:23:05.672000 33066 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8722745Z E1204 09:23:05.672000 33066 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8723117Z E1204 09:23:05.672000 33066 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8724062Z E1204 09:23:05.672000 33066 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8724597Z E1204 09:23:05.672000 33066 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8725546Z E1204 09:23:05.672000 33066 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8726047Z E1204 09:23:05.672000 33066 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8726981Z E1204 09:23:05.672000 33066 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8727411Z E1204 09:23:05.672000 33066 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8728355Z E1204 09:23:05.672000 33066 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8728829Z E1204 09:23:05.672000 33066 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8730678Z E1204 09:23:05.672000 33066 
site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 2. CUDA driver allocated memory was 531562496 and is now 619642880. 2025-12-04T09:25:19.8731026Z E1204 09:23:05.672000 33066 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8731659Z E1204 09:23:05.672000 33066 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8733131Z E1204 09:23:05.672000 33066 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8733475Z E1204 09:23:05.672000 33066 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8734197Z E1204 09:23:05.672000 33066 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8734661Z E1204 09:23:05.672000 33066 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:25:19.8735044Z E1204 09:23:05.673000 33064 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8735499Z E1204 09:23:05.673000 33064 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8736424Z E1204 09:23:05.673000 33064 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8737057Z E1204 09:23:05.673000 33064 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8738021Z E1204 09:23:05.673000 33064 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8738391Z E1204 09:23:05.673000 33064 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8739364Z E1204 09:23:05.673000 33064 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8739824Z E1204 09:23:05.673000 33064 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8740806Z E1204 09:23:05.673000 33064 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8741265Z E1204 09:23:05.673000 33064 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8742195Z E1204 09:23:05.673000 33064 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8742623Z E1204 09:23:05.673000 33064 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8743564Z E1204 09:23:05.673000 33064 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8744035Z E1204 09:23:05.673000 33064 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8745881Z E1204 09:23:05.673000 33064 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 0. CUDA driver allocated memory was 636420096 and is now 724500480. 2025-12-04T09:25:19.8746231Z E1204 09:23:05.673000 33064 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8746914Z E1204 09:23:05.673000 33064 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8748313Z E1204 09:23:05.673000 33064 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8748656Z E1204 09:23:05.673000 33064 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8749389Z E1204 09:23:05.673000 33064 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8749857Z E1204 09:23:05.673000 33064 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:25:19.8750234Z E1204 09:23:05.674000 33065 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8750689Z E1204 09:23:05.674000 33065 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8751559Z E1204 09:23:05.674000 33065 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8751985Z E1204 09:23:05.674000 33065 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8752845Z E1204 09:23:05.674000 33065 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8753205Z E1204 09:23:05.674000 33065 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8754046Z E1204 09:23:05.674000 33065 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T09:25:19.8754532Z E1204 09:23:05.674000 33065 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8755355Z E1204 09:23:05.674000 33065 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8755772Z E1204 09:23:05.674000 33065 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8756597Z E1204 09:23:05.674000 33065 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8756982Z E1204 09:23:05.674000 33065 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8757813Z E1204 09:23:05.674000 33065 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8758230Z E1204 09:23:05.674000 33065 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8759872Z E1204 09:23:05.674000 33065 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 1. CUDA driver allocated memory was 531562496 and is now 615448576. 2025-12-04T09:25:19.8760183Z E1204 09:23:05.674000 33065 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8760788Z E1204 09:23:05.674000 33065 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8762024Z E1204 09:23:05.674000 33065 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8762326Z E1204 09:23:05.674000 33065 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8762940Z E1204 09:23:05.674000 33065 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8763402Z E1204 09:23:05.674000 33065 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:25:19.8763781Z E1204 09:23:05.675000 33067 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8764229Z E1204 09:23:05.675000 33067 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8765095Z E1204 09:23:05.675000 33067 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8765520Z E1204 09:23:05.675000 33067 site-packages/torch/testing/_internal/common_distributed.py:935] 
getattr(self, test_name)() 2025-12-04T09:25:19.8766382Z E1204 09:23:05.675000 33067 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8766734Z E1204 09:23:05.675000 33067 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8767567Z E1204 09:23:05.675000 33067 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8767999Z E1204 09:23:05.675000 33067 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8768825Z E1204 09:23:05.675000 33067 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8769238Z E1204 09:23:05.675000 33067 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8770067Z E1204 09:23:05.675000 33067 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8770443Z E1204 09:23:05.675000 33067 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8771270Z E1204 09:23:05.675000 33067 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8771690Z E1204 09:23:05.675000 33067 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8773319Z E1204 09:23:05.675000 33067 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 3. CUDA driver allocated memory was 531562496 and is now 615448576. 
2025-12-04T09:25:19.8773679Z E1204 09:23:05.675000 33067 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8774255Z E1204 09:23:05.675000 33067 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8775491Z E1204 09:23:05.675000 33067 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8775792Z E1204 09:23:05.675000 33067 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8776470Z E1204 09:23:05.675000 33067 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8777156Z E1204 09:23:05.675000 33067 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:25:19.8777327Z FAILED [9.9274s] [100%] 2025-12-04T09:25:19.8777333Z 2025-12-04T09:25:19.8777478Z =================================== FAILURES =================================== 2025-12-04T09:25:19.8778012Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda _ 2025-12-04T09:25:19.8778133Z Traceback (most recent call last): 2025-12-04T09:25:19.8778690Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:25:19.8778801Z self._join_processes(fn) 2025-12-04T09:25:19.8779424Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:25:19.8779572Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:25:19.8780183Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:25:19.8780323Z raise RuntimeError(error) 2025-12-04T09:25:19.8780566Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:25:19.8780682Z Traceback (most recent call last): 2025-12-04T09:25:19.8781228Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8781338Z getattr(self, test_name)() 2025-12-04T09:25:19.8781872Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8781967Z fn() 2025-12-04T09:25:19.8782479Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8782579Z method(*args, **kwargs) 2025-12-04T09:25:19.8783091Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8783199Z method(*args, **kwargs) 2025-12-04T09:25:19.8783710Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8783808Z with policy(): 2025-12-04T09:25:19.8784315Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8784430Z raise 
RuntimeError(msg) 2025-12-04T09:25:19.8785855Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 0. CUDA driver allocated memory was 636420096 and is now 724500480. 2025-12-04T09:25:19.8785863Z 2025-12-04T09:25:19.8786088Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8787118Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8787126Z 2025-12-04T09:25:19.8787400Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8787405Z 2025-12-04T09:25:19.8787409Z 2025-12-04T09:25:19.8787625Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:25:19.8787890Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:25:19.8788936Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-c34ce2d8050066e8.xml - 2025-12-04T09:25:19.8789090Z =========================== short test summary info ============================ 2025-12-04T09:25:19.8790097Z FAILED [9.9274s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:25:19.8790205Z Traceback (most recent call last): 2025-12-04T09:25:19.8790700Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8790807Z getattr(self, test_name)() 2025-12-04T09:25:19.8791283Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8791402Z fn() 2025-12-04T09:25:19.8791854Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8791945Z method(*args, **kwargs) 2025-12-04T09:25:19.8792406Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8792524Z method(*args, **kwargs) 2025-12-04T09:25:19.8792968Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8793063Z with policy(): 2025-12-04T09:25:19.8793517Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8793619Z raise RuntimeError(msg) 2025-12-04T09:25:19.8794880Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 0. CUDA driver allocated memory was 636420096 and is now 724500480. 
2025-12-04T09:25:19.8794887Z 2025-12-04T09:25:19.8795076Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8795947Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8795954Z 2025-12-04T09:25:19.8796188Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8796353Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:25:19.8796509Z ======================= 1 failed, 7 deselected in 9.95s ======================== 2025-12-04T09:25:19.8796593Z Got exit code 1 2025-12-04T09:25:19.8797382Z FAILED CONSISTENTLY: test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda 2025-12-04T09:25:19.8797746Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:25:19.8798797Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-fde5b3ce12e5a98a.xml 2025-12-04T09:25:19.8798945Z ============================= test session starts ============================== 2025-12-04T09:25:19.8799254Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:25:19.8799352Z cachedir: .pytest_cache 2025-12-04T09:25:19.8799809Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:25:19.8799922Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:25:19.8800016Z configfile: pytest.ini 2025-12-04T09:25:19.8800492Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:25:19.8800681Z collecting ... collected 8 items / 5 deselected / 3 selected 2025-12-04T09:25:19.8800805Z stepcurrent: skipping 5 already run items. 2025-12-04T09:25:19.8800903Z Running 3 items in this shard 2025-12-04T09:25:19.8800908Z 2025-12-04T09:25:19.8802093Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda I1204 09:23:12.214000 33465 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 33517 2025-12-04T09:25:19.8802532Z I1204 09:23:12.215000 33465 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 33518 2025-12-04T09:25:19.8802975Z I1204 09:23:12.216000 33465 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 33519 2025-12-04T09:25:19.8803436Z I1204 09:23:12.216000 33465 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 33520 2025-12-04T09:25:19.8805587Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. 
API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8805713Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8807845Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8807948Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8810077Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8810174Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8812319Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8812425Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8813951Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8814071Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8815592Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8815714Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8817565Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:25:19.8817699Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8819413Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8819606Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8822183Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8822295Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8824692Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8824804Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8827179Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8827291Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8829791Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8829904Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8831616Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:25:19.8831779Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.8833539Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8833689Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.8835295Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8835478Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.8837076Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8837272Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.8838031Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8838144Z local_shape = tensor.shape 2025-12-04T09:25:19.8838901Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8839008Z local_shape = tensor.shape 2025-12-04T09:25:19.8839778Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8839882Z local_shape = tensor.shape 2025-12-04T09:25:19.8840649Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8840739Z tensor.shape, 2025-12-04T09:25:19.8841495Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8841596Z tensor.shape, 2025-12-04T09:25:19.8842352Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8842453Z tensor.dtype, 2025-12-04T09:25:19.8843262Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 
2025-12-04T09:25:19.8843352Z tensor.dtype, 2025-12-04T09:25:19.8844111Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8844199Z tensor.shape, 2025-12-04T09:25:19.8844958Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8845044Z tensor.dtype, 2025-12-04T09:25:19.8845796Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8845905Z local_shape = tensor.shape 2025-12-04T09:25:19.8846662Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8846754Z tensor.shape, 2025-12-04T09:25:19.8847510Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8847599Z tensor.dtype, 2025-12-04T09:25:19.8848012Z E1204 09:23:20.302000 33520 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8848487Z E1204 09:23:20.302000 33520 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8849439Z E1204 09:23:20.302000 33520 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8849908Z E1204 09:23:20.302000 33520 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8850840Z E1204 09:23:20.302000 33520 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8851200Z E1204 09:23:20.302000 33520 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8852083Z E1204 09:23:20.302000 33520 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8852529Z E1204 09:23:20.302000 33520 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8853407Z E1204 09:23:20.302000 33520 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8853942Z E1204 09:23:20.302000 33520 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8854772Z E1204 09:23:20.302000 33520 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8855141Z E1204 09:23:20.302000 33520 site-packages/torch/testing/_internal/common_distributed.py:935] with 
policy(): 2025-12-04T09:25:19.8855980Z E1204 09:23:20.302000 33520 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8856465Z E1204 09:23:20.302000 33520 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8858527Z E1204 09:23:20.302000 33520 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 3. CUDA driver allocated memory was 168755200 and is now 621740032. 2025-12-04T09:25:19.8858868Z E1204 09:23:20.302000 33520 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8859500Z E1204 09:23:20.302000 33520 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8860918Z E1204 09:23:20.302000 33520 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8861258Z E1204 09:23:20.302000 33520 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8861956Z E1204 09:23:20.302000 33520 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8862475Z E1204 09:23:20.302000 33520 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:25:19.8862908Z E1204 09:23:20.306000 33517 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8863437Z E1204 09:23:20.306000 33517 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8864415Z E1204 09:23:20.306000 33517 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8864930Z E1204 09:23:20.306000 33517 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8865895Z E1204 09:23:20.306000 33517 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8866271Z E1204 09:23:20.306000 33517 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8867199Z E1204 09:23:20.306000 33517 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8867665Z E1204 09:23:20.306000 33517 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8868703Z E1204 09:23:20.306000 33517 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8869248Z E1204 09:23:20.306000 33517 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8870125Z E1204 09:23:20.306000 33517 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8870515Z E1204 09:23:20.306000 33517 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8871404Z E1204 09:23:20.306000 33517 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8871894Z E1204 09:23:20.306000 33517 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8873638Z E1204 09:23:20.306000 33517 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 0. CUDA driver allocated memory was 640614400 and is now 732889088. 2025-12-04T09:25:19.8873955Z E1204 09:23:20.306000 33517 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8874549Z E1204 09:23:20.306000 33517 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8875866Z E1204 09:23:20.306000 33517 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8876186Z E1204 09:23:20.306000 33517 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8876836Z E1204 09:23:20.306000 33517 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8877329Z E1204 09:23:20.306000 33517 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:25:19.8877764Z E1204 09:23:20.322000 33519 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8878237Z E1204 09:23:20.322000 33519 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8879160Z E1204 09:23:20.322000 33519 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8879658Z E1204 09:23:20.322000 33519 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8880740Z E1204 09:23:20.322000 33519 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 
2025-12-04T09:25:19.8881106Z E1204 09:23:20.322000 33519 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8882019Z E1204 09:23:20.322000 33519 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8882464Z E1204 09:23:20.322000 33519 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8883378Z E1204 09:23:20.322000 33519 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8883821Z E1204 09:23:20.322000 33519 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8884740Z E1204 09:23:20.322000 33519 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8885147Z E1204 09:23:20.322000 33519 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8886223Z E1204 09:23:20.322000 33519 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8886659Z E1204 09:23:20.322000 33519 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8888382Z E1204 09:23:20.322000 33519 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 2. CUDA driver allocated memory was 531562496 and is now 615448576. 
2025-12-04T09:25:19.8888880Z E1204 09:23:20.322000 33519 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8889495Z E1204 09:23:20.322000 33519 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8890851Z E1204 09:23:20.322000 33519 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8891175Z E1204 09:23:20.322000 33519 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8891845Z E1204 09:23:20.322000 33519 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8892347Z E1204 09:23:20.322000 33519 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:25:19.8892782Z E1204 09:23:20.323000 33518 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8893278Z E1204 09:23:20.323000 33518 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.8894244Z E1204 09:23:20.323000 33518 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8894715Z E1204 09:23:20.323000 33518 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.8895648Z E1204 09:23:20.323000 33518 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8896010Z E1204 09:23:20.323000 33518 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.8897181Z E1204 09:23:20.323000 33518 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8897649Z E1204 09:23:20.323000 33518 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8898603Z E1204 09:23:20.323000 33518 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8899060Z E1204 09:23:20.323000 33518 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.8900001Z E1204 09:23:20.323000 33518 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8900486Z E1204 09:23:20.323000 33518 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.8901432Z E1204 09:23:20.323000 33518 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8901893Z E1204 09:23:20.323000 33518 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.8903730Z E1204 09:23:20.323000 33518 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 1. CUDA driver allocated memory was 531562496 and is now 615448576. 2025-12-04T09:25:19.8904075Z E1204 09:23:20.323000 33518 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8904714Z E1204 09:23:20.323000 33518 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8906121Z E1204 09:23:20.323000 33518 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8906462Z E1204 09:23:20.323000 33518 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.8907160Z E1204 09:23:20.323000 33518 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8907715Z E1204 09:23:20.323000 33518 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:25:19.8907817Z FAILED [10.1799s] [ 33%] 2025-12-04T09:25:19.8907827Z 2025-12-04T09:25:19.8908013Z =================================== FAILURES =================================== 2025-12-04T09:25:19.8908776Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda _ 2025-12-04T09:25:19.8908901Z Traceback (most recent call last): 2025-12-04T09:25:19.8909418Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:25:19.8909524Z self._join_processes(fn) 2025-12-04T09:25:19.8910080Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:25:19.8910215Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:25:19.8910793Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:25:19.8910901Z raise RuntimeError(error) 2025-12-04T09:25:19.8911127Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:25:19.8911248Z Traceback (most recent call last): 2025-12-04T09:25:19.8911758Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8911860Z getattr(self, test_name)() 2025-12-04T09:25:19.8912376Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8912460Z fn() 2025-12-04T09:25:19.8912942Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8913043Z method(*args, **kwargs) 2025-12-04T09:25:19.8913519Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8913627Z method(*args, **kwargs) 2025-12-04T09:25:19.8914155Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8914248Z with policy(): 2025-12-04T09:25:19.8914735Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8914840Z raise RuntimeError(msg) 2025-12-04T09:25:19.8916171Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 1. CUDA driver allocated memory was 531562496 and is now 615448576. 2025-12-04T09:25:19.8916179Z 2025-12-04T09:25:19.8916383Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8917298Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8917313Z 2025-12-04T09:25:19.8917564Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8917569Z 2025-12-04T09:25:19.8917724Z Process 3 exited with error code 10 and exception: 2025-12-04T09:25:19.8917843Z Traceback (most recent call last): 2025-12-04T09:25:19.8918361Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8918462Z getattr(self, test_name)() 2025-12-04T09:25:19.8918970Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8919083Z fn() 2025-12-04T09:25:19.8919565Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8919661Z method(*args, **kwargs) 2025-12-04T09:25:19.8920140Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8920269Z method(*args, **kwargs) 2025-12-04T09:25:19.8920863Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8920962Z with policy(): 2025-12-04T09:25:19.8921638Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8921743Z raise RuntimeError(msg) 2025-12-04T09:25:19.8923168Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 3. CUDA driver allocated memory was 168755200 and is now 621740032. 
2025-12-04T09:25:19.8923177Z 2025-12-04T09:25:19.8923396Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8924361Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8924367Z 2025-12-04T09:25:19.8924635Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8924640Z 2025-12-04T09:25:19.8924644Z 2025-12-04T09:25:19.8924860Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:25:19.8925128Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:25:19.8926068Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-fde5b3ce12e5a98a.xml - 2025-12-04T09:25:19.8926340Z =========================== short test summary info ============================ 2025-12-04T09:25:19.8927466Z FAILED [10.1799s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:25:19.8927593Z Traceback (most recent call last): 2025-12-04T09:25:19.8928145Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8928257Z getattr(self, test_name)() 2025-12-04T09:25:19.8928802Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8928894Z fn() 2025-12-04T09:25:19.8929401Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8929511Z method(*args, **kwargs) 2025-12-04T09:25:19.8930020Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8930134Z method(*args, **kwargs) 2025-12-04T09:25:19.8930639Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8930736Z with policy(): 2025-12-04T09:25:19.8931261Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8931365Z raise RuntimeError(msg) 2025-12-04T09:25:19.8932775Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 1. CUDA driver allocated memory was 531562496 and is now 615448576. 
2025-12-04T09:25:19.8932842Z 2025-12-04T09:25:19.8933061Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8934118Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8934123Z 2025-12-04T09:25:19.8934381Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8934386Z 2025-12-04T09:25:19.8934538Z Process 3 exited with error code 10 and exception: 2025-12-04T09:25:19.8934656Z Traceback (most recent call last): 2025-12-04T09:25:19.8935169Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.8935271Z getattr(self, test_name)() 2025-12-04T09:25:19.8935783Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.8935866Z fn() 2025-12-04T09:25:19.8936410Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8936519Z method(*args, **kwargs) 2025-12-04T09:25:19.8937185Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.8937296Z method(*args, **kwargs) 2025-12-04T09:25:19.8937802Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.8937896Z with policy(): 2025-12-04T09:25:19.8938418Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.8938530Z raise RuntimeError(msg) 2025-12-04T09:25:19.8940015Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 3. CUDA driver allocated memory was 168755200 and is now 621740032. 2025-12-04T09:25:19.8940024Z 2025-12-04T09:25:19.8940241Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.8941199Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8941214Z 2025-12-04T09:25:19.8941481Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.8941661Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:25:19.8941850Z ======================= 1 failed, 5 deselected in 10.20s ======================= 2025-12-04T09:25:19.8941947Z Got exit code 1 2025-12-04T09:25:19.8942050Z Retrying single test... 
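
Note on the failures above: the mem-leak check compares per-device memory before and after each test (the error text shows both the caching-allocator figure and the CUDA-driver figure moving from their pre-test values), and the repro line it prints (PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.<test name>) re-runs the single test with that check enabled. The snippet below is only a minimal sketch of that kind of before/after accounting, not the actual checker in torch/testing/_internal/common_utils.py; it assumes torch with at least one CUDA device, and run_test_body is a hypothetical placeholder for the test body under suspicion.

    import torch

    def device_memory(device: int):
        # Bytes currently held by the caching allocator on this device.
        allocator = torch.cuda.memory_allocated(device)
        # Driver-level usage: total minus free, as reported by cudaMemGetInfo.
        free, total = torch.cuda.mem_get_info(device)
        return allocator, total - free

    def assert_no_leak(run_test_body, device: int = 0):
        # Hypothetical helper, not PyTorch's checker: snapshot, run, snapshot again.
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_before, driver_before = device_memory(device)

        run_test_body()

        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_after, driver_after = device_memory(device)

        if alloc_after > alloc_before or driver_after > driver_before:
            raise RuntimeError(
                f"possible CUDA memory leak on device {device}: "
                f"caching allocator {alloc_before} -> {alloc_after} bytes, "
                f"driver {driver_before} -> {driver_after} bytes"
            )
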
2025-12-04T09:25:19.8942819Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-b1cbedcab1229122.xml 2025-12-04T09:25:19.8942984Z ============================= test session starts ============================== 2025-12-04T09:25:19.8943340Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:25:19.8943445Z cachedir: .pytest_cache 2025-12-04T09:25:19.8943959Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:25:19.8944087Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:25:19.8944192Z configfile: pytest.ini 2025-12-04T09:25:19.8944755Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:25:19.8944970Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:25:19.8946022Z stepcurrent: skipping 5 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T09:25:19.8946169Z Running 1 items in this shard 2025-12-04T09:25:19.8946174Z 2025-12-04T09:25:19.8947492Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda I1204 09:23:26.944000 33918 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 33970 2025-12-04T09:25:19.8947996Z I1204 09:23:26.945000 33918 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 33971 2025-12-04T09:25:19.8948492Z I1204 09:23:26.945000 33918 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 33972 2025-12-04T09:25:19.8949071Z I1204 09:23:26.946000 33918 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 33973 2025-12-04T09:25:19.8951229Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8951330Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8953514Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8953616Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8955741Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. 
Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8955839Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8957970Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8958071Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8959609Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8959751Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8961279Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8961443Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8962965Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8963083Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8964626Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8964752Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.8966884Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:25:19.8966993Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8969151Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8969258Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8971368Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8971472Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8973583Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.8973683Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.8975207Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8975434Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.8977269Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8977438Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.8979145Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8979310Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.8981022Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. 
If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.8981177Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.8981986Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8982106Z local_shape = tensor.shape 2025-12-04T09:25:19.8982914Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8983097Z tensor.shape, 2025-12-04T09:25:19.8983903Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8983999Z tensor.dtype, 2025-12-04T09:25:19.8984821Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8984929Z local_shape = tensor.shape 2025-12-04T09:25:19.8985737Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8985835Z tensor.shape, 2025-12-04T09:25:19.8986643Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8986747Z tensor.dtype, 2025-12-04T09:25:19.8987546Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8987665Z local_shape = tensor.shape 2025-12-04T09:25:19.8988468Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8988562Z tensor.shape, 2025-12-04T09:25:19.8989413Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8989525Z tensor.dtype, 2025-12-04T09:25:19.8990243Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8990365Z local_shape = tensor.shape 2025-12-04T09:25:19.8991078Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.8991169Z tensor.shape, 2025-12-04T09:25:19.8991884Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 
2025-12-04T09:25:19.8991966Z tensor.dtype, 2025-12-04T09:25:19.8992353Z E1204 09:23:35.090000 33970 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.8992806Z E1204 09:23:35.090000 33970 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9017723Z E1204 09:23:35.090000 33970 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9018301Z E1204 09:23:35.090000 33970 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9019283Z E1204 09:23:35.090000 33970 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9019650Z E1204 09:23:35.090000 33970 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9020582Z E1204 09:23:35.090000 33970 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9021407Z E1204 09:23:35.090000 33970 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9022367Z E1204 09:23:35.090000 33970 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9022822Z E1204 09:23:35.090000 33970 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9023766Z E1204 09:23:35.090000 33970 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9024183Z E1204 09:23:35.090000 33970 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9025127Z E1204 09:23:35.090000 33970 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9025585Z E1204 09:23:35.090000 33970 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9027427Z E1204 09:23:35.090000 33970 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 0. CUDA driver allocated memory was 649003008 and is now 724500480. 
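[Editor's note] The RuntimeError above comes from PyTorch's CUDA memory-leak checker, which compares caching-allocator and driver-level memory before and after the test body on each rank. The snippet below is a minimal illustration of that kind of before/after comparison, not the actual harness code in common_utils.py; `check_for_cuda_leak` and `run_test_body` are hypothetical names introduced only for this sketch.

```python
# Minimal sketch (NOT the PyTorch test harness) of the before/after comparison
# behind "CUDA driver API confirmed a leak ...". Assumes a single CUDA device;
# `run_test_body` is a hypothetical callable standing in for the test under check.
import gc
import torch

def check_for_cuda_leak(run_test_body, device=0):
    torch.cuda.synchronize(device)
    gc.collect()
    torch.cuda.empty_cache()
    allocated_before = torch.cuda.memory_allocated(device)  # caching-allocator bytes
    free_before, total = torch.cuda.mem_get_info(device)    # driver-level view
    driver_before = total - free_before

    run_test_body()

    torch.cuda.synchronize(device)
    gc.collect()
    torch.cuda.empty_cache()
    allocated_after = torch.cuda.memory_allocated(device)
    free_after, _ = torch.cuda.mem_get_info(device)
    driver_after = total - free_after

    if allocated_after > allocated_before:
        raise RuntimeError(
            f"Caching allocator allocated memory was {allocated_before} "
            f"and is now reported as {allocated_after} on device {device}. "
            f"CUDA driver allocated memory was {driver_before} and is now {driver_after}."
        )
```

The repro line printed in the log (PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py ...) re-runs just this test with the check enabled, which is usually the quickest way to confirm whether the leak reproduces outside the CI shard.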
2025-12-04T09:25:19.9027770Z E1204 09:23:35.090000 33970 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9028449Z E1204 09:23:35.090000 33970 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9029850Z E1204 09:23:35.090000 33970 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T09:25:19.9030226Z E1204 09:23:35.090000 33970 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9030924Z E1204 09:23:35.090000 33970 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9031440Z E1204 09:23:35.090000 33970 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:25:19.9031866Z E1204 09:23:35.092000 33971 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9032379Z E1204 09:23:35.092000 33971 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9033502Z E1204 09:23:35.092000 33971 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9033933Z E1204 09:23:35.092000 33971 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9034780Z E1204 09:23:35.092000 33971 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9035118Z E1204 09:23:35.092000 33971 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9036210Z E1204 09:23:35.092000 33971 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9036642Z E1204 09:23:35.092000 33971 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9037521Z E1204 09:23:35.092000 33971 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9037947Z E1204 09:23:35.092000 33971 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9038823Z E1204 09:23:35.092000 33971 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9039216Z E1204 09:23:35.092000 33971 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9040094Z E1204 09:23:35.092000 33971 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9040535Z E1204 09:23:35.092000 33971 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9042256Z E1204 09:23:35.092000 33971 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 1. CUDA driver allocated memory was 518979584 and is now 615448576. 2025-12-04T09:25:19.9042605Z E1204 09:23:35.092000 33971 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9043195Z E1204 09:23:35.092000 33971 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9044533Z E1204 09:23:35.092000 33971 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T09:25:19.9044847Z E1204 09:23:35.092000 33971 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9045503Z E1204 09:23:35.092000 33971 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9045991Z E1204 09:23:35.092000 33971 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:25:19.9046389Z E1204 09:23:35.092000 33973 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9046870Z E1204 09:23:35.092000 33973 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9047781Z E1204 09:23:35.092000 33973 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9048235Z E1204 09:23:35.092000 33973 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9049136Z E1204 09:23:35.092000 33973 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9049481Z E1204 09:23:35.092000 33973 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9050415Z E1204 09:23:35.092000 33973 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9050847Z E1204 09:23:35.092000 33973 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9051726Z E1204 09:23:35.092000 33973 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T09:25:19.9052153Z E1204 09:23:35.092000 33973 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9053028Z E1204 09:23:35.092000 33973 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9053419Z E1204 09:23:35.092000 33973 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9054302Z E1204 09:23:35.092000 33973 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9054739Z E1204 09:23:35.092000 33973 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9056546Z E1204 09:23:35.092000 33973 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 3. CUDA driver allocated memory was 395247616 and is now 615448576. 2025-12-04T09:25:19.9057095Z E1204 09:23:35.092000 33973 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9057754Z E1204 09:23:35.092000 33973 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9059147Z E1204 09:23:35.092000 33973 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T09:25:19.9059480Z E1204 09:23:35.092000 33973 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9060170Z E1204 09:23:35.092000 33973 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9060690Z E1204 09:23:35.092000 33973 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:25:19.9061116Z E1204 09:23:35.092000 33972 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9061628Z E1204 09:23:35.092000 33972 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9062601Z E1204 09:23:35.092000 33972 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9063081Z E1204 09:23:35.092000 33972 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9064039Z E1204 09:23:35.092000 33972 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9064458Z E1204 09:23:35.092000 33972 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 
2025-12-04T09:25:19.9065399Z E1204 09:23:35.092000 33972 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9065854Z E1204 09:23:35.092000 33972 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9066787Z E1204 09:23:35.092000 33972 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9067241Z E1204 09:23:35.092000 33972 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9068185Z E1204 09:23:35.092000 33972 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9068708Z E1204 09:23:35.092000 33972 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9069670Z E1204 09:23:35.092000 33972 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9070082Z E1204 09:23:35.092000 33972 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9071711Z E1204 09:23:35.092000 33972 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 2. CUDA driver allocated memory was 527368192 and is now 615448576. 
2025-12-04T09:25:19.9072066Z E1204 09:23:35.092000 33972 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9072622Z E1204 09:23:35.092000 33972 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9073854Z E1204 09:23:35.092000 33972 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T09:25:19.9074153Z E1204 09:23:35.092000 33972 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9074756Z E1204 09:23:35.092000 33972 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9075223Z E1204 09:23:35.092000 33972 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:25:19.9075316Z FAILED [10.0662s] [100%] 2025-12-04T09:25:19.9075323Z 2025-12-04T09:25:19.9075462Z =================================== FAILURES =================================== 2025-12-04T09:25:19.9075914Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda _ 2025-12-04T09:25:19.9076017Z Traceback (most recent call last): 2025-12-04T09:25:19.9076507Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:25:19.9076609Z self._join_processes(fn) 2025-12-04T09:25:19.9077132Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:25:19.9077256Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:25:19.9077850Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:25:19.9077957Z raise RuntimeError(error) 2025-12-04T09:25:19.9078161Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:25:19.9078266Z Traceback (most recent call last): 2025-12-04T09:25:19.9078749Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9078845Z getattr(self, test_name)() 2025-12-04T09:25:19.9079323Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9079403Z fn() 2025-12-04T09:25:19.9079849Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9079947Z method(*args, **kwargs) 2025-12-04T09:25:19.9080397Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9080487Z method(*args, **kwargs) 2025-12-04T09:25:19.9080936Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9081023Z with policy(): 2025-12-04T09:25:19.9081479Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9081576Z raise 
RuntimeError(msg) 2025-12-04T09:25:19.9082827Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 2. CUDA driver allocated memory was 527368192 and is now 615448576. 2025-12-04T09:25:19.9082866Z 2025-12-04T09:25:19.9083063Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9083948Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T09:25:19.9083954Z 2025-12-04T09:25:19.9084193Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9084198Z 2025-12-04T09:25:19.9084202Z 2025-12-04T09:25:19.9084399Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:25:19.9084637Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:25:19.9085474Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-b1cbedcab1229122.xml - 2025-12-04T09:25:19.9085625Z =========================== short test summary info ============================ 2025-12-04T09:25:19.9086629Z FAILED [10.0662s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:25:19.9086737Z Traceback (most recent call last): 2025-12-04T09:25:19.9087231Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9087332Z getattr(self, test_name)() 2025-12-04T09:25:19.9087807Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9087894Z fn() 2025-12-04T09:25:19.9088343Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9088439Z method(*args, **kwargs) 2025-12-04T09:25:19.9088937Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9089032Z method(*args, **kwargs) 2025-12-04T09:25:19.9089484Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9089568Z with policy(): 2025-12-04T09:25:19.9090017Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9090118Z raise RuntimeError(msg) 2025-12-04T09:25:19.9091362Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 2. CUDA driver allocated memory was 527368192 and is now 615448576. 
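[Editor's note] Separately from the leak itself, the FutureWarning repeated at test_hsdp_dtensor_state_dict.py:118 and :130 above points at the newer torch.distributed.checkpoint state-dict APIs. The following is a hedged sketch of that recommended replacement, not the test's actual code: a plain nn.Linear and SGD optimizer stand in for the FSDP-wrapped model and optimizer used by the test, on the assumption that get_state_dict/set_state_dict also accept a non-wrapped module as the warning's documentation link suggests.

```python
# Hedged sketch of the APIs named in the FutureWarning above; NOT the code of
# test_hsdp_dtensor_state_dict.py. `model` and `optimizer` are placeholders for
# the FSDP-wrapped module and its optimizer in the real test.
import torch
from torch.distributed.checkpoint.state_dict import get_state_dict, set_state_dict

model = torch.nn.Linear(4, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Instead of FSDP.set_state_dict_type(...) followed by model.state_dict():
model_state_dict, optim_state_dict = get_state_dict(model, optimizer)

# ... persist/restore the dicts as needed (e.g. via torch.distributed.checkpoint) ...

# Load both back in one call, per the warning's recommendation:
set_state_dict(
    model,
    optimizer,
    model_state_dict=model_state_dict,
    optim_state_dict=optim_state_dict,
)
```

Per the warning text, the same pair of calls is meant to cover FSDP1, FSDP2 and DDP, which is why the log nudges tests away from the per-wrapper set_state_dict_type context manager. The retried session follows below.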
2025-12-04T09:25:19.9091369Z 2025-12-04T09:25:19.9091567Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9092424Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T09:25:19.9092432Z 2025-12-04T09:25:19.9092673Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9092830Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:25:19.9092987Z ======================= 1 failed, 7 deselected in 10.09s ======================= 2025-12-04T09:25:19.9093078Z Got exit code 1 2025-12-04T09:25:19.9093171Z Retrying single test... 2025-12-04T09:25:19.9093863Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-6d24496891daae4f.xml 2025-12-04T09:25:19.9094008Z ============================= test session starts ============================== 2025-12-04T09:25:19.9094318Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:25:19.9094447Z cachedir: .pytest_cache 2025-12-04T09:25:19.9094903Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:25:19.9095011Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:25:19.9095109Z configfile: pytest.ini 2025-12-04T09:25:19.9095586Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:25:19.9095764Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:25:19.9096953Z stepcurrent: skipping 5 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T09:25:19.9097066Z Running 1 items in this shard 2025-12-04T09:25:19.9097077Z 2025-12-04T09:25:19.9098410Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda I1204 09:23:41.683000 34371 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 34423 2025-12-04T09:25:19.9098909Z I1204 09:23:41.684000 34371 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 34424 2025-12-04T09:25:19.9099405Z I1204 09:23:41.685000 34371 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 34425 2025-12-04T09:25:19.9099893Z I1204 09:23:41.686000 34371 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 34426 2025-12-04T09:25:19.9102387Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:25:19.9102500Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.9104886Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9105003Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.9107406Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9107521Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.9109877Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:118: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9110032Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.9111574Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.9111692Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.9113231Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.9113355Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.9114869Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.9114978Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.9116501Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. 
If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.9116731Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.9118871Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9118968Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.9121400Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9121518Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.9123893Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9124064Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.9126450Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:130: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9126609Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.9128322Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.9128488Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.9130199Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:25:19.9130362Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.9132063Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.9132230Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.9134014Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_shard_utils.py:59: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.9134164Z distributed_c10d._get_pg_default_device(pg).type 2025-12-04T09:25:19.9134879Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.9134983Z local_shape = tensor.shape 2025-12-04T09:25:19.9135690Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.9135789Z local_shape = tensor.shape 2025-12-04T09:25:19.9136731Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.9136834Z tensor.shape, 2025-12-04T09:25:19.9137648Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.9137742Z tensor.shape, 2025-12-04T09:25:19.9138553Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.9138654Z tensor.dtype, 2025-12-04T09:25:19.9139452Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.9139578Z tensor.dtype, 2025-12-04T09:25:19.9140377Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.9140492Z local_shape = tensor.shape 2025-12-04T09:25:19.9141319Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.9141419Z tensor.shape, 2025-12-04T09:25:19.9142216Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:732: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 
2025-12-04T09:25:19.9142331Z local_shape = tensor.shape 2025-12-04T09:25:19.9143126Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.9143220Z tensor.dtype, 2025-12-04T09:25:19.9144034Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.9144128Z tensor.shape, 2025-12-04T09:25:19.9144933Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor. 2025-12-04T09:25:19.9145028Z tensor.dtype, 2025-12-04T09:25:19.9145453Z E1204 09:23:49.751000 34423 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9145964Z E1204 09:23:49.751000 34423 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9146934Z E1204 09:23:49.751000 34423 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9147472Z E1204 09:23:49.751000 34423 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9148555Z E1204 09:23:49.751000 34423 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9149011Z E1204 09:23:49.751000 34423 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9149842Z E1204 09:23:49.751000 34423 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9150247Z E1204 09:23:49.751000 34423 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9151087Z E1204 09:23:49.751000 34423 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9151490Z E1204 09:23:49.751000 34423 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9152322Z E1204 09:23:49.751000 34423 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9152690Z E1204 09:23:49.751000 34423 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9153514Z E1204 09:23:49.751000 34423 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9153953Z E1204 09:23:49.751000 34423 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9155590Z E1204 09:23:49.751000 34423 
site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 0. CUDA driver allocated memory was 649003008 and is now 724500480. 2025-12-04T09:25:19.9155918Z E1204 09:23:49.751000 34423 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9156471Z E1204 09:23:49.751000 34423 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9157710Z E1204 09:23:49.751000 34423 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T09:25:19.9158011Z E1204 09:23:49.751000 34423 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9158624Z E1204 09:23:49.751000 34423 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9159088Z E1204 09:23:49.751000 34423 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:25:19.9159462Z E1204 09:23:49.752000 34424 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9159914Z E1204 09:23:49.752000 34424 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9160776Z E1204 09:23:49.752000 34424 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9161250Z E1204 09:23:49.752000 34424 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9162105Z E1204 09:23:49.752000 34424 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9162429Z E1204 09:23:49.752000 34424 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9163258Z E1204 09:23:49.752000 34424 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9163664Z E1204 09:23:49.752000 34424 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9164497Z E1204 09:23:49.752000 34424 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9164903Z E1204 09:23:49.752000 34424 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9165732Z E1204 09:23:49.752000 34424 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9166274Z E1204 09:23:49.752000 34424 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9167177Z E1204 09:23:49.752000 34424 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9167617Z E1204 09:23:49.752000 34424 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9169365Z E1204 09:23:49.752000 34424 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 1. CUDA driver allocated memory was 531562496 and is now 615448576. 2025-12-04T09:25:19.9169684Z E1204 09:23:49.752000 34424 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9170272Z E1204 09:23:49.752000 34424 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9171584Z E1204 09:23:49.752000 34424 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T09:25:19.9171900Z E1204 09:23:49.752000 34424 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9172543Z E1204 09:23:49.752000 34424 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9173033Z E1204 09:23:49.752000 34424 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:25:19.9173424Z E1204 09:23:49.752000 34425 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9173908Z E1204 09:23:49.752000 34425 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9175059Z E1204 09:23:49.752000 34425 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9175533Z E1204 09:23:49.752000 34425 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9176544Z E1204 09:23:49.752000 34425 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9177082Z E1204 09:23:49.752000 34425 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9178032Z E1204 09:23:49.752000 34425 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9178494Z 
E1204 09:23:49.752000 34425 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9179436Z E1204 09:23:49.752000 34425 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9179888Z E1204 09:23:49.752000 34425 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9180814Z E1204 09:23:49.752000 34425 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9181283Z E1204 09:23:49.752000 34425 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9182219Z E1204 09:23:49.752000 34425 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9183053Z E1204 09:23:49.752000 34425 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9184890Z E1204 09:23:49.752000 34425 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 2. CUDA driver allocated memory was 531562496 and is now 615448576. 2025-12-04T09:25:19.9185236Z E1204 09:23:49.752000 34425 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9185864Z E1204 09:23:49.752000 34425 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9187274Z E1204 09:23:49.752000 34425 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T09:25:19.9187610Z E1204 09:23:49.752000 34425 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9188405Z E1204 09:23:49.752000 34425 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9188914Z E1204 09:23:49.752000 34425 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:25:19.9189436Z E1204 09:23:49.753000 34426 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9190066Z E1204 09:23:49.753000 34426 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9190929Z E1204 09:23:49.753000 34426 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9191350Z E1204 09:23:49.753000 34426 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 
2025-12-04T09:25:19.9192209Z E1204 09:23:49.753000 34426 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9192535Z E1204 09:23:49.753000 34426 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9193369Z E1204 09:23:49.753000 34426 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9193773Z E1204 09:23:49.753000 34426 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9194601Z E1204 09:23:49.753000 34426 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9195003Z E1204 09:23:49.753000 34426 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9195990Z E1204 09:23:49.753000 34426 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9196416Z E1204 09:23:49.753000 34426 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9197295Z E1204 09:23:49.753000 34426 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9197771Z E1204 09:23:49.753000 34426 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9199498Z E1204 09:23:49.753000 34426 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 3. CUDA driver allocated memory was 456065024 and is now 615448576. 
2025-12-04T09:25:19.9199817Z E1204 09:23:49.753000 34426 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9200408Z E1204 09:23:49.753000 34426 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9201707Z E1204 09:23:49.753000 34426 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T09:25:19.9202026Z E1204 09:23:49.753000 34426 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9202664Z E1204 09:23:49.753000 34426 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9203158Z E1204 09:23:49.753000 34426 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:25:19.9203253Z FAILED [10.0045s] [100%] 2025-12-04T09:25:19.9203259Z 2025-12-04T09:25:19.9203455Z =================================== FAILURES =================================== 2025-12-04T09:25:19.9204037Z _ TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda _ 2025-12-04T09:25:19.9204143Z Traceback (most recent call last): 2025-12-04T09:25:19.9204631Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:25:19.9204730Z self._join_processes(fn) 2025-12-04T09:25:19.9205245Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:25:19.9205375Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:25:19.9205910Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:25:19.9206013Z raise RuntimeError(error) 2025-12-04T09:25:19.9206221Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:25:19.9206326Z Traceback (most recent call last): 2025-12-04T09:25:19.9206812Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9206906Z getattr(self, test_name)() 2025-12-04T09:25:19.9207376Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9207461Z fn() 2025-12-04T09:25:19.9207908Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9208007Z method(*args, **kwargs) 2025-12-04T09:25:19.9208481Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9208571Z method(*args, **kwargs) 2025-12-04T09:25:19.9209025Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9209133Z with policy(): 2025-12-04T09:25:19.9209581Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9209683Z raise 
RuntimeError(msg) 2025-12-04T09:25:19.9210936Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 1. CUDA driver allocated memory was 531562496 and is now 615448576. 2025-12-04T09:25:19.9210942Z 2025-12-04T09:25:19.9211138Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9211990Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda 2025-12-04T09:25:19.9211999Z 2025-12-04T09:25:19.9212241Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9212246Z 2025-12-04T09:25:19.9212250Z 2025-12-04T09:25:19.9212441Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:25:19.9212671Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:25:19.9213504Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-6d24496891daae4f.xml - 2025-12-04T09:25:19.9213651Z =========================== short test summary info ============================ 2025-12-04T09:25:19.9214653Z FAILED [10.0045s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:25:19.9214809Z Traceback (most recent call last): 2025-12-04T09:25:19.9215300Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9215396Z getattr(self, test_name)() 2025-12-04T09:25:19.9215866Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9215948Z fn() 2025-12-04T09:25:19.9216462Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9216720Z method(*args, **kwargs) 2025-12-04T09:25:19.9217234Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9217333Z method(*args, **kwargs) 2025-12-04T09:25:19.9217847Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9217945Z with policy(): 2025-12-04T09:25:19.9218450Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9218566Z raise RuntimeError(msg) 2025-12-04T09:25:19.9219977Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda! Caching allocator allocated memory was 0 and is now reported as 27136 on device 1. CUDA driver allocated memory was 531562496 and is now 615448576. 
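For context on this failure mode: the check enabled by PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 snapshots CUDA memory counters before the test body and compares them afterwards, failing the test if allocations do not return to the baseline (here 0 -> 27136 bytes on the caching allocator). The following is only a minimal sketch of that accounting using public torch.cuda counters; it is not the harness in torch/testing/_internal/common_utils.py, and the helper name assert_no_cuda_leak is invented for illustration.

    # Minimal sketch of the accounting behind PYTORCH_TEST_CUDA_MEM_LEAK_CHECK.
    # Not the real CI harness; assert_no_cuda_leak is an invented helper name.
    import torch

    def assert_no_cuda_leak(fn, device: int = 0) -> None:
        if not torch.cuda.is_available():
            return  # nothing to measure on CPU-only machines
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        before = torch.cuda.memory_allocated(device)  # caching-allocator bytes in use
        fn()
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        after = torch.cuda.memory_allocated(device)
        if after > before:
            raise RuntimeError(
                f"possible CUDA leak on device {device}: "
                f"allocated memory was {before} and is now {after}"
            )

    if __name__ == "__main__":
        kept_alive = []
        # A tensor that outlives the test body shows up as a nonzero delta,
        # analogous to the 0 -> 27136 byte jump reported in the log above.
        try:
            assert_no_cuda_leak(lambda: kept_alive.append(torch.ones(1024, device="cuda")))
        except RuntimeError as err:
            print(err)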
2025-12-04T09:25:19.9219983Z 
2025-12-04T09:25:19.9220242Z To execute this test, run the following from the base repo dir:
2025-12-04T09:25:19.9221426Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda
2025-12-04T09:25:19.9221433Z 
2025-12-04T09:25:19.9221704Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:25:19.9221958Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T09:25:19.9222133Z ======================= 1 failed, 7 deselected in 10.03s =======================
2025-12-04T09:25:19.9222237Z Got exit code 1
2025-12-04T09:25:19.9223127Z FAILED CONSISTENTLY: test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda
2025-12-04T09:25:19.9223532Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T09:25:19.9224296Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-e815db3b6b0b67f1.xml
2025-12-04T09:25:19.9224461Z ============================= test session starts ==============================
2025-12-04T09:25:19.9224821Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T09:25:19.9224924Z cachedir: .pytest_cache
2025-12-04T09:25:19.9225436Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T09:25:19.9225559Z rootdir: /var/lib/jenkins/workspace
2025-12-04T09:25:19.9225661Z configfile: pytest.ini
2025-12-04T09:25:19.9226197Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T09:25:19.9226404Z collecting ... collected 8 items / 6 deselected / 2 selected
2025-12-04T09:25:19.9226542Z stepcurrent: skipping 6 already run items.
2025-12-04T09:25:19.9226659Z Running 2 items in this shard
2025-12-04T09:25:19.9226664Z 
2025-12-04T09:25:19.9227899Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_hsdp_init_with_device_mesh_cuda I1204 09:23:56.344000 34824 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 34876
2025-12-04T09:25:19.9228409Z I1204 09:23:56.345000 34824 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 34877
2025-12-04T09:25:19.9228913Z I1204 09:23:56.345000 34824 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 34878
2025-12-04T09:25:19.9229399Z I1204 09:23:56.346000 34824 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 34879
2025-12-04T09:25:19.9231820Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP.
API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9231941Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.9234350Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9234488Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.9236600Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9236725Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.9238843Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9238943Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.9240486Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.9240605Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.9242115Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.9242234Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.9243795Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:25:19.9243907Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.9245441Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.9245551Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.9245938Z E1204 09:24:03.405000 34876 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9246388Z E1204 09:24:03.405000 34876 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9247255Z E1204 09:24:03.405000 34876 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9247678Z E1204 09:24:03.405000 34876 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9248526Z E1204 09:24:03.405000 34876 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9248882Z E1204 09:24:03.405000 34876 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9249706Z E1204 09:24:03.405000 34876 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9250145Z E1204 09:24:03.405000 34876 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9250964Z E1204 09:24:03.405000 34876 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9251366Z E1204 09:24:03.405000 34876 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9252197Z E1204 09:24:03.405000 34876 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9252565Z E1204 09:24:03.405000 34876 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9253394Z E1204 09:24:03.405000 34876 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9253803Z E1204 09:24:03.405000 34876 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9255279Z E1204 09:24:03.405000 34876 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. 
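The FutureWarning above recommends migrating from FSDP.set_state_dict_type() to the torch.distributed.checkpoint.state_dict helpers get_state_dict() and set_state_dict() (see the linked API doc and tutorial). A rough, hedged sketch of that call pattern follows; it uses a throwaway single-process gloo group and a plain nn.Linear as stand-ins, not the FSDP-wrapped model from this test file.

    # Hedged sketch of the get_state_dict()/set_state_dict() API recommended by the
    # FutureWarning above. The single-process gloo group, model, and optimizer are
    # placeholders; the real test wraps the module in FSDP across several ranks.
    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.distributed.checkpoint.state_dict import get_state_dict, set_state_dict

    dist.init_process_group(
        backend="gloo", init_method="tcp://127.0.0.1:29500", rank=0, world_size=1
    )

    model = nn.Linear(4, 4)
    optim = torch.optim.SGD(model.parameters(), lr=0.1)

    # One call returns both state dicts and, per the warning, works across plain
    # modules, DDP, FSDP1, and FSDP2 without choosing a state_dict_type up front.
    model_sd, optim_sd = get_state_dict(model, optimizers=optim)

    # Restoration goes through the matching setter rather than load_state_dict().
    set_state_dict(
        model, optimizers=optim, model_state_dict=model_sd, optim_state_dict=optim_sd
    )

    dist.destroy_process_group()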
CUDA driver allocated memory was 649003008 and is now 734986240. 2025-12-04T09:25:19.9255576Z E1204 09:24:03.405000 34876 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9256243Z E1204 09:24:03.405000 34876 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9257593Z E1204 09:24:03.405000 34876 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T09:25:19.9257923Z E1204 09:24:03.405000 34876 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9258609Z E1204 09:24:03.405000 34876 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9259126Z E1204 09:24:03.405000 34876 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:25:19.9259554Z E1204 09:24:03.409000 34877 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9260057Z E1204 09:24:03.409000 34877 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9261029Z E1204 09:24:03.409000 34877 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9261509Z E1204 09:24:03.409000 34877 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9262472Z E1204 09:24:03.409000 34877 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9262877Z E1204 09:24:03.409000 34877 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9263803Z E1204 09:24:03.409000 34877 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9264292Z E1204 09:24:03.409000 34877 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9265215Z E1204 09:24:03.409000 34877 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9265669Z E1204 09:24:03.409000 34877 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9266604Z E1204 09:24:03.409000 34877 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9267018Z E1204 09:24:03.409000 34877 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9267954Z E1204 09:24:03.409000 34877 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9268410Z E1204 09:24:03.409000 34877 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9270035Z E1204 09:24:03.409000 34877 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 523173888 and is now 613351424. 2025-12-04T09:25:19.9270331Z E1204 09:24:03.409000 34877 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9270938Z E1204 09:24:03.409000 34877 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9272011Z E1204 09:24:03.409000 34877 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T09:25:19.9272307Z E1204 09:24:03.409000 34877 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9272915Z E1204 09:24:03.409000 34877 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9273556Z E1204 09:24:03.409000 34877 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:25:19.9273952Z E1204 09:24:03.409000 34878 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9274423Z E1204 09:24:03.409000 34878 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9275331Z E1204 09:24:03.409000 34878 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9275783Z E1204 09:24:03.409000 34878 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9276680Z E1204 09:24:03.409000 34878 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9277059Z E1204 09:24:03.409000 34878 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9277932Z E1204 09:24:03.409000 34878 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9278387Z E1204 09:24:03.409000 34878 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9279263Z E1204 09:24:03.409000 34878 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9279689Z E1204 09:24:03.409000 34878 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9280571Z E1204 09:24:03.409000 34878 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9280963Z E1204 09:24:03.409000 34878 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9281849Z E1204 09:24:03.409000 34878 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9282279Z E1204 09:24:03.409000 34878 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9283836Z E1204 09:24:03.409000 34878 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 527368192 and is now 613351424. 2025-12-04T09:25:19.9284342Z E1204 09:24:03.409000 34878 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9284900Z E1204 09:24:03.409000 34878 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9285970Z E1204 09:24:03.409000 34878 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T09:25:19.9286261Z E1204 09:24:03.409000 34878 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9286875Z E1204 09:24:03.409000 34878 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9287324Z E1204 09:24:03.409000 34878 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:25:19.9287697Z E1204 09:24:03.410000 34879 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9288142Z E1204 09:24:03.410000 34879 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9288995Z E1204 09:24:03.410000 34879 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9289420Z E1204 09:24:03.410000 34879 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9290299Z E1204 09:24:03.410000 34879 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9290628Z E1204 09:24:03.410000 34879 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9291479Z E1204 09:24:03.410000 34879 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9291879Z E1204 09:24:03.410000 34879 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9292708Z E1204 09:24:03.410000 34879 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9293113Z E1204 09:24:03.410000 34879 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9293940Z E1204 09:24:03.410000 34879 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9294309Z E1204 09:24:03.410000 34879 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9295131Z E1204 09:24:03.410000 34879 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9295546Z E1204 09:24:03.410000 34879 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9297365Z E1204 09:24:03.410000 34879 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 477036544 and is now 613351424. 
2025-12-04T09:25:19.9297714Z E1204 09:24:03.410000 34879 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9298343Z E1204 09:24:03.410000 34879 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9299547Z E1204 09:24:03.410000 34879 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T09:25:19.9299882Z E1204 09:24:03.410000 34879 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9300573Z E1204 09:24:03.410000 34879 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9301099Z E1204 09:24:03.410000 34879 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:25:19.9301199Z FAILED [9.2135s] [ 50%] 2025-12-04T09:25:19.9301206Z 2025-12-04T09:25:19.9301359Z =================================== FAILURES =================================== 2025-12-04T09:25:19.9301700Z __ TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda ___ 2025-12-04T09:25:19.9301817Z Traceback (most recent call last): 2025-12-04T09:25:19.9302368Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:25:19.9302478Z self._join_processes(fn) 2025-12-04T09:25:19.9303064Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:25:19.9303233Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:25:19.9303849Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:25:19.9303995Z raise RuntimeError(error) 2025-12-04T09:25:19.9304225Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:25:19.9304339Z Traceback (most recent call last): 2025-12-04T09:25:19.9304882Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9304992Z getattr(self, test_name)() 2025-12-04T09:25:19.9305531Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9305618Z fn() 2025-12-04T09:25:19.9306126Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9306234Z method(*args, **kwargs) 2025-12-04T09:25:19.9306737Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9306844Z method(*args, **kwargs) 2025-12-04T09:25:19.9307345Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9307439Z with policy(): 2025-12-04T09:25:19.9307959Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9308063Z raise RuntimeError(msg) 2025-12-04T09:25:19.9309351Z RuntimeError: CUDA driver API 
confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 523173888 and is now 613351424. 2025-12-04T09:25:19.9309366Z 2025-12-04T09:25:19.9309553Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9310293Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T09:25:19.9310301Z 2025-12-04T09:25:19.9310540Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9310546Z 2025-12-04T09:25:19.9310685Z Process 2 exited with error code 10 and exception: 2025-12-04T09:25:19.9310796Z Traceback (most recent call last): 2025-12-04T09:25:19.9311278Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9311375Z getattr(self, test_name)() 2025-12-04T09:25:19.9311857Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9311938Z fn() 2025-12-04T09:25:19.9312385Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9312485Z method(*args, **kwargs) 2025-12-04T09:25:19.9312926Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9313021Z method(*args, **kwargs) 2025-12-04T09:25:19.9313468Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9313552Z with policy(): 2025-12-04T09:25:19.9314003Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9314095Z raise RuntimeError(msg) 2025-12-04T09:25:19.9315188Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 527368192 and is now 613351424. 2025-12-04T09:25:19.9315225Z 2025-12-04T09:25:19.9315419Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9316126Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T09:25:19.9316131Z 2025-12-04T09:25:19.9316366Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9316371Z 2025-12-04T09:25:19.9316375Z 2025-12-04T09:25:19.9316564Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:25:19.9316799Z Process 1 terminated with exit code 10, terminating remaining processes. 
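The "Process 1 terminated with exit code 10, terminating remaining processes." line, together with the _join_processes and _check_return_codes frames above, reflects the usual multi-process test pattern: the parent spawns one worker per rank, joins them, and turns any nonzero child exit code into a test failure. A stripped-down sketch of that pattern follows; it is not PyTorch's MultiProcessTestCase, and run_rank / EXIT_LEAK are names invented for the example.

    # Stripped-down sketch of parent/child exit-code handling in the spirit of
    # _join_processes()/_check_return_codes() from common_distributed.py.
    # Not the real MultiProcessTestCase; run_rank and EXIT_LEAK are invented names.
    import multiprocessing as mp
    import sys

    EXIT_LEAK = 10  # the exit code the workers in this log use for leak-check failures

    def run_rank(rank: int) -> None:
        try:
            # ... the per-rank test body would run here ...
            if rank == 1:
                raise RuntimeError("simulated leak detected on this rank")
        except RuntimeError:
            sys.exit(EXIT_LEAK)  # the child reports failure through its exit code

    def main() -> None:
        ctx = mp.get_context("spawn")
        procs = [ctx.Process(target=run_rank, args=(r,)) for r in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        for rank, p in enumerate(procs):
            if p.exitcode != 0:
                # Mirrors "Process N exited with error code 10" in the log above.
                raise RuntimeError(f"Process {rank} exited with error code {p.exitcode}")

    if __name__ == "__main__":
        main()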
2025-12-04T09:25:19.9317621Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-e815db3b6b0b67f1.xml - 2025-12-04T09:25:19.9317767Z =========================== short test summary info ============================ 2025-12-04T09:25:19.9318611Z FAILED [9.2135s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_hsdp_init_with_device_mesh_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:25:19.9318715Z Traceback (most recent call last): 2025-12-04T09:25:19.9319205Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9319300Z getattr(self, test_name)() 2025-12-04T09:25:19.9319776Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9319857Z fn() 2025-12-04T09:25:19.9320306Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9320405Z method(*args, **kwargs) 2025-12-04T09:25:19.9321042Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9321300Z method(*args, **kwargs) 2025-12-04T09:25:19.9321812Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9321905Z with policy(): 2025-12-04T09:25:19.9322409Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9322523Z raise RuntimeError(msg) 2025-12-04T09:25:19.9323757Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 523173888 and is now 613351424. 
2025-12-04T09:25:19.9323766Z 
2025-12-04T09:25:19.9323984Z To execute this test, run the following from the base repo dir:
2025-12-04T09:25:19.9324769Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda
2025-12-04T09:25:19.9324776Z 
2025-12-04T09:25:19.9325045Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:25:19.9325050Z 
2025-12-04T09:25:19.9325208Z Process 2 exited with error code 10 and exception:
2025-12-04T09:25:19.9325325Z Traceback (most recent call last):
2025-12-04T09:25:19.9325877Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:25:19.9325982Z getattr(self, test_name)()
2025-12-04T09:25:19.9326512Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:25:19.9326659Z fn()
2025-12-04T09:25:19.9327170Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:25:19.9327286Z method(*args, **kwargs)
2025-12-04T09:25:19.9327912Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:25:19.9328016Z method(*args, **kwargs)
2025-12-04T09:25:19.9328531Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:25:19.9328629Z with policy():
2025-12-04T09:25:19.9329139Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:25:19.9329259Z raise RuntimeError(msg)
2025-12-04T09:25:19.9330508Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 527368192 and is now 613351424.
2025-12-04T09:25:19.9330516Z 
2025-12-04T09:25:19.9330743Z To execute this test, run the following from the base repo dir:
2025-12-04T09:25:19.9331519Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda
2025-12-04T09:25:19.9331525Z 
2025-12-04T09:25:19.9331797Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:25:19.9331974Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T09:25:19.9332152Z ======================= 1 failed, 6 deselected in 9.23s ========================
2025-12-04T09:25:19.9332261Z Got exit code 1
2025-12-04T09:25:19.9332370Z Retrying single test...
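"Retrying single test..." is the runner's flaky-test handling: after the file-level run fails, it reruns just the failing test id in a fresh pytest session (the stepcurrent line in the next session shows only that one item selected) before declaring it FAILED CONSISTENTLY. The sketch below illustrates that rerun-one-test idea with a plain subprocess call; the attempt count and helper name are assumptions, not the actual logic of PyTorch's test runner.

    # Hedged sketch of "retry the single failing test in a fresh session". Not the
    # actual PyTorch test-runner logic; retry_single_test and the attempt count are
    # illustrative assumptions.
    import subprocess
    import sys

    def retry_single_test(test_id: str, attempts: int = 2) -> bool:
        """Rerun one pytest node id in isolated sessions; True if any attempt passes."""
        for attempt in range(1, attempts + 1):
            print(f"Retrying single test... (attempt {attempt})")
            result = subprocess.run([sys.executable, "-m", "pytest", "-x", test_id])
            if result.returncode == 0:
                return True
        return False  # the caller would then report the test as failing consistently

    if __name__ == "__main__":
        node = (
            "test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::"
            "TestHSDPWithDeviceMeshAndDTensorCUDA::test_hsdp_init_with_device_mesh_cuda"
        )
        if retry_single_test(node):
            print("passed on retry")
        else:
            print(f"FAILED CONSISTENTLY: {node}")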
2025-12-04T09:25:19.9333119Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-788cdb9001b436df.xml 2025-12-04T09:25:19.9333359Z ============================= test session starts ============================== 2025-12-04T09:25:19.9333817Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:25:19.9333920Z cachedir: .pytest_cache 2025-12-04T09:25:19.9334374Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:25:19.9334479Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:25:19.9334584Z configfile: pytest.ini 2025-12-04T09:25:19.9335059Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:25:19.9335254Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:25:19.9336026Z stepcurrent: skipping 6 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_hsdp_init_with_device_mesh_cuda 2025-12-04T09:25:19.9336126Z Running 1 items in this shard 2025-12-04T09:25:19.9336135Z 2025-12-04T09:25:19.9337467Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_hsdp_init_with_device_mesh_cuda I1204 09:24:10.014000 35217 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 35269 2025-12-04T09:25:19.9337968Z I1204 09:24:10.015000 35217 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 35270 2025-12-04T09:25:19.9338469Z I1204 09:24:10.015000 35217 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 35271 2025-12-04T09:25:19.9338962Z I1204 09:24:10.016000 35217 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 35272 2025-12-04T09:25:19.9341412Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9341557Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.9343950Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9344071Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.9346469Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. 
Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9346590Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.9349170Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9349334Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.9350888Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.9351015Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.9352535Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.9352661Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.9354187Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.9354302Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.9355827Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:25:19.9355965Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.9356355Z E1204 09:24:17.060000 35270 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9356827Z E1204 09:24:17.060000 35270 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9357690Z E1204 09:24:17.060000 35270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9358116Z E1204 09:24:17.060000 35270 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9358973Z E1204 09:24:17.060000 35270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9359311Z E1204 09:24:17.060000 35270 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9360137Z E1204 09:24:17.060000 35270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9360552Z E1204 09:24:17.060000 35270 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9361378Z E1204 09:24:17.060000 35270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9361799Z E1204 09:24:17.060000 35270 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9362670Z E1204 09:24:17.060000 35270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9363047Z E1204 09:24:17.060000 35270 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9363887Z E1204 09:24:17.060000 35270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9364295Z E1204 09:24:17.060000 35270 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9365785Z E1204 09:24:17.060000 35270 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 518979584 and is now 619642880. 
2025-12-04T09:25:19.9366092Z E1204 09:24:17.060000 35270 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9366662Z E1204 09:24:17.060000 35270 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9367723Z E1204 09:24:17.060000 35270 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T09:25:19.9368020Z E1204 09:24:17.060000 35270 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9368661Z E1204 09:24:17.060000 35270 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9369122Z E1204 09:24:17.060000 35270 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:25:19.9369540Z E1204 09:24:17.061000 35269 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9369990Z E1204 09:24:17.061000 35269 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9370852Z E1204 09:24:17.061000 35269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9371286Z E1204 09:24:17.061000 35269 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9372144Z E1204 09:24:17.061000 35269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9372479Z E1204 09:24:17.061000 35269 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9373305Z E1204 09:24:17.061000 35269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9373717Z E1204 09:24:17.061000 35269 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9374541Z E1204 09:24:17.061000 35269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9374949Z E1204 09:24:17.061000 35269 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9375835Z E1204 09:24:17.061000 35269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9376266Z E1204 09:24:17.061000 35269 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9377347Z E1204 09:24:17.061000 35269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in 
__exit__ 2025-12-04T09:25:19.9377810Z E1204 09:24:17.061000 35269 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9379485Z E1204 09:24:17.061000 35269 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 640614400 and is now 730791936. 2025-12-04T09:25:19.9379828Z E1204 09:24:17.061000 35269 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9380458Z E1204 09:24:17.061000 35269 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9381665Z E1204 09:24:17.061000 35269 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T09:25:19.9382003Z E1204 09:24:17.061000 35269 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9382751Z E1204 09:24:17.061000 35269 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9383271Z E1204 09:24:17.061000 35269 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:25:19.9383733Z E1204 09:24:17.065000 35272 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9384233Z E1204 09:24:17.065000 35272 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9385203Z E1204 09:24:17.065000 35272 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9385684Z E1204 09:24:17.065000 35272 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9386649Z E1204 09:24:17.065000 35272 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9387027Z E1204 09:24:17.065000 35272 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9387964Z E1204 09:24:17.065000 35272 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9388433Z E1204 09:24:17.065000 35272 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9389540Z E1204 09:24:17.065000 35272 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9390067Z E1204 09:24:17.065000 35272 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9390948Z E1204 09:24:17.065000 35272 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9391322Z E1204 09:24:17.065000 35272 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9392167Z E1204 09:24:17.065000 35272 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9392576Z E1204 09:24:17.065000 35272 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9394064Z E1204 09:24:17.065000 35272 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:25:19.9394365Z E1204 09:24:17.065000 35272 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9394928Z E1204 09:24:17.065000 35272 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9396005Z E1204 09:24:17.065000 35272 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T09:25:19.9396327Z E1204 09:24:17.065000 35272 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9396943Z E1204 09:24:17.065000 35272 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9397405Z E1204 09:24:17.065000 35272 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:25:19.9397807Z E1204 09:24:17.065000 35271 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9398264Z E1204 09:24:17.065000 35271 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9399127Z E1204 09:24:17.065000 35271 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9399560Z E1204 09:24:17.065000 35271 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9400419Z E1204 09:24:17.065000 35271 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9400756Z E1204 09:24:17.065000 35271 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9401581Z E1204 09:24:17.065000 35271 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9401986Z 
E1204 09:24:17.065000 35271 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9402816Z E1204 09:24:17.065000 35271 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9403223Z E1204 09:24:17.065000 35271 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9404107Z E1204 09:24:17.065000 35271 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9404483Z E1204 09:24:17.065000 35271 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9405323Z E1204 09:24:17.065000 35271 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9405731Z E1204 09:24:17.065000 35271 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9407207Z E1204 09:24:17.065000 35271 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:25:19.9407519Z E1204 09:24:17.065000 35271 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9408079Z E1204 09:24:17.065000 35271 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9409156Z E1204 09:24:17.065000 35271 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T09:25:19.9409481Z E1204 09:24:17.065000 35271 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9410106Z E1204 09:24:17.065000 35271 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9410587Z E1204 09:24:17.065000 35271 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:25:19.9410679Z FAILED [8.9345s] [100%] 2025-12-04T09:25:19.9410684Z 2025-12-04T09:25:19.9410825Z =================================== FAILURES =================================== 2025-12-04T09:25:19.9411129Z __ TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda ___ 2025-12-04T09:25:19.9411247Z Traceback (most recent call last): 2025-12-04T09:25:19.9411728Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:25:19.9411832Z self._join_processes(fn) 2025-12-04T09:25:19.9412362Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:25:19.9412491Z 
self._check_return_codes(fn, elapsed_time) 2025-12-04T09:25:19.9413033Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:25:19.9413143Z raise RuntimeError(error) 2025-12-04T09:25:19.9413349Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:25:19.9413466Z Traceback (most recent call last): 2025-12-04T09:25:19.9413944Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9414047Z getattr(self, test_name)() 2025-12-04T09:25:19.9414525Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9414607Z fn() 2025-12-04T09:25:19.9415058Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9415210Z method(*args, **kwargs) 2025-12-04T09:25:19.9415665Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9415765Z method(*args, **kwargs) 2025-12-04T09:25:19.9416276Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9416368Z with policy(): 2025-12-04T09:25:19.9417032Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9417144Z raise RuntimeError(msg) 2025-12-04T09:25:19.9418395Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 518979584 and is now 619642880. 2025-12-04T09:25:19.9418404Z 2025-12-04T09:25:19.9418623Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9419409Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T09:25:19.9419416Z 2025-12-04T09:25:19.9419689Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9419694Z 2025-12-04T09:25:19.9419699Z 2025-12-04T09:25:19.9419915Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:25:19.9420184Z Process 1 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:25:19.9421277Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-788cdb9001b436df.xml - 2025-12-04T09:25:19.9421525Z =========================== short test summary info ============================ 2025-12-04T09:25:19.9422468Z FAILED [8.9345s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_hsdp_init_with_device_mesh_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:25:19.9422631Z Traceback (most recent call last): 2025-12-04T09:25:19.9423193Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9423304Z getattr(self, test_name)() 2025-12-04T09:25:19.9423839Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9423940Z fn() 2025-12-04T09:25:19.9424452Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9424566Z method(*args, **kwargs) 2025-12-04T09:25:19.9425069Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9425176Z method(*args, **kwargs) 2025-12-04T09:25:19.9425683Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9425780Z with policy(): 2025-12-04T09:25:19.9426286Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9426407Z raise RuntimeError(msg) 2025-12-04T09:25:19.9427645Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 518979584 and is now 619642880. 2025-12-04T09:25:19.9427654Z 2025-12-04T09:25:19.9427878Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9428741Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T09:25:19.9428750Z 2025-12-04T09:25:19.9429022Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9429200Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:25:19.9429376Z ======================= 1 failed, 7 deselected in 8.96s ======================== 2025-12-04T09:25:19.9429483Z Got exit code 1 2025-12-04T09:25:19.9429592Z Retrying single test... 
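The failure above comes from the CUDA memory leak checker that PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 enables: the harness records caching-allocator and driver-level allocation on each device before the test, raises if either has grown afterwards, and then re-runs the single failing test in isolation ("Retrying single test..."). A minimal sketch of that before/after comparison follows; it is only an approximation, not the actual implementation in torch/testing/_internal/common_utils.py, and the helper name assert_no_cuda_leak is hypothetical.

# Illustrative sketch only: approximates the before/after check that
# PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 performs; not the code in
# torch/testing/_internal/common_utils.py. The helper name is hypothetical.
import contextlib
import torch

@contextlib.contextmanager
def assert_no_cuda_leak(device: int = 0):
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    alloc_before = torch.cuda.memory_allocated(device)    # caching allocator bytes
    free_before, total = torch.cuda.mem_get_info(device)  # driver-level view
    driver_before = total - free_before
    try:
        yield
    finally:
        torch.cuda.synchronize(device)
        alloc_after = torch.cuda.memory_allocated(device)
        free_after, _ = torch.cuda.mem_get_info(device)
        driver_after = total - free_after
        if alloc_after > alloc_before:
            raise RuntimeError(
                f"possible CUDA leak on device {device}: caching allocator went "
                f"from {alloc_before} to {alloc_after} bytes "
                f"(driver allocated {driver_before} -> {driver_after})"
            )

# Usage sketch: wrap a test body, analogous to the policy() context manager
# referenced in the tracebacks above.
# with assert_no_cuda_leak(torch.cuda.current_device()):
#     run_test_body()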
2025-12-04T09:25:19.9430352Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-9601a812ff315158.xml 2025-12-04T09:25:19.9430522Z ============================= test session starts ============================== 2025-12-04T09:25:19.9430869Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:25:19.9430990Z cachedir: .pytest_cache 2025-12-04T09:25:19.9431509Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:25:19.9431629Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:25:19.9431740Z configfile: pytest.ini 2025-12-04T09:25:19.9432270Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:25:19.9432587Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:25:19.9433482Z stepcurrent: skipping 6 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_hsdp_init_with_device_mesh_cuda 2025-12-04T09:25:19.9433608Z Running 1 items in this shard 2025-12-04T09:25:19.9433613Z 2025-12-04T09:25:19.9434637Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_hsdp_init_with_device_mesh_cuda I1204 09:24:23.644000 35610 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 35662 2025-12-04T09:25:19.9435107Z I1204 09:24:23.645000 35610 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 35663 2025-12-04T09:25:19.9435550Z I1204 09:24:23.646000 35610 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 35664 2025-12-04T09:25:19.9435980Z I1204 09:24:23.646000 35610 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 35665 2025-12-04T09:25:19.9438105Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9438219Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.9440327Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9440436Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.9442770Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. 
Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9442881Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.9445116Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_hsdp_dtensor_state_dict.py:83: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9445225Z FSDP.set_state_dict_type( 2025-12-04T09:25:19.9446852Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.9446976Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.9448589Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.9448743Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.9450357Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 2025-12-04T09:25:19.9450497Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.9452329Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_optim_utils.py:1190: UserWarning: `_get_pg_default_device` will be deprecated, it only stays for backward-compatiblity reason. If you need to find a device for object collectives, please use `_get_object_coll_device`. If you need to query the device types supported by group, please use `_device_capability(group)`. 
2025-12-04T09:25:19.9452445Z device = _get_pg_default_device(group) 2025-12-04T09:25:19.9452874Z E1204 09:24:30.689000 35664 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9453368Z E1204 09:24:30.689000 35664 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9454323Z E1204 09:24:30.689000 35664 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9454785Z E1204 09:24:30.689000 35664 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9455719Z E1204 09:24:30.689000 35664 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9456080Z E1204 09:24:30.689000 35664 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9457324Z E1204 09:24:30.689000 35664 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9457796Z E1204 09:24:30.689000 35664 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9458725Z E1204 09:24:30.689000 35664 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9459190Z E1204 09:24:30.689000 35664 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9460116Z E1204 09:24:30.689000 35664 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9460533Z E1204 09:24:30.689000 35664 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9461468Z E1204 09:24:30.689000 35664 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9461926Z E1204 09:24:30.689000 35664 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9463589Z E1204 09:24:30.689000 35664 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 2. CUDA driver allocated memory was 518979584 and is now 619642880. 
2025-12-04T09:25:19.9463954Z E1204 09:24:30.689000 35664 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9464589Z E1204 09:24:30.689000 35664 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9465819Z E1204 09:24:30.689000 35664 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T09:25:19.9466148Z E1204 09:24:30.689000 35664 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9466843Z E1204 09:24:30.689000 35664 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9467357Z E1204 09:24:30.689000 35664 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:25:19.9467904Z E1204 09:24:30.691000 35663 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9468386Z E1204 09:24:30.691000 35663 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9469330Z E1204 09:24:30.691000 35663 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9469799Z E1204 09:24:30.691000 35663 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9470730Z E1204 09:24:30.691000 35663 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9471090Z E1204 09:24:30.691000 35663 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9472128Z E1204 09:24:30.691000 35663 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9472563Z E1204 09:24:30.691000 35663 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9473431Z E1204 09:24:30.691000 35663 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9473856Z E1204 09:24:30.691000 35663 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9474743Z E1204 09:24:30.691000 35663 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9475135Z E1204 09:24:30.691000 35663 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9476016Z E1204 09:24:30.691000 35663 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in 
__exit__ 2025-12-04T09:25:19.9476446Z E1204 09:24:30.691000 35663 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9478025Z E1204 09:24:30.691000 35663 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 1. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:25:19.9478366Z E1204 09:24:30.691000 35663 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9478956Z E1204 09:24:30.691000 35663 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9480118Z E1204 09:24:30.691000 35663 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T09:25:19.9480523Z E1204 09:24:30.691000 35663 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9481133Z E1204 09:24:30.691000 35663 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9481587Z E1204 09:24:30.691000 35663 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:25:19.9481968Z E1204 09:24:30.691000 35662 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9482412Z E1204 09:24:30.691000 35662 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9483268Z E1204 09:24:30.691000 35662 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9483694Z E1204 09:24:30.691000 35662 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9484545Z E1204 09:24:30.691000 35662 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9484875Z E1204 09:24:30.691000 35662 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9485766Z E1204 09:24:30.691000 35662 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9486176Z E1204 09:24:30.691000 35662 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9487001Z E1204 09:24:30.691000 35662 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9487402Z E1204 09:24:30.691000 35662 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9488230Z E1204 09:24:30.691000 35662 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9488600Z E1204 09:24:30.691000 35662 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9489430Z E1204 09:24:30.691000 35662 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9489834Z E1204 09:24:30.691000 35662 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9491304Z E1204 09:24:30.691000 35662 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 0. CUDA driver allocated memory was 640614400 and is now 722403328. 2025-12-04T09:25:19.9491630Z E1204 09:24:30.691000 35662 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9492217Z E1204 09:24:30.691000 35662 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9493290Z E1204 09:24:30.691000 35662 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T09:25:19.9493582Z E1204 09:24:30.691000 35662 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9494193Z E1204 09:24:30.691000 35662 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9494649Z E1204 09:24:30.691000 35662 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:25:19.9495022Z E1204 09:24:30.692000 35665 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9495472Z E1204 09:24:30.692000 35665 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9496387Z E1204 09:24:30.692000 35665 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9497019Z E1204 09:24:30.692000 35665 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9497977Z E1204 09:24:30.692000 35665 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9498411Z E1204 09:24:30.692000 35665 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9499345Z E1204 09:24:30.692000 35665 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9499798Z 
E1204 09:24:30.692000 35665 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9500734Z E1204 09:24:30.692000 35665 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9501186Z E1204 09:24:30.692000 35665 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9502123Z E1204 09:24:30.692000 35665 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9502540Z E1204 09:24:30.692000 35665 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9503479Z E1204 09:24:30.692000 35665 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9503938Z E1204 09:24:30.692000 35665 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9505587Z E1204 09:24:30.692000 35665 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:25:19.9505962Z E1204 09:24:30.692000 35665 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9506617Z E1204 09:24:30.692000 35665 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9507821Z E1204 09:24:30.692000 35665 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T09:25:19.9508154Z E1204 09:24:30.692000 35665 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9508955Z E1204 09:24:30.692000 35665 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9509525Z E1204 09:24:30.692000 35665 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:25:19.9509613Z FAILED [8.9096s] [100%] 2025-12-04T09:25:19.9509619Z 2025-12-04T09:25:19.9509753Z =================================== FAILURES =================================== 2025-12-04T09:25:19.9510049Z __ TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda ___ 2025-12-04T09:25:19.9510161Z Traceback (most recent call last): 2025-12-04T09:25:19.9510644Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:25:19.9510740Z self._join_processes(fn) 2025-12-04T09:25:19.9511265Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:25:19.9511390Z 
self._check_return_codes(fn, elapsed_time) 2025-12-04T09:25:19.9511979Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:25:19.9512088Z raise RuntimeError(error) 2025-12-04T09:25:19.9512292Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:25:19.9512403Z Traceback (most recent call last): 2025-12-04T09:25:19.9512877Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9512975Z getattr(self, test_name)() 2025-12-04T09:25:19.9513451Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9513527Z fn() 2025-12-04T09:25:19.9513978Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9514073Z method(*args, **kwargs) 2025-12-04T09:25:19.9514522Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9514621Z method(*args, **kwargs) 2025-12-04T09:25:19.9515063Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9515151Z with policy(): 2025-12-04T09:25:19.9515603Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9515696Z raise RuntimeError(msg) 2025-12-04T09:25:19.9516795Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:25:19.9516827Z 2025-12-04T09:25:19.9517015Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9517703Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T09:25:19.9517733Z 2025-12-04T09:25:19.9517969Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9517973Z 2025-12-04T09:25:19.9517977Z 2025-12-04T09:25:19.9518168Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:25:19.9518402Z Process 3 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:25:19.9519229Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-9601a812ff315158.xml - 2025-12-04T09:25:19.9519378Z =========================== short test summary info ============================ 2025-12-04T09:25:19.9520217Z FAILED [8.9096s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_hsdp_init_with_device_mesh_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:25:19.9520324Z Traceback (most recent call last): 2025-12-04T09:25:19.9520943Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9521043Z getattr(self, test_name)() 2025-12-04T09:25:19.9521720Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9521815Z fn() 2025-12-04T09:25:19.9522317Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9522428Z method(*args, **kwargs) 2025-12-04T09:25:19.9522931Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9523031Z method(*args, **kwargs) 2025-12-04T09:25:19.9523629Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9523727Z with policy(): 2025-12-04T09:25:19.9524234Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9524343Z raise RuntimeError(msg) 2025-12-04T09:25:19.9525577Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda! Caching allocator allocated memory was 0 and is now reported as 13824 on device 3. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:25:19.9525584Z 2025-12-04T09:25:19.9525802Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9526573Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_hsdp_init_with_device_mesh_cuda 2025-12-04T09:25:19.9526578Z 2025-12-04T09:25:19.9526849Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9527026Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T09:25:19.9527203Z ======================= 1 failed, 7 deselected in 8.93s ======================== 2025-12-04T09:25:19.9527302Z Got exit code 1 2025-12-04T09:25:19.9527999Z FAILED CONSISTENTLY: test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_hsdp_init_with_device_mesh_cuda 2025-12-04T09:25:19.9528404Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:25:19.9529506Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-c4b6ce2b260b8d4b.xml 2025-12-04T09:25:19.9529663Z ============================= test session starts ============================== 2025-12-04T09:25:19.9530022Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:25:19.9530179Z cachedir: .pytest_cache 2025-12-04T09:25:19.9530690Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:25:19.9530817Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:25:19.9530917Z configfile: pytest.ini 2025-12-04T09:25:19.9531456Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:25:19.9531659Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:25:19.9531799Z stepcurrent: skipping 7 already run items. 2025-12-04T09:25:19.9531917Z Running 1 items in this shard 2025-12-04T09:25:19.9531922Z 2025-12-04T09:25:19.9533057Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_root_module_is_not_FSDP_cuda I1204 09:24:37.314000 36003 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 36055 2025-12-04T09:25:19.9533673Z I1204 09:24:37.314000 36003 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 36056 2025-12-04T09:25:19.9534105Z I1204 09:24:37.315000 36003 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 36057 2025-12-04T09:25:19.9534536Z I1204 09:24:37.316000 36003 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 36058 2025-12-04T09:25:19.9538392Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 
2025-12-04T09:25:19.9538801Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.9542600Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 2025-12-04T09:25:19.9543003Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.9546803Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 2025-12-04T09:25:19.9547319Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.9551315Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 2025-12-04T09:25:19.9551690Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.9554055Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. 
API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9554375Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9556728Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9556992Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9559340Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9559598Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9562244Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9562527Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9564875Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:829: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9565061Z FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9567416Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:829: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:25:19.9567580Z FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9569915Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:829: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9570077Z FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9572566Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:829: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9572717Z FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9573095Z E1204 09:24:44.638000 36057 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9573543Z E1204 09:24:44.638000 36057 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9574412Z E1204 09:24:44.638000 36057 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9574840Z E1204 09:24:44.638000 36057 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9575708Z E1204 09:24:44.638000 36057 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9576033Z E1204 09:24:44.638000 36057 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9577133Z E1204 09:24:44.638000 36057 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9577679Z E1204 09:24:44.638000 36057 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9578612Z E1204 09:24:44.638000 36057 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9579106Z E1204 09:24:44.638000 36057 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9580040Z E1204 09:24:44.638000 36057 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9580458Z E1204 09:24:44.638000 36057 
site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9581392Z E1204 09:24:44.638000 36057 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9581861Z E1204 09:24:44.638000 36057 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9583492Z E1204 09:24:44.638000 36057 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 2. CUDA driver allocated memory was 531562496 and is now 613351424. 2025-12-04T09:25:19.9583826Z E1204 09:24:44.638000 36057 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9584458Z E1204 09:24:44.638000 36057 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9585698Z E1204 09:24:44.638000 36057 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T09:25:19.9586038Z E1204 09:24:44.638000 36057 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9586722Z E1204 09:24:44.638000 36057 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9587240Z E1204 09:24:44.638000 36057 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:25:19.9587657Z E1204 09:24:44.638000 36055 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9588152Z E1204 09:24:44.638000 36055 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9589203Z E1204 09:24:44.638000 36055 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9589626Z E1204 09:24:44.638000 36055 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9590482Z E1204 09:24:44.638000 36055 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9590805Z E1204 09:24:44.638000 36055 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9591632Z E1204 09:24:44.638000 36055 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9592076Z E1204 09:24:44.638000 36055 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9592904Z E1204 09:24:44.638000 36055 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9593336Z E1204 09:24:44.638000 36055 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9594154Z E1204 09:24:44.638000 36055 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9594524Z E1204 09:24:44.638000 36055 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9595355Z E1204 09:24:44.638000 36055 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9595764Z E1204 09:24:44.638000 36055 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9597215Z E1204 09:24:44.638000 36055 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 0. CUDA driver allocated memory was 636420096 and is now 720306176. 2025-12-04T09:25:19.9597508Z E1204 09:24:44.638000 36055 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9598069Z E1204 09:24:44.638000 36055 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9599171Z E1204 09:24:44.638000 36055 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T09:25:19.9599476Z E1204 09:24:44.638000 36055 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9600080Z E1204 09:24:44.638000 36055 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9600537Z E1204 09:24:44.638000 36055 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:25:19.9600918Z E1204 09:24:44.639000 36058 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9601363Z E1204 09:24:44.639000 36058 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9602239Z E1204 09:24:44.639000 36058 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9602662Z E1204 09:24:44.639000 36058 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9603517Z E1204 09:24:44.639000 36058 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9603841Z E1204 09:24:44.639000 36058 
site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9604657Z E1204 09:24:44.639000 36058 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9605097Z E1204 09:24:44.639000 36058 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9605944Z E1204 09:24:44.639000 36058 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9606352Z E1204 09:24:44.639000 36058 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9607174Z E1204 09:24:44.639000 36058 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9607546Z E1204 09:24:44.639000 36058 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9608383Z E1204 09:24:44.639000 36058 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9608792Z E1204 09:24:44.639000 36058 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9610245Z E1204 09:24:44.639000 36058 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 3. CUDA driver allocated memory was 502202368 and is now 611254272. 
2025-12-04T09:25:19.9610539Z E1204 09:24:44.639000 36058 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9611103Z E1204 09:24:44.639000 36058 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9612204Z E1204 09:24:44.639000 36058 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T09:25:19.9612505Z E1204 09:24:44.639000 36058 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9613112Z E1204 09:24:44.639000 36058 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9613566Z E1204 09:24:44.639000 36058 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:25:19.9613944Z E1204 09:24:44.640000 36056 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9614391Z E1204 09:24:44.640000 36056 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9615255Z E1204 09:24:44.640000 36056 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9615675Z E1204 09:24:44.640000 36056 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9616767Z E1204 09:24:44.640000 36056 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9617147Z E1204 09:24:44.640000 36056 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9618180Z E1204 09:24:44.640000 36056 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9618647Z E1204 09:24:44.640000 36056 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9619606Z E1204 09:24:44.640000 36056 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9620065Z E1204 09:24:44.640000 36056 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9621160Z E1204 09:24:44.640000 36056 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9621582Z E1204 09:24:44.640000 36056 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9622523Z E1204 09:24:44.640000 36056 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in 
__exit__ 2025-12-04T09:25:19.9622981Z E1204 09:24:44.640000 36056 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9624619Z E1204 09:24:44.640000 36056 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 1. CUDA driver allocated memory was 527368192 and is now 611254272. 2025-12-04T09:25:19.9624951Z E1204 09:24:44.640000 36056 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9625593Z E1204 09:24:44.640000 36056 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9626868Z E1204 09:24:44.640000 36056 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T09:25:19.9627202Z E1204 09:24:44.640000 36056 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9627891Z E1204 09:24:44.640000 36056 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9628408Z E1204 09:24:44.640000 36056 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:25:19.9628517Z FAILED [9.2355s] [100%] 2025-12-04T09:25:19.9628524Z 2025-12-04T09:25:19.9628668Z =================================== FAILURES =================================== 2025-12-04T09:25:19.9629000Z ____ TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda ____ 2025-12-04T09:25:19.9629134Z Traceback (most recent call last): 2025-12-04T09:25:19.9629678Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:25:19.9629797Z self._join_processes(fn) 2025-12-04T09:25:19.9630378Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:25:19.9630515Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:25:19.9631124Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:25:19.9631234Z raise RuntimeError(error) 2025-12-04T09:25:19.9631516Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:25:19.9631631Z Traceback (most recent call last): 2025-12-04T09:25:19.9632174Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9632328Z getattr(self, test_name)() 2025-12-04T09:25:19.9633062Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9633141Z fn() 2025-12-04T09:25:19.9633596Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9633689Z method(*args, **kwargs) 2025-12-04T09:25:19.9634146Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9634234Z method(*args, **kwargs) 2025-12-04T09:25:19.9634682Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9634775Z with policy(): 2025-12-04T09:25:19.9635226Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9635323Z raise RuntimeError(msg) 2025-12-04T09:25:19.9636398Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 3. CUDA driver allocated memory was 502202368 and is now 611254272. 2025-12-04T09:25:19.9636404Z 2025-12-04T09:25:19.9636593Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9637274Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T09:25:19.9637281Z 2025-12-04T09:25:19.9637513Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9637518Z 2025-12-04T09:25:19.9637522Z 2025-12-04T09:25:19.9637724Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:25:19.9638019Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:25:19.9638847Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-c4b6ce2b260b8d4b.xml - 2025-12-04T09:25:19.9639001Z =========================== short test summary info ============================ 2025-12-04T09:25:19.9639816Z FAILED [9.2355s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_root_module_is_not_FSDP_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:25:19.9639931Z Traceback (most recent call last): 2025-12-04T09:25:19.9640413Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9640510Z getattr(self, test_name)() 2025-12-04T09:25:19.9640994Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9641076Z fn() 2025-12-04T09:25:19.9641528Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9641618Z method(*args, **kwargs) 2025-12-04T09:25:19.9642061Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9642155Z method(*args, **kwargs) 2025-12-04T09:25:19.9642599Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9642710Z with policy(): 2025-12-04T09:25:19.9643161Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9643255Z raise RuntimeError(msg) 2025-12-04T09:25:19.9644338Z RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 3. CUDA driver allocated memory was 502202368 and is now 611254272. 2025-12-04T09:25:19.9644370Z 2025-12-04T09:25:19.9644561Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9645235Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T09:25:19.9645246Z 2025-12-04T09:25:19.9645478Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9645633Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:25:19.9645792Z ======================= 1 failed, 7 deselected in 9.26s ======================== 2025-12-04T09:25:19.9645875Z Got exit code 1 2025-12-04T09:25:19.9645965Z Retrying single test... 2025-12-04T09:25:19.9646642Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-490a12d48ec816b9.xml 2025-12-04T09:25:19.9646783Z ============================= test session starts ============================== 2025-12-04T09:25:19.9647095Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:25:19.9647185Z cachedir: .pytest_cache 2025-12-04T09:25:19.9647644Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:25:19.9647752Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:25:19.9647846Z configfile: pytest.ini 2025-12-04T09:25:19.9648316Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:25:19.9648504Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:25:19.9649300Z stepcurrent: skipping 7 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_root_module_is_not_FSDP_cuda 2025-12-04T09:25:19.9649404Z Running 1 items in this shard 2025-12-04T09:25:19.9649408Z 2025-12-04T09:25:19.9650402Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_root_module_is_not_FSDP_cuda I1204 09:24:51.274000 36396 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 36448 2025-12-04T09:25:19.9650842Z I1204 09:24:51.275000 36396 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 36449 2025-12-04T09:25:19.9651287Z I1204 09:24:51.276000 36396 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 36450 2025-12-04T09:25:19.9651721Z I1204 09:24:51.277000 36396 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 36451 2025-12-04T09:25:19.9655112Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. 
If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 2025-12-04T09:25:19.9655487Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.9659405Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 2025-12-04T09:25:19.9659842Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.9663647Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 2025-12-04T09:25:19.9664046Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.9667879Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 
2025-12-04T09:25:19.9668271Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.9670778Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9671037Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9673472Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9673769Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9675992Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9676242Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9678459Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9678701Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9680922Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:829: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:25:19.9681126Z FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9683362Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:829: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9683513Z FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9685721Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:829: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9685872Z FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9688095Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:829: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:25:19.9688265Z FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9688652Z E1204 09:24:58.584000 36448 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9689125Z E1204 09:24:58.584000 36448 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9689992Z E1204 09:24:58.584000 36448 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9690414Z E1204 09:24:58.584000 36448 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9691275Z E1204 09:24:58.584000 36448 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9691603Z E1204 09:24:58.584000 36448 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9692441Z E1204 09:24:58.584000 36448 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9692848Z E1204 09:24:58.584000 36448 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9693670Z E1204 09:24:58.584000 36448 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9694078Z E1204 09:24:58.584000 36448 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9694963Z E1204 09:24:58.584000 36448 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9695340Z E1204 09:24:58.584000 36448 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9696220Z E1204 09:24:58.584000 36448 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9696809Z E1204 09:24:58.584000 36448 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9698445Z E1204 09:24:58.584000 36448 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 0. CUDA driver allocated memory was 636420096 and is now 720306176. 
2025-12-04T09:25:19.9698785Z E1204 09:24:58.584000 36448 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9699422Z E1204 09:24:58.584000 36448 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9700606Z E1204 09:24:58.584000 36448 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T09:25:19.9700942Z E1204 09:24:58.584000 36448 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9701633Z E1204 09:24:58.584000 36448 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9702190Z E1204 09:24:58.584000 36448 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:25:19.9702618Z E1204 09:24:58.585000 36451 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9703144Z E1204 09:24:58.585000 36451 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9704118Z E1204 09:24:58.585000 36451 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9704594Z E1204 09:24:58.585000 36451 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9705565Z E1204 09:24:58.585000 36451 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9705931Z E1204 09:24:58.585000 36451 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9706860Z E1204 09:24:58.585000 36451 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9707319Z E1204 09:24:58.585000 36451 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9708245Z E1204 09:24:58.585000 36451 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9708715Z E1204 09:24:58.585000 36451 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9709714Z E1204 09:24:58.585000 36451 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9710096Z E1204 09:24:58.585000 36451 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9710923Z E1204 09:24:58.585000 36451 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in 
__exit__ 2025-12-04T09:25:19.9711329Z E1204 09:24:58.585000 36451 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9712780Z E1204 09:24:58.585000 36451 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 3. CUDA driver allocated memory was 531562496 and is now 611254272. 2025-12-04T09:25:19.9713079Z E1204 09:24:58.585000 36451 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9713641Z E1204 09:24:58.585000 36451 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9714893Z E1204 09:24:58.585000 36451 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T09:25:19.9715210Z E1204 09:24:58.585000 36451 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9715882Z E1204 09:24:58.585000 36451 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9716367Z E1204 09:24:58.585000 36451 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:25:19.9716796Z E1204 09:24:58.585000 36449 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9717260Z E1204 09:24:58.585000 36449 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9718177Z E1204 09:24:58.585000 36449 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9718622Z E1204 09:24:58.585000 36449 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9719536Z E1204 09:24:58.585000 36449 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9719883Z E1204 09:24:58.585000 36449 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9720941Z E1204 09:24:58.585000 36449 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9721561Z E1204 09:24:58.585000 36449 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9722486Z E1204 09:24:58.585000 36449 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9722946Z E1204 09:24:58.585000 36449 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9723964Z E1204 09:24:58.585000 36449 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9724383Z E1204 09:24:58.585000 36449 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9725326Z E1204 09:24:58.585000 36449 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9725783Z E1204 09:24:58.585000 36449 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9727426Z E1204 09:24:58.585000 36449 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 1. CUDA driver allocated memory was 527368192 and is now 611254272. 2025-12-04T09:25:19.9727760Z E1204 09:24:58.585000 36449 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9728394Z E1204 09:24:58.585000 36449 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9729584Z E1204 09:24:58.585000 36449 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T09:25:19.9729922Z E1204 09:24:58.585000 36449 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9730649Z E1204 09:24:58.585000 36449 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9731168Z E1204 09:24:58.585000 36449 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:25:19.9731629Z E1204 09:24:58.586000 36450 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9732126Z E1204 09:24:58.586000 36450 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9733106Z E1204 09:24:58.586000 36450 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9733692Z E1204 09:24:58.586000 36450 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9734631Z E1204 09:24:58.586000 36450 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9734994Z E1204 09:24:58.586000 36450 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9735891Z E1204 09:24:58.586000 36450 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9736413Z E1204 
09:24:58.586000 36450 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9737505Z E1204 09:24:58.586000 36450 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9737970Z E1204 09:24:58.586000 36450 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9738961Z E1204 09:24:58.586000 36450 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9739378Z E1204 09:24:58.586000 36450 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9740316Z E1204 09:24:58.586000 36450 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9740774Z E1204 09:24:58.586000 36450 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9742425Z E1204 09:24:58.586000 36450 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 2. CUDA driver allocated memory was 527368192 and is now 611254272. 2025-12-04T09:25:19.9742760Z E1204 09:24:58.586000 36450 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9743392Z E1204 09:24:58.586000 36450 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9744573Z E1204 09:24:58.586000 36450 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T09:25:19.9744931Z E1204 09:24:58.586000 36450 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9745626Z E1204 09:24:58.586000 36450 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9746149Z E1204 09:24:58.586000 36450 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:25:19.9746301Z FAILED [9.2340s] [100%] 2025-12-04T09:25:19.9746307Z 2025-12-04T09:25:19.9746455Z =================================== FAILURES =================================== 2025-12-04T09:25:19.9746784Z ____ TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda ____ 2025-12-04T09:25:19.9746911Z Traceback (most recent call last): 2025-12-04T09:25:19.9747458Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:25:19.9747576Z self._join_processes(fn) 2025-12-04T09:25:19.9748158Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:25:19.9748299Z self._check_return_codes(fn, 
elapsed_time) 2025-12-04T09:25:19.9748992Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:25:19.9749099Z raise RuntimeError(error) 2025-12-04T09:25:19.9749315Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:25:19.9749434Z Traceback (most recent call last): 2025-12-04T09:25:19.9749939Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9750045Z getattr(self, test_name)() 2025-12-04T09:25:19.9750544Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9750626Z fn() 2025-12-04T09:25:19.9751104Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9751204Z method(*args, **kwargs) 2025-12-04T09:25:19.9751728Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9751839Z method(*args, **kwargs) 2025-12-04T09:25:19.9752311Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9752407Z with policy(): 2025-12-04T09:25:19.9752885Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9752984Z raise RuntimeError(msg) 2025-12-04T09:25:19.9754127Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 2. CUDA driver allocated memory was 527368192 and is now 611254272. 2025-12-04T09:25:19.9754135Z 2025-12-04T09:25:19.9754334Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9755060Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T09:25:19.9755067Z 2025-12-04T09:25:19.9755416Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9755421Z 2025-12-04T09:25:19.9755425Z 2025-12-04T09:25:19.9755623Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:25:19.9755852Z Process 2 terminated with exit code 10, terminating remaining processes. 
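The FutureWarning that every rank prints above points away from FSDP.set_state_dict_type() and toward the torch.distributed.checkpoint state-dict helpers named in the message. A minimal sketch of that replacement path, assuming an already-initialized process group, a single FSDP-wrapped model, and one optimizer (the toy model and optimizer below are illustrative, not taken from this test):

import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.checkpoint.state_dict import get_state_dict, set_state_dict

# Hypothetical toy setup; the real test wraps its own module hierarchy.
model = FSDP(nn.Linear(8, 8).cuda())
optim = torch.optim.Adam(model.parameters(), lr=1e-3)

# Replaces FSDP.set_state_dict_type() + state_dict(): one call returns both the
# model and optimizer state dicts in a parallelism-agnostic format.
model_sd, optim_sd = get_state_dict(model, optim)

# Restoring goes through set_state_dict() rather than load_state_dict() under a
# state_dict_type context.
set_state_dict(model, optim, model_state_dict=model_sd, optim_state_dict=optim_sd)

Both calls assume torch.distributed is initialized on every rank, which is what the spawned per-rank test processes above set up before the test body runs.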
2025-12-04T09:25:19.9756676Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-490a12d48ec816b9.xml -
2025-12-04T09:25:19.9756861Z =========================== short test summary info ============================
2025-12-04T09:25:19.9757687Z FAILED [9.2340s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_root_module_is_not_FSDP_cuda - RuntimeError: Process 2 exited with error code 10 and exception:
2025-12-04T09:25:19.9757823Z Traceback (most recent call last):
2025-12-04T09:25:19.9758308Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:25:19.9758406Z getattr(self, test_name)()
2025-12-04T09:25:19.9758886Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:25:19.9758964Z fn()
2025-12-04T09:25:19.9759407Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:25:19.9759507Z method(*args, **kwargs)
2025-12-04T09:25:19.9759952Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:25:19.9760057Z method(*args, **kwargs)
2025-12-04T09:25:19.9760500Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:25:19.9760582Z with policy():
2025-12-04T09:25:19.9761037Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:25:19.9761131Z raise RuntimeError(msg)
2025-12-04T09:25:19.9762205Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 2. CUDA driver allocated memory was 527368192 and is now 611254272.
2025-12-04T09:25:19.9762213Z
2025-12-04T09:25:19.9762399Z To execute this test, run the following from the base repo dir:
2025-12-04T09:25:19.9763182Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda
2025-12-04T09:25:19.9763189Z
2025-12-04T09:25:19.9763429Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:25:19.9763584Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T09:25:19.9763745Z ======================= 1 failed, 7 deselected in 9.26s ========================
2025-12-04T09:25:19.9763833Z Got exit code 1
2025-12-04T09:25:19.9763924Z Retrying single test...
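The RuntimeError above comes from the memory-leak check enabled for this shard: per device it snapshots the caching-allocator bytes and the driver-level allocation (total minus free memory) before and after the test, and fails when the caching-allocator count grows and the driver-level numbers confirm it. A rough, self-contained illustration of that bookkeeping with public torch.cuda calls, not the harness's actual implementation in common_utils.py (the 512-byte tensor is only an example of the kind of stray allocation being reported):

import torch

def snapshot(device: int):
    # Bytes currently held by the CUDA caching allocator on this device.
    alloc = torch.cuda.memory_allocated(device)
    # Driver-level view of the same device: total minus free memory.
    free, total = torch.cuda.mem_get_info(device)
    return alloc, total - free

assert torch.cuda.is_available()
device = 0
alloc_before, driver_before = snapshot(device)

# Anything still referenced when the test body returns shows up as a delta;
# 128 float32 elements is 512 bytes, matching the size reported in the log.
stray = torch.empty(128, dtype=torch.float32, device=device)

alloc_after, driver_after = snapshot(device)
print(f"caching allocator: {alloc_before} -> {alloc_after}")
print(f"driver allocated:  {driver_before} -> {driver_after}")

Running the repro command printed in the log (PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda) turns the same check on for a single local run of the failing test.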
2025-12-04T09:25:19.9764599Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-e2f9fc6fa3a79028.xml 2025-12-04T09:25:19.9764742Z ============================= test session starts ============================== 2025-12-04T09:25:19.9765048Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:25:19.9765150Z cachedir: .pytest_cache 2025-12-04T09:25:19.9765605Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:25:19.9765718Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:25:19.9765813Z configfile: pytest.ini 2025-12-04T09:25:19.9766284Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:25:19.9766472Z collecting ... collected 8 items / 7 deselected / 1 selected 2025-12-04T09:25:19.9767220Z stepcurrent: skipping 7 already run items. Running only test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_root_module_is_not_FSDP_cuda 2025-12-04T09:25:19.9767350Z Running 1 items in this shard 2025-12-04T09:25:19.9767355Z 2025-12-04T09:25:19.9768356Z distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_root_module_is_not_FSDP_cuda I1204 09:25:05.174000 36789 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 36841 2025-12-04T09:25:19.9768823Z I1204 09:25:05.174000 36789 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 36842 2025-12-04T09:25:19.9769265Z I1204 09:25:05.175000 36789 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 36843 2025-12-04T09:25:19.9769700Z I1204 09:25:05.176000 36789 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 36844 2025-12-04T09:25:19.9773098Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 2025-12-04T09:25:19.9773450Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.9777156Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. 
DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 2025-12-04T09:25:19.9777557Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.9781359Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 2025-12-04T09:25:19.9781752Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.9785539Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: c10d::allreduce_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:76.) 2025-12-04T09:25:19.9785986Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:25:19.9788589Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9788858Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9791185Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. 
API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9791430Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9793689Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9793933Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9796160Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:822: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9796404Z prev_state_dict_settings = FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9798620Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:829: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9798774Z FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9800985Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:829: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9801197Z FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9803417Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:829: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 
2025-12-04T09:25:19.9803566Z FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9805796Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:829: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html . 2025-12-04T09:25:19.9805942Z FullyShardedDataParallel.set_state_dict_type( 2025-12-04T09:25:19.9806323Z E1204 09:25:12.425000 36843 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9806768Z E1204 09:25:12.425000 36843 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9807649Z E1204 09:25:12.425000 36843 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9808119Z E1204 09:25:12.425000 36843 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9808984Z E1204 09:25:12.425000 36843 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9809308Z E1204 09:25:12.425000 36843 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9810137Z E1204 09:25:12.425000 36843 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9810553Z E1204 09:25:12.425000 36843 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9811383Z E1204 09:25:12.425000 36843 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9811795Z E1204 09:25:12.425000 36843 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9812618Z E1204 09:25:12.425000 36843 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9812995Z E1204 09:25:12.425000 36843 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9813817Z E1204 09:25:12.425000 36843 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9814267Z E1204 09:25:12.425000 36843 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9815745Z E1204 09:25:12.425000 36843 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 2. CUDA driver allocated memory was 527368192 and is now 617545728. 2025-12-04T09:25:19.9816040Z E1204 09:25:12.425000 36843 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9816840Z E1204 09:25:12.425000 36843 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9818030Z E1204 09:25:12.425000 36843 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T09:25:19.9818375Z E1204 09:25:12.425000 36843 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9819060Z E1204 09:25:12.425000 36843 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9819574Z E1204 09:25:12.425000 36843 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:25:19.9820002Z E1204 09:25:12.427000 36842 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9820501Z E1204 09:25:12.427000 36842 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9821664Z E1204 09:25:12.427000 36842 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9822247Z E1204 09:25:12.427000 36842 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9823227Z E1204 09:25:12.427000 36842 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9823596Z E1204 09:25:12.427000 36842 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9824528Z E1204 09:25:12.427000 36842 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9824995Z E1204 09:25:12.427000 36842 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9825930Z E1204 09:25:12.427000 36842 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9826388Z E1204 09:25:12.427000 36842 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9827318Z E1204 09:25:12.427000 36842 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9827733Z E1204 09:25:12.427000 36842 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T09:25:19.9828710Z E1204 09:25:12.427000 36842 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9829172Z E1204 09:25:12.427000 36842 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9830844Z E1204 09:25:12.427000 36842 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 1. CUDA driver allocated memory was 531562496 and is now 611254272. 2025-12-04T09:25:19.9831175Z E1204 09:25:12.427000 36842 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9831808Z E1204 09:25:12.427000 36842 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9833096Z E1204 09:25:12.427000 36842 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T09:25:19.9833415Z E1204 09:25:12.427000 36842 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9834058Z E1204 09:25:12.427000 36842 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9834541Z E1204 09:25:12.427000 36842 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:25:19.9834942Z E1204 09:25:12.427000 36841 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9835410Z E1204 09:25:12.427000 36841 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9836386Z E1204 09:25:12.427000 36841 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9836838Z E1204 09:25:12.427000 36841 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9837738Z E1204 09:25:12.427000 36841 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9838090Z E1204 09:25:12.427000 36841 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9838963Z E1204 09:25:12.427000 36841 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9839396Z E1204 09:25:12.427000 36841 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9840273Z E1204 09:25:12.427000 36841 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9840707Z E1204 09:25:12.427000 36841 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9841579Z E1204 09:25:12.427000 36841 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9841967Z E1204 09:25:12.427000 36841 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9842879Z E1204 09:25:12.427000 36841 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9843312Z E1204 09:25:12.427000 36841 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9844872Z E1204 09:25:12.427000 36841 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 0. CUDA driver allocated memory was 649003008 and is now 720306176. 2025-12-04T09:25:19.9845183Z E1204 09:25:12.427000 36841 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9845778Z E1204 09:25:12.427000 36841 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9846894Z E1204 09:25:12.427000 36841 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T09:25:19.9847205Z E1204 09:25:12.427000 36841 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9847858Z E1204 09:25:12.427000 36841 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9848339Z E1204 09:25:12.427000 36841 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:25:19.9848734Z E1204 09:25:12.429000 36844 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:25:19.9849204Z E1204 09:25:12.429000 36844 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:25:19.9850176Z E1204 09:25:12.429000 36844 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9850635Z E1204 09:25:12.429000 36844 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:25:19.9851534Z E1204 09:25:12.429000 36844 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9851885Z E1204 09:25:12.429000 36844 
site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:25:19.9852757Z E1204 09:25:12.429000 36844 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9853196Z E1204 09:25:12.429000 36844 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9854169Z E1204 09:25:12.429000 36844 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9854572Z E1204 09:25:12.429000 36844 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:25:19.9855399Z E1204 09:25:12.429000 36844 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9855767Z E1204 09:25:12.429000 36844 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:25:19.9856895Z E1204 09:25:12.429000 36844 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9857399Z E1204 09:25:12.429000 36844 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:25:19.9859034Z E1204 09:25:12.429000 36844 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 3. CUDA driver allocated memory was 456065024 and is now 611254272. 
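(Context for the four RuntimeErrors above: this job runs with the CUDA memory-leak check enabled, so the test harness records the caching-allocator and driver-allocated byte counts on every visible device before the test and compares them afterwards; any growth is reported as a leak, which is exactly what the "allocated memory was 0 and is now reported as 512" messages show for devices 0-3. The snippet below is only an illustration of that before/after comparison using public torch.cuda counters; it is not the actual leak-check implementation in torch/testing/_internal/common_utils.py, and the helper name run_with_leak_check is made up for this note.)

# Illustrative sketch of a before/after CUDA allocation comparison, assuming a
# CUDA-capable machine. Not the real CudaMemoryLeakCheck code used by the harness.
import gc
import torch

def run_with_leak_check(fn):
    assert torch.cuda.is_available()
    torch.cuda.synchronize()
    before = [torch.cuda.memory_allocated(d) for d in range(torch.cuda.device_count())]
    fn()  # the test body
    gc.collect()                  # drop Python-side references first
    torch.cuda.synchronize()
    torch.cuda.empty_cache()      # release cached blocks so only live tensors are counted
    after = [torch.cuda.memory_allocated(d) for d in range(torch.cuda.device_count())]
    for dev, (b, a) in enumerate(zip(before, after)):
        if a > b:
            raise RuntimeError(f"possible CUDA leak on device {dev}: {b} -> {a} bytes")

The log already prints the exact repro command for this case: PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda.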
2025-12-04T09:25:19.9859370Z E1204 09:25:12.429000 36844 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9860004Z E1204 09:25:12.429000 36844 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9861202Z E1204 09:25:12.429000 36844 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T09:25:19.9861536Z E1204 09:25:12.429000 36844 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:25:19.9862235Z E1204 09:25:12.429000 36844 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9862749Z E1204 09:25:12.429000 36844 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:25:19.9862855Z FAILED [9.1029s] [100%] 2025-12-04T09:25:19.9862863Z 2025-12-04T09:25:19.9863010Z =================================== FAILURES =================================== 2025-12-04T09:25:19.9863341Z ____ TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda ____ 2025-12-04T09:25:19.9863466Z Traceback (most recent call last): 2025-12-04T09:25:19.9864065Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:25:19.9864178Z self._join_processes(fn) 2025-12-04T09:25:19.9864767Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:25:19.9864903Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:25:19.9865514Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:25:19.9865625Z raise RuntimeError(error) 2025-12-04T09:25:19.9865859Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:25:19.9865984Z Traceback (most recent call last): 2025-12-04T09:25:19.9866521Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9866631Z getattr(self, test_name)() 2025-12-04T09:25:19.9867176Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9867265Z fn() 2025-12-04T09:25:19.9867776Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9867879Z method(*args, **kwargs) 2025-12-04T09:25:19.9868380Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9868596Z method(*args, **kwargs) 2025-12-04T09:25:19.9869183Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9869307Z with policy(): 2025-12-04T09:25:19.9869781Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9869886Z raise RuntimeError(msg) 2025-12-04T09:25:19.9871063Z RuntimeError: CUDA driver API 
confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 0. CUDA driver allocated memory was 649003008 and is now 720306176. 2025-12-04T09:25:19.9871069Z 2025-12-04T09:25:19.9871271Z To execute this test, run the following from the base repo dir: 2025-12-04T09:25:19.9871995Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda 2025-12-04T09:25:19.9872003Z 2025-12-04T09:25:19.9872247Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:25:19.9872252Z 2025-12-04T09:25:19.9872256Z 2025-12-04T09:25:19.9872459Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:25:19.9872716Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:25:19.9873590Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-e2f9fc6fa3a79028.xml - 2025-12-04T09:25:19.9873753Z =========================== short test summary info ============================ 2025-12-04T09:25:19.9874616Z FAILED [9.1029s] distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_root_module_is_not_FSDP_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:25:19.9874729Z Traceback (most recent call last): 2025-12-04T09:25:19.9875252Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:25:19.9875353Z getattr(self, test_name)() 2025-12-04T09:25:19.9875907Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:25:19.9875990Z fn() 2025-12-04T09:25:19.9876465Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9876565Z method(*args, **kwargs) 2025-12-04T09:25:19.9877036Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:25:19.9877133Z method(*args, **kwargs) 2025-12-04T09:25:19.9877606Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:25:19.9877692Z with policy(): 2025-12-04T09:25:19.9878170Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:25:19.9878269Z raise RuntimeError(msg) 2025-12-04T09:25:19.9879485Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda! Caching allocator allocated memory was 0 and is now reported as 512 on device 0. CUDA driver allocated memory was 649003008 and is now 720306176. 
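(Separate from the leak failure itself: the FutureWarnings captured earlier in this log recommend replacing FSDP.state_dict_type()/FSDP.set_state_dict_type() with the torch.distributed.checkpoint.state_dict helpers. The sketch below is a hedged outline of that migration, not the test's code: `model` and `optimizer` are caller-supplied placeholders for an FSDP-wrapped module and its optimizer inside an already-initialized process group, and the keyword names follow the API doc linked in the warning.)

# Hedged migration sketch for the get_state_dict()/set_state_dict() API named in
# the FutureWarning above; `model` and `optimizer` are placeholders supplied by
# the caller (FSDP-wrapped module plus its optimizer, process group initialized).
from torch.distributed.checkpoint.state_dict import get_state_dict, set_state_dict

def state_dict_roundtrip(model, optimizer):
    # Replaces the deprecated FullyShardedDataParallel.set_state_dict_type(...) context:
    model_state, optim_state = get_state_dict(model, optimizer)
    # ... persist / reload model_state and optim_state, e.g. via torch.distributed.checkpoint ...
    set_state_dict(
        model,
        optimizer,
        model_state_dict=model_state,
        optim_state_dict=optim_state,
    )
    return model_state, optim_state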
2025-12-04T09:25:19.9879499Z
2025-12-04T09:25:19.9879686Z To execute this test, run the following from the base repo dir:
2025-12-04T09:25:19.9880358Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_hsdp_dtensor_state_dict.py TestHSDPWithDeviceMeshAndDTensorCUDA.test_root_module_is_not_FSDP_cuda
2025-12-04T09:25:19.9880363Z
2025-12-04T09:25:19.9880602Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:25:19.9880757Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T09:25:19.9880939Z ======================= 1 failed, 7 deselected in 9.12s ========================
2025-12-04T09:25:19.9881028Z Got exit code 1
2025-12-04T09:25:19.9881647Z FAILED CONSISTENTLY: test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_root_module_is_not_FSDP_cuda
2025-12-04T09:25:19.9882043Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T09:25:19.9882705Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-384ab9a5685ff7be.xml
2025-12-04T09:25:19.9882847Z ============================= test session starts ==============================
2025-12-04T09:25:19.9883161Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T09:25:19.9883253Z cachedir: .pytest_cache
2025-12-04T09:25:19.9883720Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T09:25:19.9883825Z rootdir: /var/lib/jenkins/workspace
2025-12-04T09:25:19.9883919Z configfile: pytest.ini
2025-12-04T09:25:19.9884404Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T09:25:19.9884587Z collecting ... collected 8 items / 8 deselected / 0 selected
2025-12-04T09:25:19.9884711Z stepcurrent: skipping 8 already run items.
2025-12-04T09:25:19.9884817Z Running 0 items in this shard
2025-12-04T09:25:19.9884821Z
2025-12-04T09:25:19.9885646Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-384ab9a5685ff7be.xml -
2025-12-04T09:25:19.9885794Z ============================ 8 deselected in 0.01s =============================
2025-12-04T09:25:19.9891735Z The following tests failed consistently: ['test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_False_cuda', 'test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_model_load_state_dict_offload_to_cpu_True_cuda', 'test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_False_cuda', 'test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_optim_load_state_dict_offload_to_cpu_True_cuda', 'test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_False_cuda', 'test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_dtensor_sharded_tensor_state_dict_identical_offload_to_cpu_True_cuda', 'test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_hsdp_init_with_device_mesh_cuda', 'test/distributed/fsdp/test_hsdp_dtensor_state_dict.py::TestHSDPWithDeviceMeshAndDTensorCUDA::test_root_module_is_not_FSDP_cuda']
2025-12-04T09:25:19.9891758Z
2025-12-04T09:25:19.9892411Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_hsdp_dtensor_state_dict 1/1 (test/test-reports/distributed.fsdp.test_hsdp_dtensor_state_dict_1.1_8591eb8b13b136e6_.log)
2025-12-04T09:25:19.9892415Z
2025-12-04T09:25:19.9892827Z Finished distributed/fsdp/test_hsdp_dtensor_state_dict 1/1 ...
[2025-12-04 09:25:19.633142][1951.241055853], took 5.73min 2025-12-04T09:25:19.9893708Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-a78dec0d79621f36.xml 2025-12-04T09:25:19.9894585Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-9a14ac4718e66e44.xml 2025-12-04T09:25:19.9895497Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-7d115d367e840460.xml 2025-12-04T09:25:19.9896497Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-724e16d7d24ec18b.xml 2025-12-04T09:25:19.9897651Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-1c81c8f34feb9c16.xml 2025-12-04T09:25:19.9898632Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-a326f09bb7c5e616.xml 2025-12-04T09:25:19.9899614Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-d7096ae518bc839e.xml 2025-12-04T09:25:19.9900599Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-dbe06a751e4355d9.xml 2025-12-04T09:25:19.9901596Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-d7f21dedd43754e1.xml 2025-12-04T09:25:19.9902571Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-7dbc99509eb0f4ce.xml 2025-12-04T09:25:19.9903555Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-5b4af92028672eb6.xml 2025-12-04T09:25:19.9904616Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-c67b11ef8bde4252.xml 2025-12-04T09:25:20.0059734Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-c057f5798619892b.xml 2025-12-04T09:25:20.0335728Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-aae1a2ba6806c0ef.xml 2025-12-04T09:25:20.0875973Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-c34ce2d8050066e8.xml 2025-12-04T09:25:20.1203468Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-fde5b3ce12e5a98a.xml 2025-12-04T09:25:20.1459818Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-b1cbedcab1229122.xml 2025-12-04T09:25:20.1792355Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-6d24496891daae4f.xml 2025-12-04T09:25:20.2053467Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-e815db3b6b0b67f1.xml 2025-12-04T09:25:20.2292600Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-788cdb9001b436df.xml 2025-12-04T09:25:20.2590524Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-9601a812ff315158.xml 2025-12-04T09:25:20.2843040Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-c4b6ce2b260b8d4b.xml 2025-12-04T09:25:20.3161955Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-490a12d48ec816b9.xml 2025-12-04T09:25:20.3415749Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-e2f9fc6fa3a79028.xml 2025-12-04T09:25:20.3633753Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-384ab9a5685ff7be.xml 2025-12-04T09:25:20.6244043Z Uploading logs for 57116084904 to S3 2025-12-04T09:25:20.6739000Z Uploading artifacts took 0.29 seconds 2025-12-04T09:25:20.6739514Z distributed/fsdp/test_hsdp_dtensor_state_dict 1/1 failed! 2025-12-04T09:25:20.6747954Z Running distributed/fsdp/test_fsdp_clip_grad_norm 1/1 ... [2025-12-04 09:25:20.674290][1952.282206742] 2025-12-04T09:25:20.6748621Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T09:25:20.6749920Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_fsdp_clip_grad_norm.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 09:25:20.674671] 2025-12-04T09:28:44.9621248Z 2025-12-04T09:28:44.9622618Z PRINTING LOG FILE of distributed/fsdp/test_fsdp_clip_grad_norm 1/1 (test/test-reports/distributed.fsdp.test_fsdp_clip_grad_norm_1.1_4959fae61140b3a8_.log) 2025-12-04T09:28:44.9624232Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-a06a4188d644524d.xml 2025-12-04T09:28:44.9625272Z ============================= test session starts ============================== 2025-12-04T09:28:44.9625938Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:44.9626670Z cachedir: .pytest_cache 2025-12-04T09:28:44.9627363Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:44.9628118Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:44.9628464Z configfile: pytest.ini 2025-12-04T09:28:44.9629172Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:44.9630069Z collecting ... collected 4 items 2025-12-04T09:28:44.9630472Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T09:28:44.9632716Z Running 4 items in this shard: test/distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_ddp_parity_cuda, test/distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_low_precision_grads_cuda, test/distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_no_gradients_cuda, test/distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_non_root_cuda 2025-12-04T09:28:44.9634750Z 2025-12-04T09:28:44.9635663Z distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_ddp_parity_cuda I1204 09:25:24.094000 37239 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 37291 2025-12-04T09:28:44.9637323Z I1204 09:25:24.095000 37239 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 37292 2025-12-04T09:28:44.9638450Z I1204 09:25:24.096000 37239 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 37293 2025-12-04T09:28:44.9639648Z I1204 09:25:24.097000 37239 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 37294 2025-12-04T09:28:44.9641470Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:28:44.9642953Z self.encoder = TransformerEncoder( 2025-12-04T09:28:44.9644413Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:28:44.9645880Z self.encoder = TransformerEncoder( 2025-12-04T09:28:44.9647326Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 
2025-12-04T09:28:44.9648765Z self.encoder = TransformerEncoder( 2025-12-04T09:28:44.9650201Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:28:44.9651659Z self.encoder = TransformerEncoder( 2025-12-04T09:28:44.9653628Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:44.9655602Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:44.9657854Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:44.9659880Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:44.9661900Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:44.9663913Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:44.9665942Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:44.9667994Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:44.9669377Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:28:44.9670591Z return func(*args, **kwargs) 2025-12-04T09:28:44.9671776Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:44.9672961Z return fsdp_fn(module, **kwargs) 2025-12-04T09:28:44.9674105Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:28:44.9675285Z return fsdp_fn(module, **kwargs) 2025-12-04T09:28:44.9676446Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:44.9677628Z return fsdp_fn(module, **kwargs) 2025-12-04T09:28:44.9678777Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:44.9679953Z return fsdp_fn(module, **kwargs) 2025-12-04T09:28:44.9681137Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:44.9682338Z fsdp_model = FSDP( 2025-12-04T09:28:44.9683458Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:44.9684653Z fsdp_model = FSDP( 2025-12-04T09:28:44.9685856Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:44.9687054Z fsdp_model = FSDP( 2025-12-04T09:28:44.9688169Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:44.9689359Z fsdp_model = FSDP( 2025-12-04T09:28:44.9693917Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:44.9699130Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:44.9704182Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. 
This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:44.9709347Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:44.9714250Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:44.9719084Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:44.9724519Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:44.9729526Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:44.9731026Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:123: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:44.9732250Z fsdp_model.transformer.encoder = FSDP( 2025-12-04T09:28:44.9733568Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:123: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:28:44.9734761Z fsdp_model.transformer.encoder = FSDP( 2025-12-04T09:28:44.9735954Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:123: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:44.9737515Z fsdp_model.transformer.encoder = FSDP( 2025-12-04T09:28:44.9738748Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:123: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:44.9740025Z fsdp_model.transformer.encoder = FSDP( 2025-12-04T09:28:44.9740762Z [rank0]:E1204 09:25:43.079000 37291 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:44.9741898Z [rank0]:E1204 09:25:43.079000 37291 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:44.9743588Z [rank0]:E1204 09:25:43.079000 37291 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:44.9745256Z [rank0]:E1204 09:25:43.079000 37291 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:44.9746909Z [rank0]:E1204 09:25:43.079000 37291 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:44.9748545Z [rank0]:E1204 09:25:43.079000 37291 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:44.9750196Z [rank0]:E1204 09:25:43.079000 37291 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:44.9751699Z [rank0]:E1204 09:25:43.079000 37291 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:44.9753203Z [rank0]:E1204 09:25:43.079000 37291 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:44.9754758Z [rank0]:E1204 09:25:43.079000 37291 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:44.9756258Z [rank0]:E1204 09:25:43.079000 37291 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:44.9757729Z [rank0]:E1204 09:25:43.079000 37291 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:44.9759345Z [rank0]:E1204 09:25:43.079000 37291 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:44.9761043Z [rank0]:E1204 09:25:43.079000 37291 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:44.9763507Z [rank0]:E1204 09:25:43.079000 37291 site-packages/torch/testing/_internal/common_distributed.py:935] 
RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_ddp_parity_cuda! Caching allocator allocated memory was 512 and is now reported as 1929728 on device 0. CUDA driver allocated memory was 714014720 and is now 804192256. 2025-12-04T09:28:44.9765646Z [rank0]:E1204 09:25:43.079000 37291 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:44.9766769Z [rank0]:E1204 09:25:43.079000 37291 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:44.9768536Z [rank0]:E1204 09:25:43.079000 37291 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_ddp_parity_cuda 2025-12-04T09:28:44.9770047Z [rank0]:E1204 09:25:43.079000 37291 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:44.9771244Z [rank0]:E1204 09:25:43.079000 37291 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:44.9772629Z [rank0]:E1204 09:25:43.079000 37291 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:44.9773821Z [rank1]:E1204 09:25:43.082000 37292 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:44.9774880Z [rank1]:E1204 09:25:43.082000 37292 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:44.9776552Z [rank1]:E1204 09:25:43.082000 37292 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:44.9778390Z [rank1]:E1204 09:25:43.082000 37292 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:44.9780024Z [rank1]:E1204 09:25:43.082000 37292 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:44.9781565Z [rank1]:E1204 09:25:43.082000 37292 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:44.9783072Z [rank1]:E1204 09:25:43.082000 37292 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:44.9784679Z [rank1]:E1204 09:25:43.082000 37292 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:44.9786326Z [rank1]:E1204 09:25:43.082000 37292 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:44.9787927Z [rank1]:E1204 09:25:43.082000 37292 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:44.9789668Z [rank1]:E1204 09:25:43.082000 37292 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 
2025-12-04T09:28:44.9791056Z [rank1]:E1204 09:25:43.082000 37292 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:44.9792437Z [rank1]:E1204 09:25:43.082000 37292 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:44.9793853Z [rank1]:E1204 09:25:43.082000 37292 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:44.9795782Z [rank1]:E1204 09:25:43.082000 37292 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_ddp_parity_cuda! Caching allocator allocated memory was 512 and is now reported as 1929728 on device 1. CUDA driver allocated memory was 604962816 and is now 695140352. 2025-12-04T09:28:44.9797579Z [rank1]:E1204 09:25:43.082000 37292 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:44.9798619Z [rank1]:E1204 09:25:43.082000 37292 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:44.9800260Z [rank1]:E1204 09:25:43.082000 37292 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_ddp_parity_cuda 2025-12-04T09:28:44.9801607Z [rank1]:E1204 09:25:43.082000 37292 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:44.9802730Z [rank1]:E1204 09:25:43.082000 37292 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:44.9803981Z [rank1]:E1204 09:25:43.082000 37292 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:44.9804998Z [rank2]:E1204 09:25:43.082000 37293 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:44.9806006Z [rank2]:E1204 09:25:43.082000 37293 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:44.9807501Z [rank2]:E1204 09:25:43.082000 37293 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:44.9808966Z [rank2]:E1204 09:25:43.082000 37293 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:44.9810433Z [rank2]:E1204 09:25:43.082000 37293 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:44.9811792Z [rank2]:E1204 09:25:43.082000 37293 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:44.9813130Z [rank2]:E1204 09:25:43.082000 37293 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:44.9814620Z [rank2]:E1204 09:25:43.082000 37293 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, 
**kwargs) 2025-12-04T09:28:44.9816043Z [rank2]:E1204 09:25:43.082000 37293 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:44.9817873Z [rank2]:E1204 09:25:43.082000 37293 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:44.9819469Z [rank2]:E1204 09:25:43.082000 37293 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:44.9821198Z [rank2]:E1204 09:25:43.082000 37293 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:44.9822770Z [rank2]:E1204 09:25:43.082000 37293 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:44.9824375Z [rank2]:E1204 09:25:43.082000 37293 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:44.9826545Z [rank2]:E1204 09:25:43.082000 37293 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_ddp_parity_cuda! Caching allocator allocated memory was 512 and is now reported as 1929728 on device 2. CUDA driver allocated memory was 602865664 and is now 695140352. 2025-12-04T09:28:44.9828580Z [rank2]:E1204 09:25:43.082000 37293 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:44.9829807Z [rank2]:E1204 09:25:43.082000 37293 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:44.9831646Z [rank2]:E1204 09:25:43.082000 37293 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_ddp_parity_cuda 2025-12-04T09:28:44.9833404Z [rank2]:E1204 09:25:43.082000 37293 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:44.9834560Z [rank2]:E1204 09:25:43.082000 37293 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:44.9836068Z [rank2]:E1204 09:25:43.082000 37293 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:44.9837172Z [rank3]:E1204 09:25:43.083000 37294 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:44.9838267Z [rank3]:E1204 09:25:43.083000 37294 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:44.9839902Z [rank3]:E1204 09:25:43.083000 37294 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:44.9841500Z [rank3]:E1204 09:25:43.083000 37294 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:44.9843081Z [rank3]:E1204 09:25:43.083000 37294 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:44.9844575Z [rank3]:E1204 09:25:43.083000 37294 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:44.9846130Z [rank3]:E1204 09:25:43.083000 37294 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:44.9847788Z [rank3]:E1204 09:25:43.083000 37294 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:44.9849301Z [rank3]:E1204 09:25:43.083000 37294 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:44.9850791Z [rank3]:E1204 09:25:43.083000 37294 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:44.9852292Z [rank3]:E1204 09:25:43.083000 37294 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:44.9853764Z [rank3]:E1204 09:25:43.083000 37294 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:44.9855240Z [rank3]:E1204 09:25:43.083000 37294 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:44.9857193Z [rank3]:E1204 09:25:43.083000 37294 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:44.9859341Z [rank3]:E1204 09:25:43.083000 37294 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_ddp_parity_cuda! Caching allocator allocated memory was 512 and is now reported as 1963520 on device 3. CUDA driver allocated memory was 491716608 and is now 695140352. 
2025-12-04T09:28:44.9862917Z [rank3]:E1204 09:25:43.083000 37294 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:44.9864102Z [rank3]:E1204 09:25:43.083000 37294 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:44.9865964Z [rank3]:E1204 09:25:43.083000 37294 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_ddp_parity_cuda 2025-12-04T09:28:44.9867486Z [rank3]:E1204 09:25:43.083000 37294 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:44.9868707Z [rank3]:E1204 09:25:43.083000 37294 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:44.9870158Z [rank3]:E1204 09:25:43.083000 37294 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:44.9870903Z dist init r=0, world=4 2025-12-04T09:28:44.9871173Z dist init r=1, world=4 2025-12-04T09:28:44.9871426Z dist init r=3, world=4 2025-12-04T09:28:44.9871692Z dist init r=2, world=4 2025-12-04T09:28:44.9872960Z [rank0]:[W1204 09:25:43.138329303 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:28:44.9874268Z FAILED [21.5235s] [ 25%] 2025-12-04T09:28:44.9874456Z 2025-12-04T09:28:44.9874603Z =================================== FAILURES =================================== 2025-12-04T09:28:44.9875134Z __________________ TestClipGradNormCUDA.test_ddp_parity_cuda ___________________ 2025-12-04T09:28:44.9875635Z Traceback (most recent call last): 2025-12-04T09:28:44.9876376Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:44.9877130Z self._join_processes(fn) 2025-12-04T09:28:44.9877957Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:44.9878787Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:44.9879612Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:44.9880427Z raise RuntimeError(error) 2025-12-04T09:28:44.9880852Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:44.9881306Z Traceback (most recent call last): 2025-12-04T09:28:44.9882043Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:44.9882796Z getattr(self, test_name)() 2025-12-04T09:28:44.9883509Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:44.9884227Z fn() 2025-12-04T09:28:44.9884843Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:44.9885560Z method(*args, **kwargs) 2025-12-04T09:28:44.9886227Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in 
wrapper 2025-12-04T09:28:44.9886945Z method(*args, **kwargs) 2025-12-04T09:28:44.9887695Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:44.9888532Z with policy(): 2025-12-04T09:28:44.9889170Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:44.9889928Z raise RuntimeError(msg) 2025-12-04T09:28:44.9891149Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_ddp_parity_cuda! Caching allocator allocated memory was 512 and is now reported as 1929728 on device 0. CUDA driver allocated memory was 714014720 and is now 804192256. 2025-12-04T09:28:44.9892305Z 2025-12-04T09:28:44.9892521Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:44.9893372Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_ddp_parity_cuda 2025-12-04T09:28:44.9894038Z 2025-12-04T09:28:44.9894288Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:44.9894677Z 2025-12-04T09:28:44.9894835Z Process 3 exited with error code 10 and exception: 2025-12-04T09:28:44.9895233Z Traceback (most recent call last): 2025-12-04T09:28:44.9895966Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:44.9896988Z getattr(self, test_name)() 2025-12-04T09:28:44.9897750Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:44.9898517Z fn() 2025-12-04T09:28:44.9899169Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:44.9899939Z method(*args, **kwargs) 2025-12-04T09:28:44.9900658Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:44.9901406Z method(*args, **kwargs) 2025-12-04T09:28:44.9902116Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:44.9902870Z with policy(): 2025-12-04T09:28:44.9903537Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:44.9904307Z raise RuntimeError(msg) 2025-12-04T09:28:44.9905653Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_ddp_parity_cuda! Caching allocator allocated memory was 512 and is now reported as 1963520 on device 3. CUDA driver allocated memory was 491716608 and is now 695140352. 2025-12-04T09:28:44.9906869Z 2025-12-04T09:28:44.9907093Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:44.9908021Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_ddp_parity_cuda 2025-12-04T09:28:44.9908820Z 2025-12-04T09:28:44.9909197Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:44.9909574Z 2025-12-04T09:28:44.9909578Z 2025-12-04T09:28:44.9909793Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:44.9910394Z Process 0 terminated with exit code 10, terminating remaining processes. 
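The failure above comes from PyTorch's CUDA memory leak checker (enabled for this shard via mem_leak_check), which records the caching-allocator byte count and the driver-level allocation for each device before the test and compares them afterwards; any growth that survives the test body, here 512 B growing to roughly 1.9 MB per device, is reported as a leak. A minimal sketch of the same before/after pattern using only public torch.cuda counters, purely illustrative and not the in-tree checker's actual implementation:

    import torch

    def assert_no_persistent_growth(fn, device: int = 0) -> None:
        # Warm-up run so lazily initialized CUDA state (context, cuBLAS, NCCL buffers)
        # is not mistaken for a leak on the measured run.
        fn()
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_before = torch.cuda.memory_allocated(device)    # caching-allocator bytes in use
        reserved_before = torch.cuda.memory_reserved(device)  # bytes reserved from the driver
        fn()
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_after = torch.cuda.memory_allocated(device)
        reserved_after = torch.cuda.memory_reserved(device)
        if alloc_after > alloc_before:
            raise RuntimeError(
                f"possible leak: allocated {alloc_before} -> {alloc_after} bytes, "
                f"reserved {reserved_before} -> {reserved_after} bytes"
            )

The in-tree checker also tracks driver-allocated memory, which is why the error message reports both numbers; typical culprits are tensors kept alive past the test body, for example by a cached module, an un-destroyed process group, or a stashed autograd graph.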
2025-12-04T09:28:44.9911698Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-a06a4188d644524d.xml - 2025-12-04T09:28:44.9912773Z =========================== short test summary info ============================ 2025-12-04T09:28:44.9913714Z FAILED [21.5235s] distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_ddp_parity_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:44.9914601Z Traceback (most recent call last): 2025-12-04T09:28:44.9915313Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:44.9916031Z getattr(self, test_name)() 2025-12-04T09:28:44.9916695Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:44.9917427Z fn() 2025-12-04T09:28:44.9918012Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:44.9918686Z method(*args, **kwargs) 2025-12-04T09:28:44.9919331Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:44.9920041Z method(*args, **kwargs) 2025-12-04T09:28:44.9920688Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:44.9921692Z with policy(): 2025-12-04T09:28:44.9922382Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:44.9923154Z raise RuntimeError(msg) 2025-12-04T09:28:44.9924429Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_ddp_parity_cuda! Caching allocator allocated memory was 512 and is now reported as 1929728 on device 0. CUDA driver allocated memory was 714014720 and is now 804192256. 
2025-12-04T09:28:44.9925644Z 2025-12-04T09:28:44.9925861Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:44.9926785Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_ddp_parity_cuda 2025-12-04T09:28:44.9927490Z 2025-12-04T09:28:44.9927766Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:44.9928166Z 2025-12-04T09:28:44.9928344Z Process 3 exited with error code 10 and exception: 2025-12-04T09:28:44.9928754Z Traceback (most recent call last): 2025-12-04T09:28:44.9929541Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:44.9930347Z getattr(self, test_name)() 2025-12-04T09:28:44.9931092Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:44.9931870Z fn() 2025-12-04T09:28:44.9932525Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:44.9933395Z method(*args, **kwargs) 2025-12-04T09:28:44.9934262Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:44.9934943Z method(*args, **kwargs) 2025-12-04T09:28:44.9935579Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:44.9936297Z with policy(): 2025-12-04T09:28:44.9937123Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:44.9937890Z raise RuntimeError(msg) 2025-12-04T09:28:44.9939176Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_ddp_parity_cuda! Caching allocator allocated memory was 512 and is now reported as 1963520 on device 3. CUDA driver allocated memory was 491716608 and is now 695140352. 2025-12-04T09:28:44.9940376Z 2025-12-04T09:28:44.9940594Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:44.9941516Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_ddp_parity_cuda 2025-12-04T09:28:44.9942222Z 2025-12-04T09:28:44.9942487Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:44.9943081Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:44.9943548Z ============================== 1 failed in 21.55s ============================== 2025-12-04T09:28:44.9943948Z Got exit code 1 2025-12-04T09:28:44.9944218Z Retrying single test... 
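Most of the warnings interleaved with this failure are the FutureWarning that the FSDP `NO_SHARD` sharding strategy is deprecated, with `DistributedDataParallel` suggested as the replacement. A minimal sketch of that replacement, where the module, rank, and process-group setup are illustrative assumptions rather than the test's actual code:

    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    def replicate_instead_of_no_shard(module: torch.nn.Module, rank: int) -> DDP:
        # NO_SHARD kept a full parameter replica on every rank, so plain DDP is the
        # drop-in alternative the warning points to.
        assert dist.is_initialized()
        torch.cuda.set_device(rank)
        return DDP(module.cuda(rank), device_ids=[rank])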
2025-12-04T09:28:44.9945171Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-03186403898f3bbb.xml 2025-12-04T09:28:44.9946173Z ============================= test session starts ============================== 2025-12-04T09:28:44.9946838Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:44.9947481Z cachedir: .pytest_cache 2025-12-04T09:28:44.9948179Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:44.9949052Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:44.9949366Z configfile: pytest.ini 2025-12-04T09:28:44.9950013Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:44.9950791Z collecting ... collected 4 items / 3 deselected / 1 selected 2025-12-04T09:28:44.9951682Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_ddp_parity_cuda 2025-12-04T09:28:44.9952476Z Running 1 items in this shard 2025-12-04T09:28:44.9952663Z 2025-12-04T09:28:44.9953715Z distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_ddp_parity_cuda I1204 09:25:50.084000 37668 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 37720 2025-12-04T09:28:44.9955193Z I1204 09:25:50.084000 37668 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 37721 2025-12-04T09:28:44.9956432Z I1204 09:25:50.085000 37668 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 37722 2025-12-04T09:28:44.9957538Z I1204 09:25:50.086000 37668 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 37723 2025-12-04T09:28:44.9959375Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:28:44.9960876Z self.encoder = TransformerEncoder( 2025-12-04T09:28:44.9962375Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:28:44.9963832Z self.encoder = TransformerEncoder( 2025-12-04T09:28:44.9965274Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:28:44.9966735Z self.encoder = TransformerEncoder( 2025-12-04T09:28:44.9968174Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:28:44.9969627Z self.encoder = TransformerEncoder( 2025-12-04T09:28:44.9971650Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:44.9973554Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:44.9975460Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:44.9977676Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:44.9979727Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:44.9981752Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:44.9983769Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:44.9985795Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:44.9987107Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:28:44.9988347Z return func(*args, **kwargs) 2025-12-04T09:28:44.9989665Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:44.9990760Z return fsdp_fn(module, **kwargs) 2025-12-04T09:28:44.9991832Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:44.9992910Z return fsdp_fn(module, **kwargs) 2025-12-04T09:28:44.9994026Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:44.9995112Z return fsdp_fn(module, **kwargs) 2025-12-04T09:28:44.9996167Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:28:44.9997231Z return fsdp_fn(module, **kwargs) 2025-12-04T09:28:44.9998322Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:44.9999427Z fsdp_model = FSDP( 2025-12-04T09:28:45.0000486Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0001570Z fsdp_model = FSDP( 2025-12-04T09:28:45.0002608Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0003706Z fsdp_model = FSDP( 2025-12-04T09:28:45.0004743Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0005850Z fsdp_model = FSDP( 2025-12-04T09:28:45.0009988Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:45.0014454Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:45.0019449Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T09:28:45.0024793Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:45.0029848Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:45.0034723Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:45.0039195Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:45.0043705Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:45.0045050Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:123: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0046138Z fsdp_model.transformer.encoder = FSDP( 2025-12-04T09:28:45.0047226Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:123: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0048321Z fsdp_model.transformer.encoder = FSDP( 2025-12-04T09:28:45.0049419Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:123: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0050497Z fsdp_model.transformer.encoder = FSDP( 2025-12-04T09:28:45.0051581Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:123: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. 
If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0052668Z fsdp_model.transformer.encoder = FSDP( 2025-12-04T09:28:45.0053318Z [rank0]:E1204 09:26:09.052000 37720 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.0054318Z [rank0]:E1204 09:26:09.052000 37720 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.0055881Z [rank0]:E1204 09:26:09.052000 37720 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0057685Z [rank0]:E1204 09:26:09.052000 37720 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.0059341Z [rank0]:E1204 09:26:09.052000 37720 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0060878Z [rank0]:E1204 09:26:09.052000 37720 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.0062380Z [rank0]:E1204 09:26:09.052000 37720 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0063994Z [rank0]:E1204 09:26:09.052000 37720 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0065596Z [rank0]:E1204 09:26:09.052000 37720 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0067190Z [rank0]:E1204 09:26:09.052000 37720 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0068776Z [rank0]:E1204 09:26:09.052000 37720 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0070299Z [rank0]:E1204 09:26:09.052000 37720 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.0071691Z [rank0]:E1204 09:26:09.052000 37720 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0073155Z [rank0]:E1204 09:26:09.052000 37720 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.0075083Z [rank0]:E1204 09:26:09.052000 37720 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_ddp_parity_cuda! Caching allocator allocated memory was 512 and is now reported as 1929728 on device 0. CUDA driver allocated memory was 714014720 and is now 804192256. 
2025-12-04T09:28:45.0076893Z [rank0]:E1204 09:26:09.052000 37720 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0077924Z [rank0]:E1204 09:26:09.052000 37720 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0079543Z [rank0]:E1204 09:26:09.052000 37720 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_ddp_parity_cuda 2025-12-04T09:28:45.0080903Z [rank0]:E1204 09:26:09.052000 37720 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0082007Z [rank0]:E1204 09:26:09.052000 37720 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0083247Z [rank0]:E1204 09:26:09.052000 37720 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:45.0084273Z [rank2]:E1204 09:26:09.053000 37722 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.0085288Z [rank2]:E1204 09:26:09.053000 37722 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.0086838Z [rank2]:E1204 09:26:09.053000 37722 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0088312Z [rank2]:E1204 09:26:09.053000 37722 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.0089764Z [rank2]:E1204 09:26:09.053000 37722 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0091126Z [rank2]:E1204 09:26:09.053000 37722 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.0092470Z [rank2]:E1204 09:26:09.053000 37722 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0093887Z [rank2]:E1204 09:26:09.053000 37722 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0095292Z [rank2]:E1204 09:26:09.053000 37722 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0096955Z [rank2]:E1204 09:26:09.053000 37722 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0098633Z [rank2]:E1204 09:26:09.053000 37722 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0100223Z [rank2]:E1204 09:26:09.053000 37722 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.0101791Z [rank2]:E1204 09:26:09.053000 37722 site-packages/torch/testing/_internal/common_distributed.py:935] 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0103419Z [rank2]:E1204 09:26:09.053000 37722 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.0105581Z [rank2]:E1204 09:26:09.053000 37722 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_ddp_parity_cuda! Caching allocator allocated memory was 512 and is now reported as 1963520 on device 2. CUDA driver allocated memory was 602865664 and is now 695140352. 2025-12-04T09:28:45.0107615Z [rank2]:E1204 09:26:09.053000 37722 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0108894Z [rank2]:E1204 09:26:09.053000 37722 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0110643Z [rank2]:E1204 09:26:09.053000 37722 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_ddp_parity_cuda 2025-12-04T09:28:45.0111984Z [rank2]:E1204 09:26:09.053000 37722 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0113080Z [rank2]:E1204 09:26:09.053000 37722 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0114331Z [rank2]:E1204 09:26:09.053000 37722 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:45.0115348Z [rank3]:E1204 09:26:09.054000 37723 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.0116401Z [rank3]:E1204 09:26:09.054000 37723 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.0117894Z [rank3]:E1204 09:26:09.054000 37723 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0119370Z [rank3]:E1204 09:26:09.054000 37723 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.0120959Z [rank3]:E1204 09:26:09.054000 37723 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0122654Z [rank3]:E1204 09:26:09.054000 37723 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.0124160Z [rank3]:E1204 09:26:09.054000 37723 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0125763Z [rank3]:E1204 09:26:09.054000 37723 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0127369Z [rank3]:E1204 09:26:09.054000 37723 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T09:28:45.0128968Z [rank3]:E1204 09:26:09.054000 37723 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0130639Z [rank3]:E1204 09:26:09.054000 37723 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0132189Z [rank3]:E1204 09:26:09.054000 37723 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.0133881Z [rank3]:E1204 09:26:09.054000 37723 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0135322Z [rank3]:E1204 09:26:09.054000 37723 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.0137572Z [rank3]:E1204 09:26:09.054000 37723 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_ddp_parity_cuda! Caching allocator allocated memory was 512 and is now reported as 1929728 on device 3. CUDA driver allocated memory was 489619456 and is now 695140352. 2025-12-04T09:28:45.0139609Z [rank3]:E1204 09:26:09.054000 37723 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0140770Z [rank3]:E1204 09:26:09.054000 37723 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0142596Z [rank3]:E1204 09:26:09.054000 37723 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_ddp_parity_cuda 2025-12-04T09:28:45.0144135Z [rank3]:E1204 09:26:09.054000 37723 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0145351Z [rank3]:E1204 09:26:09.054000 37723 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0146761Z [rank3]:E1204 09:26:09.054000 37723 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:45.0147987Z [rank1]:E1204 09:26:09.055000 37721 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.0149228Z [rank1]:E1204 09:26:09.055000 37721 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.0150821Z [rank1]:E1204 09:26:09.055000 37721 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0152367Z [rank1]:E1204 09:26:09.055000 37721 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.0154004Z [rank1]:E1204 09:26:09.055000 37721 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0155375Z [rank1]:E1204 09:26:09.055000 37721 site-packages/torch/testing/_internal/common_distributed.py:935] 
fn() 2025-12-04T09:28:45.0156724Z [rank1]:E1204 09:26:09.055000 37721 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0158150Z [rank1]:E1204 09:26:09.055000 37721 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0159560Z [rank1]:E1204 09:26:09.055000 37721 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0161008Z [rank1]:E1204 09:26:09.055000 37721 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0162440Z [rank1]:E1204 09:26:09.055000 37721 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0163856Z [rank1]:E1204 09:26:09.055000 37721 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.0165243Z [rank1]:E1204 09:26:09.055000 37721 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0166674Z [rank1]:E1204 09:26:09.055000 37721 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.0168617Z [rank1]:E1204 09:26:09.055000 37721 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_ddp_parity_cuda! Caching allocator allocated memory was 512 and is now reported as 1963520 on device 1. CUDA driver allocated memory was 604962816 and is now 695140352. 2025-12-04T09:28:45.0170421Z [rank1]:E1204 09:26:09.055000 37721 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0171468Z [rank1]:E1204 09:26:09.055000 37721 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0173073Z [rank1]:E1204 09:26:09.055000 37721 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_ddp_parity_cuda 2025-12-04T09:28:45.0174423Z [rank1]:E1204 09:26:09.055000 37721 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0175516Z [rank1]:E1204 09:26:09.055000 37721 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0177105Z [rank1]:E1204 09:26:09.055000 37721 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:45.0177906Z dist init r=3, world=4 2025-12-04T09:28:45.0178180Z dist init r=1, world=4 2025-12-04T09:28:45.0178461Z dist init r=2, world=4 2025-12-04T09:28:45.0178741Z dist init r=0, world=4 2025-12-04T09:28:45.0180068Z [rank0]:[W1204 09:26:09.128876599 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:28:45.0181461Z FAILED [21.5254s] [100%] 2025-12-04T09:28:45.0181660Z 2025-12-04T09:28:45.0181813Z =================================== FAILURES =================================== 2025-12-04T09:28:45.0182375Z __________________ TestClipGradNormCUDA.test_ddp_parity_cuda ___________________ 2025-12-04T09:28:45.0182896Z Traceback (most recent call last): 2025-12-04T09:28:45.0183692Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:45.0184495Z self._join_processes(fn) 2025-12-04T09:28:45.0185283Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:45.0186157Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:45.0187043Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:45.0187909Z raise RuntimeError(error) 2025-12-04T09:28:45.0188394Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:28:45.0188990Z Traceback (most recent call last): 2025-12-04T09:28:45.0189688Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0190404Z getattr(self, test_name)() 2025-12-04T09:28:45.0191100Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0191784Z fn() 2025-12-04T09:28:45.0192366Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0193031Z method(*args, **kwargs) 2025-12-04T09:28:45.0193668Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0194348Z method(*args, **kwargs) 2025-12-04T09:28:45.0194985Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0195640Z with policy(): 2025-12-04T09:28:45.0196251Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0196938Z raise RuntimeError(msg) 2025-12-04T09:28:45.0198076Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_ddp_parity_cuda! Caching allocator allocated memory was 512 and is now reported as 1963520 on device 1. CUDA driver allocated memory was 604962816 and is now 695140352. 
2025-12-04T09:28:45.0199154Z 2025-12-04T09:28:45.0199350Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0200170Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_ddp_parity_cuda 2025-12-04T09:28:45.0200791Z 2025-12-04T09:28:45.0201043Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0201400Z 2025-12-04T09:28:45.0201559Z Process 2 exited with error code 10 and exception: 2025-12-04T09:28:45.0201923Z Traceback (most recent call last): 2025-12-04T09:28:45.0202677Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0203396Z getattr(self, test_name)() 2025-12-04T09:28:45.0204059Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0204752Z fn() 2025-12-04T09:28:45.0205331Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0206008Z method(*args, **kwargs) 2025-12-04T09:28:45.0206633Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0207315Z method(*args, **kwargs) 2025-12-04T09:28:45.0207950Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0208606Z with policy(): 2025-12-04T09:28:45.0209218Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0210356Z raise RuntimeError(msg) 2025-12-04T09:28:45.0211499Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_ddp_parity_cuda! Caching allocator allocated memory was 512 and is now reported as 1963520 on device 2. CUDA driver allocated memory was 602865664 and is now 695140352. 2025-12-04T09:28:45.0212560Z 2025-12-04T09:28:45.0212754Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0213569Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_ddp_parity_cuda 2025-12-04T09:28:45.0214230Z 2025-12-04T09:28:45.0214469Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0214822Z 2025-12-04T09:28:45.0214826Z 2025-12-04T09:28:45.0215040Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:45.0215813Z Process 1 terminated with exit code 10, terminating remaining processes. 
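Both attempts also end with a ProcessGroupNCCL warning that destroy_process_group() was not called before the processes exited (see the shutdown docs linked above). In a standalone distributed script the usual fix is to tear the group down explicitly once every rank is finished; a minimal sketch, where the rank plumbing and run_test are placeholders:

    import os
    import torch
    import torch.distributed as dist

    def main() -> None:
        rank = int(os.environ["LOCAL_RANK"])   # set by torchrun
        torch.cuda.set_device(rank)
        # Passing device_id=torch.device("cuda", rank) here would also address the
        # barrier() "using the device under current context" warning seen earlier.
        dist.init_process_group(backend="nccl")
        try:
            run_test(rank)                     # hypothetical per-rank test body
        finally:
            dist.destroy_process_group()       # silences the shutdown warning above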
2025-12-04T09:28:45.0217283Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-03186403898f3bbb.xml - 2025-12-04T09:28:45.0218488Z =========================== short test summary info ============================ 2025-12-04T09:28:45.0219554Z FAILED [21.5254s] distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_ddp_parity_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:28:45.0220552Z Traceback (most recent call last): 2025-12-04T09:28:45.0221538Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0222342Z getattr(self, test_name)() 2025-12-04T09:28:45.0223107Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0223876Z fn() 2025-12-04T09:28:45.0224528Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0225287Z method(*args, **kwargs) 2025-12-04T09:28:45.0226006Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0226753Z method(*args, **kwargs) 2025-12-04T09:28:45.0227465Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0228219Z with policy(): 2025-12-04T09:28:45.0228889Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0229658Z raise RuntimeError(msg) 2025-12-04T09:28:45.0231061Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_ddp_parity_cuda! Caching allocator allocated memory was 512 and is now reported as 1963520 on device 1. CUDA driver allocated memory was 604962816 and is now 695140352. 
2025-12-04T09:28:45.0232271Z 2025-12-04T09:28:45.0232610Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0233556Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_ddp_parity_cuda 2025-12-04T09:28:45.0234193Z 2025-12-04T09:28:45.0234431Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0234804Z 2025-12-04T09:28:45.0234951Z Process 2 exited with error code 10 and exception: 2025-12-04T09:28:45.0235341Z Traceback (most recent call last): 2025-12-04T09:28:45.0236038Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0236752Z getattr(self, test_name)() 2025-12-04T09:28:45.0237431Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0238122Z fn() 2025-12-04T09:28:45.0238690Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0239373Z method(*args, **kwargs) 2025-12-04T09:28:45.0240009Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0240674Z method(*args, **kwargs) 2025-12-04T09:28:45.0241312Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0242026Z with policy(): 2025-12-04T09:28:45.0242636Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0243308Z raise RuntimeError(msg) 2025-12-04T09:28:45.0244454Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_ddp_parity_cuda! Caching allocator allocated memory was 512 and is now reported as 1963520 on device 2. CUDA driver allocated memory was 602865664 and is now 695140352. 2025-12-04T09:28:45.0245571Z 2025-12-04T09:28:45.0245762Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0246575Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_ddp_parity_cuda 2025-12-04T09:28:45.0247196Z 2025-12-04T09:28:45.0247430Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0247961Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:45.0248411Z ======================= 1 failed, 3 deselected in 21.55s ======================= 2025-12-04T09:28:45.0248787Z Got exit code 1 2025-12-04T09:28:45.0249020Z Retrying single test... 
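The failure above comes from PyTorch's CUDA mem-leak checker, which compares the caching-allocator and driver-level memory counters on each device before and after the test body. As a rough illustration only (this is not the harness's internal implementation), the sketch below shows the kind of before/after comparison those two reported numbers correspond to, using public torch.cuda APIs; `snapshot`, `check_for_leak`, and `run_test` are hypothetical names standing in for the real machinery:

    import torch

    def snapshot(device: int) -> dict:
        # Driver-level view: total minus free approximates "CUDA driver allocated memory".
        free, total = torch.cuda.mem_get_info(device)
        return {
            "caching_allocator_bytes": torch.cuda.memory_allocated(device),
            "driver_allocated_bytes": total - free,
        }

    def check_for_leak(run_test, device: int = 0) -> None:
        torch.cuda.synchronize(device)
        before = snapshot(device)
        run_test()                      # the test body under suspicion
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()        # drop cached blocks so genuine leaks stand out
        after = snapshot(device)
        if after["caching_allocator_bytes"] > before["caching_allocator_bytes"]:
            raise RuntimeError(f"possible CUDA memory leak: {before} -> {after}")

The repro line printed in the log (PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_ddp_parity_cuda) enables the real check when re-running the single test locally.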
2025-12-04T09:28:45.0249830Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-a3dc994784795bc1.xml 2025-12-04T09:28:45.0250740Z ============================= test session starts ============================== 2025-12-04T09:28:45.0251313Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:45.0251852Z cachedir: .pytest_cache 2025-12-04T09:28:45.0252483Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:45.0253179Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:45.0253486Z configfile: pytest.ini 2025-12-04T09:28:45.0254133Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:45.0254924Z collecting ... collected 4 items / 3 deselected / 1 selected 2025-12-04T09:28:45.0255850Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_ddp_parity_cuda 2025-12-04T09:28:45.0257081Z Running 1 items in this shard 2025-12-04T09:28:45.0257305Z 2025-12-04T09:28:45.0258264Z distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_ddp_parity_cuda I1204 09:26:16.104000 38097 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 38149 2025-12-04T09:28:45.0259845Z I1204 09:26:16.104000 38097 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 38150 2025-12-04T09:28:45.0260977Z I1204 09:26:16.105000 38097 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 38151 2025-12-04T09:28:45.0262096Z I1204 09:26:16.106000 38097 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 38152 2025-12-04T09:28:45.0263979Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:28:45.0265483Z self.encoder = TransformerEncoder( 2025-12-04T09:28:45.0266980Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:28:45.0268467Z self.encoder = TransformerEncoder( 2025-12-04T09:28:45.0270032Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:28:45.0271441Z self.encoder = TransformerEncoder( 2025-12-04T09:28:45.0272871Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:28:45.0274281Z self.encoder = TransformerEncoder( 2025-12-04T09:28:45.0276186Z 
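The UserWarning repeated just above about enable_nested_tensor comes from building the TransformerEncoder on top of an encoder layer whose self-attention is not batch_first. A minimal sketch of the construction the warning message asks for, with made-up dimensions (d_model, nhead, num_layers are illustrative only):

    import torch.nn as nn

    # batch_first=True on the layer is what the warning above suggests.
    encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
    encoder = nn.TransformerEncoder(encoder_layer, num_layers=6, enable_nested_tensor=True)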
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.0277985Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.0279781Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.0281568Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.0283352Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.0285134Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.0286970Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.0288757Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.0289912Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:28:45.0291024Z return func(*args, **kwargs) 2025-12-04T09:28:45.0292078Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0293171Z return fsdp_fn(module, **kwargs) 2025-12-04T09:28:45.0294237Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0295319Z return fsdp_fn(module, **kwargs) 2025-12-04T09:28:45.0296437Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0297814Z return fsdp_fn(module, **kwargs) 2025-12-04T09:28:45.0299014Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:28:45.0300272Z return fsdp_fn(module, **kwargs) 2025-12-04T09:28:45.0301497Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0302762Z fsdp_model = FSDP( 2025-12-04T09:28:45.0303938Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0305166Z fsdp_model = FSDP( 2025-12-04T09:28:45.0306324Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0307559Z fsdp_model = FSDP( 2025-12-04T09:28:45.0308731Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0309946Z fsdp_model = FSDP( 2025-12-04T09:28:45.0314144Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:45.0318575Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:45.0323611Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T09:28:45.0328607Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:45.0333728Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:45.0338676Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:45.0343712Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:28:45.0348815Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:28:45.0350260Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:123: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0351364Z fsdp_model.transformer.encoder = FSDP( 2025-12-04T09:28:45.0352523Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:123: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0353605Z fsdp_model.transformer.encoder = FSDP( 2025-12-04T09:28:45.0354684Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:123: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0355781Z fsdp_model.transformer.encoder = FSDP( 2025-12-04T09:28:45.0356868Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:123: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. 
If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0357962Z fsdp_model.transformer.encoder = FSDP( 2025-12-04T09:28:45.0358615Z [rank0]:E1204 09:26:35.243000 38149 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.0359640Z [rank0]:E1204 09:26:35.243000 38149 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.0361139Z [rank0]:E1204 09:26:35.243000 38149 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0362598Z [rank0]:E1204 09:26:35.243000 38149 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.0364067Z [rank0]:E1204 09:26:35.243000 38149 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0365460Z [rank0]:E1204 09:26:35.243000 38149 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.0366809Z [rank0]:E1204 09:26:35.243000 38149 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0368254Z [rank0]:E1204 09:26:35.243000 38149 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0369671Z [rank0]:E1204 09:26:35.243000 38149 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0371092Z [rank0]:E1204 09:26:35.243000 38149 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0372512Z [rank0]:E1204 09:26:35.243000 38149 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0373906Z [rank0]:E1204 09:26:35.243000 38149 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.0375288Z [rank0]:E1204 09:26:35.243000 38149 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0376971Z [rank0]:E1204 09:26:35.243000 38149 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.0379208Z [rank0]:E1204 09:26:35.243000 38149 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_ddp_parity_cuda! Caching allocator allocated memory was 512 and is now reported as 1929728 on device 0. CUDA driver allocated memory was 714014720 and is now 804192256. 
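Earlier in this run, FSDP warned that the `device_id` argument was a bare `cuda` device with no explicit index. A minimal sketch of the two remedies the warning itself names, either calling torch.cuda.set_device() before wrapping or passing a device with an explicit index; `model` and `rank` are placeholders for the test's actual setup, and an initialized default process group is assumed:

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_with_explicit_device(model: nn.Module, rank: int) -> FSDP:
        # Remedy 1 from the warning: make the current device explicit up front.
        torch.cuda.set_device(rank)
        # Remedy 2 from the warning: pass a device with an explicit index
        # instead of the bare "cuda" string.
        return FSDP(model, device_id=torch.device("cuda", rank))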
2025-12-04T09:28:45.0381305Z [rank0]:E1204 09:26:35.243000 38149 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0382485Z [rank0]:E1204 09:26:35.243000 38149 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0384296Z [rank0]:E1204 09:26:35.243000 38149 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_ddp_parity_cuda 2025-12-04T09:28:45.0385828Z [rank0]:E1204 09:26:35.243000 38149 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0387056Z [rank0]:E1204 09:26:35.243000 38149 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0388465Z [rank0]:E1204 09:26:35.243000 38149 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:45.0389740Z [rank1]:E1204 09:26:35.245000 38150 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.0390738Z [rank1]:E1204 09:26:35.245000 38150 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.0392229Z [rank1]:E1204 09:26:35.245000 38150 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0393697Z [rank1]:E1204 09:26:35.245000 38150 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.0395188Z [rank1]:E1204 09:26:35.245000 38150 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0396555Z [rank1]:E1204 09:26:35.245000 38150 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.0397911Z [rank1]:E1204 09:26:35.245000 38150 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0399339Z [rank1]:E1204 09:26:35.245000 38150 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0400957Z [rank1]:E1204 09:26:35.245000 38150 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0402461Z [rank1]:E1204 09:26:35.245000 38150 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0403951Z [rank1]:E1204 09:26:35.245000 38150 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0405423Z [rank1]:E1204 09:26:35.245000 38150 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.0406896Z [rank1]:E1204 09:26:35.245000 38150 site-packages/torch/testing/_internal/common_distributed.py:935] 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0408418Z [rank1]:E1204 09:26:35.245000 38150 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.0410528Z [rank1]:E1204 09:26:35.245000 38150 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_ddp_parity_cuda! Caching allocator allocated memory was 512 and is now reported as 1963520 on device 1. CUDA driver allocated memory was 602865664 and is now 695140352. 2025-12-04T09:28:45.0412446Z [rank1]:E1204 09:26:35.245000 38150 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0413540Z [rank1]:E1204 09:26:35.245000 38150 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0415245Z [rank1]:E1204 09:26:35.245000 38150 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_ddp_parity_cuda 2025-12-04T09:28:45.0416914Z [rank1]:E1204 09:26:35.245000 38150 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0418151Z [rank1]:E1204 09:26:35.245000 38150 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0419557Z [rank1]:E1204 09:26:35.245000 38150 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:45.0420715Z [rank3]:E1204 09:26:35.245000 38152 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.0422027Z [rank3]:E1204 09:26:35.245000 38152 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.0423713Z [rank3]:E1204 09:26:35.245000 38152 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0425438Z [rank3]:E1204 09:26:35.245000 38152 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.0427077Z [rank3]:E1204 09:26:35.245000 38152 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0428661Z [rank3]:E1204 09:26:35.245000 38152 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.0430177Z [rank3]:E1204 09:26:35.245000 38152 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0431775Z [rank3]:E1204 09:26:35.245000 38152 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0433506Z [rank3]:E1204 09:26:35.245000 38152 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T09:28:45.0434926Z [rank3]:E1204 09:26:35.245000 38152 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0436346Z [rank3]:E1204 09:26:35.245000 38152 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0437722Z [rank3]:E1204 09:26:35.245000 38152 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.0439113Z [rank3]:E1204 09:26:35.245000 38152 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0440529Z [rank3]:E1204 09:26:35.245000 38152 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.0442510Z [rank3]:E1204 09:26:35.245000 38152 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_ddp_parity_cuda! Caching allocator allocated memory was 512 and is now reported as 1963520 on device 3. CUDA driver allocated memory was 487522304 and is now 695140352. 2025-12-04T09:28:45.0444322Z [rank3]:E1204 09:26:35.245000 38152 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0445367Z [rank3]:E1204 09:26:35.245000 38152 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0446986Z [rank3]:E1204 09:26:35.245000 38152 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_ddp_parity_cuda 2025-12-04T09:28:45.0448331Z [rank3]:E1204 09:26:35.245000 38152 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0449616Z [rank3]:E1204 09:26:35.245000 38152 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0450948Z [rank3]:E1204 09:26:35.245000 38152 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:45.0452026Z [rank2]:E1204 09:26:35.245000 38151 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.0453095Z [rank2]:E1204 09:26:35.245000 38151 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.0454660Z [rank2]:E1204 09:26:35.245000 38151 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0456304Z [rank2]:E1204 09:26:35.245000 38151 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.0458130Z [rank2]:E1204 09:26:35.245000 38151 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0459667Z [rank2]:E1204 09:26:35.245000 38151 site-packages/torch/testing/_internal/common_distributed.py:935] 
fn() 2025-12-04T09:28:45.0461170Z [rank2]:E1204 09:26:35.245000 38151 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0462773Z [rank2]:E1204 09:26:35.245000 38151 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0464374Z [rank2]:E1204 09:26:35.245000 38151 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0465975Z [rank2]:E1204 09:26:35.245000 38151 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0467572Z [rank2]:E1204 09:26:35.245000 38151 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0469222Z [rank2]:E1204 09:26:35.245000 38151 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.0470744Z [rank2]:E1204 09:26:35.245000 38151 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0472471Z [rank2]:E1204 09:26:35.245000 38151 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.0474523Z [rank2]:E1204 09:26:35.245000 38151 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_ddp_parity_cuda! Caching allocator allocated memory was 512 and is now reported as 1963520 on device 2. CUDA driver allocated memory was 602865664 and is now 695140352. 2025-12-04T09:28:45.0476326Z [rank2]:E1204 09:26:35.245000 38151 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0477366Z [rank2]:E1204 09:26:35.245000 38151 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0478995Z [rank2]:E1204 09:26:35.245000 38151 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_ddp_parity_cuda 2025-12-04T09:28:45.0480517Z [rank2]:E1204 09:26:35.245000 38151 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0481672Z [rank2]:E1204 09:26:35.245000 38151 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0482992Z [rank2]:E1204 09:26:35.245000 38151 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:45.0483728Z dist init r=0, world=4 2025-12-04T09:28:45.0483998Z dist init r=1, world=4 2025-12-04T09:28:45.0484264Z dist init r=3, world=4 2025-12-04T09:28:45.0484514Z dist init r=2, world=4 2025-12-04T09:28:45.0485809Z [rank0]:[W1204 09:26:35.302210914 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:28:45.0487121Z FAILED [21.2721s] [100%] 2025-12-04T09:28:45.0487322Z 2025-12-04T09:28:45.0487478Z =================================== FAILURES =================================== 2025-12-04T09:28:45.0503459Z __________________ TestClipGradNormCUDA.test_ddp_parity_cuda ___________________ 2025-12-04T09:28:45.0504159Z Traceback (most recent call last): 2025-12-04T09:28:45.0505000Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:45.0505814Z self._join_processes(fn) 2025-12-04T09:28:45.0506617Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:45.0507499Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:45.0508388Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:45.0509450Z raise RuntimeError(error) 2025-12-04T09:28:45.0509868Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:28:45.0510339Z Traceback (most recent call last): 2025-12-04T09:28:45.0511077Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0511830Z getattr(self, test_name)() 2025-12-04T09:28:45.0512531Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0513367Z fn() 2025-12-04T09:28:45.0513950Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0514617Z method(*args, **kwargs) 2025-12-04T09:28:45.0515259Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0515936Z method(*args, **kwargs) 2025-12-04T09:28:45.0516682Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0517351Z with policy(): 2025-12-04T09:28:45.0517963Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0518651Z raise RuntimeError(msg) 2025-12-04T09:28:45.0519785Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_ddp_parity_cuda! Caching allocator allocated memory was 512 and is now reported as 1963520 on device 3. CUDA driver allocated memory was 487522304 and is now 695140352. 2025-12-04T09:28:45.0521053Z 2025-12-04T09:28:45.0521425Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0522358Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_ddp_parity_cuda 2025-12-04T09:28:45.0523073Z 2025-12-04T09:28:45.0523346Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0523758Z 2025-12-04T09:28:45.0523763Z 2025-12-04T09:28:45.0523993Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:45.0524626Z Process 3 terminated with exit code 10, terminating remaining processes. 
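The ProcessGroupNCCL warning above notes that destroy_process_group() was not called before the worker processes exited. A minimal sketch of the teardown that warning (and the linked shutdown docs) refer to; whether a final barrier is wanted depends on the test, so it is marked optional:

    import torch.distributed as dist

    def teardown() -> None:
        if dist.is_initialized():
            dist.barrier()                # optional: let all ranks finish outstanding work
            dist.destroy_process_group()  # releases communicator resources before exit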
2025-12-04T09:28:45.0525926Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-a3dc994784795bc1.xml - 2025-12-04T09:28:45.0527116Z =========================== short test summary info ============================ 2025-12-04T09:28:45.0528267Z FAILED [21.2721s] distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_ddp_parity_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:28:45.0529266Z Traceback (most recent call last): 2025-12-04T09:28:45.0530067Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0530905Z getattr(self, test_name)() 2025-12-04T09:28:45.0531672Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0532453Z fn() 2025-12-04T09:28:45.0533090Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0534035Z method(*args, **kwargs) 2025-12-04T09:28:45.0534672Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0535343Z method(*args, **kwargs) 2025-12-04T09:28:45.0535965Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0536886Z with policy(): 2025-12-04T09:28:45.0537580Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0538342Z raise RuntimeError(msg) 2025-12-04T09:28:45.0539630Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_ddp_parity_cuda! Caching allocator allocated memory was 512 and is now reported as 1963520 on device 3. CUDA driver allocated memory was 487522304 and is now 695140352. 2025-12-04T09:28:45.0540845Z 2025-12-04T09:28:45.0541064Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0541990Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_ddp_parity_cuda 2025-12-04T09:28:45.0542692Z 2025-12-04T09:28:45.0542967Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0543543Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
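The FutureWarnings repeated throughout this log say the `NO_SHARD` sharding strategy is deprecated and recommend DistributedDataParallel instead. A minimal sketch of that substitution, assuming the default process group is already initialized as in these tests; `model` and `rank` are placeholders:

    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP

    def wrap_without_sharding(model: nn.Module, rank: int) -> DDP:
        # DDP keeps a full replica of the model per rank, which is what NO_SHARD did.
        return DDP(model.to(rank), device_ids=[rank])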
2025-12-04T09:28:45.0544133Z ======================= 1 failed, 3 deselected in 21.29s ======================= 2025-12-04T09:28:45.0544563Z Got exit code 1 2025-12-04T09:28:45.0545200Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_ddp_parity_cuda 2025-12-04T09:28:45.0546230Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:28:45.0547496Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-b1d6139c1033a518.xml 2025-12-04T09:28:45.0548522Z ============================= test session starts ============================== 2025-12-04T09:28:45.0549347Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:45.0549882Z cachedir: .pytest_cache 2025-12-04T09:28:45.0550514Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:45.0551218Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:45.0551525Z configfile: pytest.ini 2025-12-04T09:28:45.0552175Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:45.0552965Z collecting ... collected 4 items / 1 deselected / 3 selected 2025-12-04T09:28:45.0553387Z stepcurrent: skipping 1 already run items. 2025-12-04T09:28:45.0553735Z Running 3 items in this shard 2025-12-04T09:28:45.0553921Z 2025-12-04T09:28:45.0554820Z distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_low_precision_grads_cuda I1204 09:26:42.094000 38526 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 38578 2025-12-04T09:28:45.0556297Z I1204 09:26:42.095000 38526 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 38579 2025-12-04T09:28:45.0557304Z I1204 09:26:42.095000 38526 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 38580 2025-12-04T09:28:45.0558351Z I1204 09:26:42.096000 38526 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 38581 2025-12-04T09:28:45.0560455Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.0562262Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.0564052Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:28:45.0565831Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.0567625Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.0569411Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.0571238Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.0573025Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.0574185Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:28:45.0575285Z return func(*args, **kwargs) 2025-12-04T09:28:45.0576459Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0577879Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:28:45.0579151Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0580401Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:28:45.0581663Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0582930Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:28:45.0584180Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0585464Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:28:45.0586694Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:275: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0587939Z fsdp_model = FSDP( 2025-12-04T09:28:45.0589272Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:275: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0590386Z fsdp_model = FSDP( 2025-12-04T09:28:45.0591453Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:275: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:28:45.0592662Z fsdp_model = FSDP( 2025-12-04T09:28:45.0593671Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:275: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0594721Z fsdp_model = FSDP( 2025-12-04T09:28:45.0595298Z [rank0]:E1204 09:26:49.443000 38578 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.0596317Z [rank0]:E1204 09:26:49.443000 38578 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.0597810Z [rank0]:E1204 09:26:49.443000 38578 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0599273Z [rank0]:E1204 09:26:49.443000 38578 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.0600737Z [rank0]:E1204 09:26:49.443000 38578 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0602154Z [rank0]:E1204 09:26:49.443000 38578 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.0603507Z [rank0]:E1204 09:26:49.443000 38578 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0604931Z [rank0]:E1204 09:26:49.443000 38578 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0606340Z [rank0]:E1204 09:26:49.443000 38578 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0607765Z [rank0]:E1204 09:26:49.443000 38578 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0609189Z [rank0]:E1204 09:26:49.443000 38578 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0610571Z [rank0]:E1204 09:26:49.443000 38578 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.0611957Z [rank0]:E1204 09:26:49.443000 38578 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0613368Z [rank0]:E1204 09:26:49.443000 38578 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.0615339Z [rank0]:E1204 09:26:49.443000 38578 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_low_precision_grads_cuda! Caching allocator allocated memory was 512 and is now reported as 92672 on device 0. CUDA driver allocated memory was 714014720 and is now 762249216. 
2025-12-04T09:28:45.0617566Z [rank0]:E1204 09:26:49.443000 38578 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0618751Z [rank0]:E1204 09:26:49.443000 38578 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0620628Z [rank0]:E1204 09:26:49.443000 38578 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_low_precision_grads_cuda 2025-12-04T09:28:45.0622366Z [rank0]:E1204 09:26:49.443000 38578 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0623606Z [rank0]:E1204 09:26:49.443000 38578 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0625022Z [rank0]:E1204 09:26:49.443000 38578 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:45.0626175Z [rank2]:E1204 09:26:49.444000 38580 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.0627297Z [rank2]:E1204 09:26:49.444000 38580 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.0628984Z [rank2]:E1204 09:26:49.444000 38580 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0630641Z [rank2]:E1204 09:26:49.444000 38580 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.0633031Z [rank2]:E1204 09:26:49.444000 38580 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0634411Z [rank2]:E1204 09:26:49.444000 38580 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.0635747Z [rank2]:E1204 09:26:49.444000 38580 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0637172Z [rank2]:E1204 09:26:49.444000 38580 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0638601Z [rank2]:E1204 09:26:49.444000 38580 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0640026Z [rank2]:E1204 09:26:49.444000 38580 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0641433Z [rank2]:E1204 09:26:49.444000 38580 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0642814Z [rank2]:E1204 09:26:49.444000 38580 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.0644199Z [rank2]:E1204 09:26:49.444000 38580 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0645663Z [rank2]:E1204 09:26:49.444000 38580 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.0647620Z [rank2]:E1204 09:26:49.444000 38580 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_low_precision_grads_cuda! Caching allocator allocated memory was 512 and is now reported as 92672 on device 2. CUDA driver allocated memory was 607059968 and is now 653197312. 2025-12-04T09:28:45.0649476Z [rank2]:E1204 09:26:49.444000 38580 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0650520Z [rank2]:E1204 09:26:49.444000 38580 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0652183Z [rank2]:E1204 09:26:49.444000 38580 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_low_precision_grads_cuda 2025-12-04T09:28:45.0653582Z [rank2]:E1204 09:26:49.444000 38580 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0654685Z [rank2]:E1204 09:26:49.444000 38580 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0655928Z [rank2]:E1204 09:26:49.444000 38580 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:45.0657230Z [rank1]:E1204 09:26:49.444000 38579 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.0658359Z [rank1]:E1204 09:26:49.444000 38579 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.0660047Z [rank1]:E1204 09:26:49.444000 38579 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0661691Z [rank1]:E1204 09:26:49.444000 38579 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.0663398Z [rank1]:E1204 09:26:49.444000 38579 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0664945Z [rank1]:E1204 09:26:49.444000 38579 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.0666460Z [rank1]:E1204 09:26:49.444000 38579 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0668052Z [rank1]:E1204 09:26:49.444000 38579 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0669769Z [rank1]:E1204 09:26:49.444000 38579 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0671191Z [rank1]:E1204 09:26:49.444000 38579 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0672603Z [rank1]:E1204 09:26:49.444000 38579 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0673978Z [rank1]:E1204 09:26:49.444000 38579 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.0675359Z [rank1]:E1204 09:26:49.444000 38579 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0676806Z [rank1]:E1204 09:26:49.444000 38579 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.0678761Z [rank1]:E1204 09:26:49.444000 38579 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_low_precision_grads_cuda! Caching allocator allocated memory was 512 and is now reported as 92672 on device 1. CUDA driver allocated memory was 604962816 and is now 653197312. 2025-12-04T09:28:45.0680625Z [rank1]:E1204 09:26:49.444000 38579 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0681679Z [rank1]:E1204 09:26:49.444000 38579 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0683337Z [rank1]:E1204 09:26:49.444000 38579 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_low_precision_grads_cuda 2025-12-04T09:28:45.0684730Z [rank1]:E1204 09:26:49.444000 38579 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0685827Z [rank1]:E1204 09:26:49.444000 38579 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0687083Z [rank1]:E1204 09:26:49.444000 38579 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:45.0688102Z [rank3]:E1204 09:26:49.445000 38581 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.0689093Z [rank3]:E1204 09:26:49.445000 38581 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.0690639Z [rank3]:E1204 09:26:49.445000 38581 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0692110Z [rank3]:E1204 09:26:49.445000 38581 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.0693576Z [rank3]:E1204 09:26:49.445000 38581 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in 
wrapper 2025-12-04T09:28:45.0694939Z [rank3]:E1204 09:26:49.445000 38581 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.0696328Z [rank3]:E1204 09:26:49.445000 38581 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0698059Z [rank3]:E1204 09:26:49.445000 38581 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0699669Z [rank3]:E1204 09:26:49.445000 38581 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0701282Z [rank3]:E1204 09:26:49.445000 38581 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0702883Z [rank3]:E1204 09:26:49.445000 38581 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0704426Z [rank3]:E1204 09:26:49.445000 38581 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.0706032Z [rank3]:E1204 09:26:49.445000 38581 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0707664Z [rank3]:E1204 09:26:49.445000 38581 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.0709875Z [rank3]:E1204 09:26:49.445000 38581 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_low_precision_grads_cuda! Caching allocator allocated memory was 512 and is now reported as 92672 on device 3. CUDA driver allocated memory was 495910912 and is now 653197312. 
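The RuntimeError above is raised by the CUDA memory leak check that the repro command enables via PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1: it compares per-device memory counters taken before and after the test and reports both a caching-allocator figure and a driver-level figure. The sketch below is a minimal, assumed approximation of that before/after comparison using public torch.cuda counters; the helper names (snapshot, check_for_leak) are illustrative and this is not the actual checker in common_utils.py.

import torch

def snapshot(device: int) -> tuple[int, int]:
    # Return (caching-allocator bytes, driver-level used bytes) for one device.
    torch.cuda.synchronize(device)
    alloc = torch.cuda.memory_allocated(device)    # caching allocator counter
    free, total = torch.cuda.mem_get_info(device)  # driver-level free/total
    return alloc, total - free

def check_for_leak(fn, device: int = 0) -> None:
    # Hedged sketch only: the real check in torch.testing._internal differs.
    before_alloc, before_driver = snapshot(device)
    fn()
    torch.cuda.empty_cache()  # release cached blocks before re-measuring
    after_alloc, after_driver = snapshot(device)
    if after_alloc > before_alloc and after_driver > before_driver:
        raise RuntimeError(
            f"possible leak on device {device}: allocator {before_alloc} -> "
            f"{after_alloc} bytes, driver {before_driver} -> {after_driver} bytes"
        )

In the failure above both counters grew (allocator 512 -> 92672 bytes, driver roughly 496 MB -> 653 MB on device 3), which is the condition the check treats as a confirmed leak.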
2025-12-04T09:28:45.0711708Z [rank3]:E1204 09:26:49.445000 38581 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0712735Z [rank3]:E1204 09:26:49.445000 38581 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0714400Z [rank3]:E1204 09:26:49.445000 38581 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_low_precision_grads_cuda 2025-12-04T09:28:45.0715795Z [rank3]:E1204 09:26:49.445000 38581 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0716890Z [rank3]:E1204 09:26:49.445000 38581 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0718138Z [rank3]:E1204 09:26:49.445000 38581 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:45.0718834Z dist init r=3, world=4 2025-12-04T09:28:45.0719091Z dist init r=0, world=4 2025-12-04T09:28:45.0719344Z dist init r=2, world=4 2025-12-04T09:28:45.0719582Z dist init r=1, world=4 2025-12-04T09:28:45.0720974Z [rank0]:[W1204 09:26:49.480385136 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:28:45.0722523Z FAILED [9.7106s] [ 33%] 2025-12-04T09:28:45.0722700Z 2025-12-04T09:28:45.0722865Z =================================== FAILURES =================================== 2025-12-04T09:28:45.0723428Z ______________ TestClipGradNormCUDA.test_low_precision_grads_cuda ______________ 2025-12-04T09:28:45.0723971Z Traceback (most recent call last): 2025-12-04T09:28:45.0724765Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:45.0725558Z self._join_processes(fn) 2025-12-04T09:28:45.0726358Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:45.0727231Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:45.0728127Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:45.0728984Z raise RuntimeError(error) 2025-12-04T09:28:45.0729436Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:28:45.0729934Z Traceback (most recent call last): 2025-12-04T09:28:45.0730718Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0731505Z getattr(self, test_name)() 2025-12-04T09:28:45.0732256Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0733092Z fn() 2025-12-04T09:28:45.0733814Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0734494Z method(*args, **kwargs) 2025-12-04T09:28:45.0735136Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T09:28:45.0735855Z method(*args, **kwargs) 2025-12-04T09:28:45.0736541Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0737443Z with policy(): 2025-12-04T09:28:45.0738125Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0738881Z raise RuntimeError(msg) 2025-12-04T09:28:45.0740215Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_low_precision_grads_cuda! Caching allocator allocated memory was 512 and is now reported as 92672 on device 3. CUDA driver allocated memory was 495910912 and is now 653197312. 2025-12-04T09:28:45.0741481Z 2025-12-04T09:28:45.0741701Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0742670Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_low_precision_grads_cuda 2025-12-04T09:28:45.0743418Z 2025-12-04T09:28:45.0743697Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0744101Z 2025-12-04T09:28:45.0744106Z 2025-12-04T09:28:45.0744329Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:45.0744963Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:28:45.0746268Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-b1d6139c1033a518.xml - 2025-12-04T09:28:45.0747474Z =========================== short test summary info ============================ 2025-12-04T09:28:45.0748748Z FAILED [9.7106s] distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_low_precision_grads_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:28:45.0749818Z Traceback (most recent call last): 2025-12-04T09:28:45.0750526Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0751249Z getattr(self, test_name)() 2025-12-04T09:28:45.0751909Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0752605Z fn() 2025-12-04T09:28:45.0753187Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0753864Z method(*args, **kwargs) 2025-12-04T09:28:45.0754501Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0755181Z method(*args, **kwargs) 2025-12-04T09:28:45.0755819Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0756481Z with policy(): 2025-12-04T09:28:45.0757090Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0757774Z raise RuntimeError(msg) 2025-12-04T09:28:45.0758934Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_low_precision_grads_cuda! Caching allocator allocated memory was 512 and is now reported as 92672 on device 3. 
CUDA driver allocated memory was 495910912 and is now 653197312. 2025-12-04T09:28:45.0760046Z 2025-12-04T09:28:45.0760267Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0761124Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_low_precision_grads_cuda 2025-12-04T09:28:45.0761791Z 2025-12-04T09:28:45.0762045Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0762599Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:45.0763035Z ======================= 1 failed, 1 deselected in 9.73s ======================== 2025-12-04T09:28:45.0763412Z Got exit code 1 2025-12-04T09:28:45.0763654Z Retrying single test... 2025-12-04T09:28:45.0764446Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-ebdc3db326996caa.xml 2025-12-04T09:28:45.0765356Z ============================= test session starts ============================== 2025-12-04T09:28:45.0765946Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:45.0766480Z cachedir: .pytest_cache 2025-12-04T09:28:45.0767094Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:45.0767797Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:45.0768115Z configfile: pytest.ini 2025-12-04T09:28:45.0768751Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:45.0769544Z collecting ... collected 4 items / 3 deselected / 1 selected 2025-12-04T09:28:45.0770467Z stepcurrent: skipping 1 already run items. Running only test/distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_low_precision_grads_cuda 2025-12-04T09:28:45.0771287Z Running 1 items in this shard 2025-12-04T09:28:45.0771482Z 2025-12-04T09:28:45.0772359Z distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_low_precision_grads_cuda I1204 09:26:56.524000 38863 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 38915 2025-12-04T09:28:45.0773842Z I1204 09:26:56.524000 38863 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 38916 2025-12-04T09:28:45.0774858Z I1204 09:26:56.525000 38863 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 38917 2025-12-04T09:28:45.0775869Z I1204 09:26:56.526000 38863 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 38918 2025-12-04T09:28:45.0778383Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:28:45.0780421Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.0782444Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.0784455Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.0786460Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.0788501Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.0790521Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.0792331Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.0793485Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:28:45.0794598Z return func(*args, **kwargs) 2025-12-04T09:28:45.0795670Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0796782Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:28:45.0797909Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0799029Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:28:45.0800141Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0801249Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:28:45.0802362Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0803484Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:28:45.0804618Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:275: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:28:45.0805682Z fsdp_model = FSDP( 2025-12-04T09:28:45.0806681Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:275: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0807727Z fsdp_model = FSDP( 2025-12-04T09:28:45.0808735Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:275: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0809776Z fsdp_model = FSDP( 2025-12-04T09:28:45.0810972Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:275: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.0812084Z fsdp_model = FSDP( 2025-12-04T09:28:45.0812665Z [rank1]:E1204 09:27:03.798000 38916 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.0813718Z [rank1]:E1204 09:27:03.798000 38916 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.0815284Z [rank1]:E1204 09:27:03.798000 38916 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0817081Z [rank1]:E1204 09:27:03.798000 38916 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.0818756Z [rank1]:E1204 09:27:03.798000 38916 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0820282Z [rank1]:E1204 09:27:03.798000 38916 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.0821994Z [rank1]:E1204 09:27:03.798000 38916 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0823593Z [rank1]:E1204 09:27:03.798000 38916 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0825170Z [rank1]:E1204 09:27:03.798000 38916 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0826752Z [rank1]:E1204 09:27:03.798000 38916 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0828328Z [rank1]:E1204 09:27:03.798000 38916 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0829865Z [rank1]:E1204 09:27:03.798000 38916 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.0831414Z [rank1]:E1204 09:27:03.798000 38916 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0833106Z [rank1]:E1204 09:27:03.798000 
38916 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.0835337Z [rank1]:E1204 09:27:03.798000 38916 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_low_precision_grads_cuda! Caching allocator allocated memory was 512 and is now reported as 92672 on device 1. CUDA driver allocated memory was 607059968 and is now 653197312. 2025-12-04T09:28:45.0837321Z [rank1]:E1204 09:27:03.798000 38916 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0838443Z [rank1]:E1204 09:27:03.798000 38916 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0840239Z [rank1]:E1204 09:27:03.798000 38916 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_low_precision_grads_cuda 2025-12-04T09:28:45.0841750Z [rank1]:E1204 09:27:03.798000 38916 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0842930Z [rank1]:E1204 09:27:03.798000 38916 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0844273Z [rank1]:E1204 09:27:03.798000 38916 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:45.0845367Z [rank2]:E1204 09:27:03.799000 38917 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.0846450Z [rank2]:E1204 09:27:03.799000 38917 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.0848066Z [rank2]:E1204 09:27:03.799000 38917 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0849684Z [rank2]:E1204 09:27:03.799000 38917 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.0851347Z [rank2]:E1204 09:27:03.799000 38917 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0852806Z [rank2]:E1204 09:27:03.799000 38917 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.0854211Z [rank2]:E1204 09:27:03.799000 38917 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0855693Z [rank2]:E1204 09:27:03.799000 38917 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0857439Z [rank2]:E1204 09:27:03.799000 38917 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0859021Z [rank2]:E1204 09:27:03.799000 38917 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0860607Z 
[rank2]:E1204 09:27:03.799000 38917 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0862145Z [rank2]:E1204 09:27:03.799000 38917 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.0863689Z [rank2]:E1204 09:27:03.799000 38917 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0865274Z [rank2]:E1204 09:27:03.799000 38917 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.0867505Z [rank2]:E1204 09:27:03.799000 38917 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_low_precision_grads_cuda! Caching allocator allocated memory was 512 and is now reported as 92672 on device 2. CUDA driver allocated memory was 609157120 and is now 653197312. 2025-12-04T09:28:45.0869620Z [rank2]:E1204 09:27:03.799000 38917 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0870707Z [rank2]:E1204 09:27:03.799000 38917 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0872446Z [rank2]:E1204 09:27:03.799000 38917 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_low_precision_grads_cuda 2025-12-04T09:28:45.0873911Z [rank2]:E1204 09:27:03.799000 38917 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0875054Z [rank2]:E1204 09:27:03.799000 38917 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0876370Z [rank2]:E1204 09:27:03.799000 38917 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:45.0877425Z [rank0]:E1204 09:27:03.800000 38915 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.0878462Z [rank0]:E1204 09:27:03.800000 38915 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.0880055Z [rank0]:E1204 09:27:03.800000 38915 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0881598Z [rank0]:E1204 09:27:03.800000 38915 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.0883149Z [rank0]:E1204 09:27:03.800000 38915 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0884573Z [rank0]:E1204 09:27:03.800000 38915 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.0885961Z [rank0]:E1204 09:27:03.800000 38915 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0887450Z [rank0]:E1204 09:27:03.800000 38915 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0888939Z [rank0]:E1204 09:27:03.800000 38915 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0890423Z [rank0]:E1204 09:27:03.800000 38915 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0891923Z [rank0]:E1204 09:27:03.800000 38915 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0893378Z [rank0]:E1204 09:27:03.800000 38915 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.0894843Z [rank0]:E1204 09:27:03.800000 38915 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0896834Z [rank0]:E1204 09:27:03.800000 38915 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.0899155Z [rank0]:E1204 09:27:03.800000 38915 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_low_precision_grads_cuda! Caching allocator allocated memory was 512 and is now reported as 92672 on device 0. CUDA driver allocated memory was 714014720 and is now 762249216. 
2025-12-04T09:28:45.0901206Z [rank0]:E1204 09:27:03.800000 38915 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0902373Z [rank0]:E1204 09:27:03.800000 38915 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0904242Z [rank0]:E1204 09:27:03.800000 38915 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_low_precision_grads_cuda 2025-12-04T09:28:45.0905798Z [rank0]:E1204 09:27:03.800000 38915 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0907015Z [rank0]:E1204 09:27:03.800000 38915 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0908418Z [rank0]:E1204 09:27:03.800000 38915 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:45.0909604Z [rank3]:E1204 09:27:03.800000 38918 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.0910990Z [rank3]:E1204 09:27:03.800000 38918 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.0912756Z [rank3]:E1204 09:27:03.800000 38918 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0914363Z [rank3]:E1204 09:27:03.800000 38918 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.0915944Z [rank3]:E1204 09:27:03.800000 38918 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0917418Z [rank3]:E1204 09:27:03.800000 38918 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.0918875Z [rank3]:E1204 09:27:03.800000 38918 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0920421Z [rank3]:E1204 09:27:03.800000 38918 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0922329Z [rank3]:E1204 09:27:03.800000 38918 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0923919Z [rank3]:E1204 09:27:03.800000 38918 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.0925510Z [rank3]:E1204 09:27:03.800000 38918 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0927066Z [rank3]:E1204 09:27:03.800000 38918 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.0928708Z [rank3]:E1204 09:27:03.800000 38918 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0930299Z [rank3]:E1204 09:27:03.800000 38918 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.0932487Z [rank3]:E1204 09:27:03.800000 38918 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_low_precision_grads_cuda! Caching allocator allocated memory was 512 and is now reported as 92672 on device 3. CUDA driver allocated memory was 607059968 and is now 653197312. 2025-12-04T09:28:45.0934740Z [rank3]:E1204 09:27:03.800000 38918 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0935826Z [rank3]:E1204 09:27:03.800000 38918 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0937861Z [rank3]:E1204 09:27:03.800000 38918 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_low_precision_grads_cuda 2025-12-04T09:28:45.0939433Z [rank3]:E1204 09:27:03.800000 38918 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.0940656Z [rank3]:E1204 09:27:03.800000 38918 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0942057Z [rank3]:E1204 09:27:03.800000 38918 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:45.0942888Z dist init r=1, world=4 2025-12-04T09:28:45.0943158Z dist init r=0, world=4 2025-12-04T09:28:45.0943428Z dist init r=2, world=4 2025-12-04T09:28:45.0943696Z dist init r=3, world=4 2025-12-04T09:28:45.0945011Z [rank0]:[W1204 09:27:04.819543431 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:28:45.0946444Z FAILED [8.9971s] [100%] 2025-12-04T09:28:45.0946626Z 2025-12-04T09:28:45.0946772Z =================================== FAILURES =================================== 2025-12-04T09:28:45.0947335Z ______________ TestClipGradNormCUDA.test_low_precision_grads_cuda ______________ 2025-12-04T09:28:45.0947861Z Traceback (most recent call last): 2025-12-04T09:28:45.0948729Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:45.0949433Z self._join_processes(fn) 2025-12-04T09:28:45.0950125Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:45.0950890Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:45.0951660Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:45.0952417Z raise RuntimeError(error) 2025-12-04T09:28:45.0952802Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:28:45.0953234Z Traceback (most recent call last): 2025-12-04T09:28:45.0953922Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0954621Z getattr(self, test_name)() 2025-12-04T09:28:45.0955273Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0955955Z fn() 2025-12-04T09:28:45.0956521Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0957221Z method(*args, **kwargs) 2025-12-04T09:28:45.0957855Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0958522Z method(*args, **kwargs) 2025-12-04T09:28:45.0959147Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0959791Z with policy(): 2025-12-04T09:28:45.0960389Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0961065Z raise RuntimeError(msg) 2025-12-04T09:28:45.0962222Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_low_precision_grads_cuda! Caching allocator allocated memory was 512 and is now reported as 92672 on device 1. CUDA driver allocated memory was 607059968 and is now 653197312. 2025-12-04T09:28:45.0963333Z 2025-12-04T09:28:45.0963525Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0964374Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_low_precision_grads_cuda 2025-12-04T09:28:45.0965030Z 2025-12-04T09:28:45.0965270Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0965620Z 2025-12-04T09:28:45.0965625Z 2025-12-04T09:28:45.0965830Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:45.0966377Z Process 1 terminated with exit code 10, terminating remaining processes. 
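The ProcessGroupNCCL warning at the start of this attempt ("destroy_process_group() was not called before program exit, which can leak resources") names one plausible contributor to the driver-memory growth the leak check reports. Below is a minimal, hedged sketch of the explicit teardown the warning asks for; it assumes the usual torchrun-style environment variables (RANK, MASTER_ADDR, MASTER_PORT) are set, and the main() structure is illustrative rather than taken from the test.

import os
import torch
import torch.distributed as dist

def main() -> None:
    # Assumption: launched with torchrun, so RANK etc. are in the environment.
    rank = int(os.environ["RANK"])
    torch.cuda.set_device(rank % torch.cuda.device_count())
    dist.init_process_group(backend="nccl")
    try:
        dist.barrier()  # ... test or training body would go here ...
    finally:
        dist.destroy_process_group()  # explicit teardown the warning asks for

if __name__ == "__main__":
    main()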
2025-12-04T09:28:45.0967533Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-ebdc3db326996caa.xml - 2025-12-04T09:28:45.0968630Z =========================== short test summary info ============================ 2025-12-04T09:28:45.0969607Z FAILED [8.9971s] distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_low_precision_grads_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:28:45.0970540Z Traceback (most recent call last): 2025-12-04T09:28:45.0971228Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.0971917Z getattr(self, test_name)() 2025-12-04T09:28:45.0972570Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.0973245Z fn() 2025-12-04T09:28:45.0973819Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0974487Z method(*args, **kwargs) 2025-12-04T09:28:45.0975103Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.0975773Z method(*args, **kwargs) 2025-12-04T09:28:45.0976480Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.0977364Z with policy(): 2025-12-04T09:28:45.0978024Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.0978780Z raise RuntimeError(msg) 2025-12-04T09:28:45.0980081Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_low_precision_grads_cuda! Caching allocator allocated memory was 512 and is now reported as 92672 on device 1. CUDA driver allocated memory was 607059968 and is now 653197312. 2025-12-04T09:28:45.0981317Z 2025-12-04T09:28:45.0981533Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.0982473Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_low_precision_grads_cuda 2025-12-04T09:28:45.0983282Z 2025-12-04T09:28:45.0983550Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.0984123Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:45.0984608Z ======================= 1 failed, 3 deselected in 9.02s ======================== 2025-12-04T09:28:45.0985010Z Got exit code 1 2025-12-04T09:28:45.0985268Z Retrying single test... 
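Each retry also emits the UserWarning from _init_utils.py about `device_id` being the bare device "cuda" without an index. The sketch below shows the two remedies the warning itself names, calling torch.cuda.set_device() before constructing FSDP and passing an indexed device as device_id; the wrap_model helper and the rank-to-device mapping are assumptions, and a process group must already be initialized.

import torch
import torch.nn as nn
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_model(model: nn.Module) -> FSDP:
    # Assumes dist.init_process_group() has already run in this process.
    rank = dist.get_rank()
    device = torch.device("cuda", rank % torch.cuda.device_count())
    torch.cuda.set_device(device)         # remedy 1: set the current device explicitly
    return FSDP(model, device_id=device)  # remedy 2: pass an indexed device, not bare "cuda"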
2025-12-04T09:28:45.0986162Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-c42bc725a7562377.xml 2025-12-04T09:28:45.0987154Z ============================= test session starts ============================== 2025-12-04T09:28:45.0987812Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:45.0988401Z cachedir: .pytest_cache 2025-12-04T09:28:45.0989285Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:45.0989965Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:45.0990267Z configfile: pytest.ini 2025-12-04T09:28:45.0990898Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:45.0991666Z collecting ... collected 4 items / 3 deselected / 1 selected 2025-12-04T09:28:45.0992573Z stepcurrent: skipping 1 already run items. Running only test/distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_low_precision_grads_cuda 2025-12-04T09:28:45.0993393Z Running 1 items in this shard 2025-12-04T09:28:45.0993603Z 2025-12-04T09:28:45.0994495Z distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_low_precision_grads_cuda I1204 09:27:10.373000 39200 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 39252 2025-12-04T09:28:45.0995919Z I1204 09:27:10.374000 39200 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 39253 2025-12-04T09:28:45.0996951Z I1204 09:27:10.375000 39200 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 39254 2025-12-04T09:28:45.0997952Z I1204 09:27:10.376000 39200 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 39255 2025-12-04T09:28:45.1000037Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.1001832Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1003604Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.1005393Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1007166Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:28:45.1008943Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1010816Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.1012590Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1013723Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:28:45.1014821Z return func(*args, **kwargs) 2025-12-04T09:28:45.1015889Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.1017285Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:28:45.1018521Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.1019770Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:28:45.1021191Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.1022442Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:28:45.1023750Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.1025008Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:28:45.1026220Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:275: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.1027449Z fsdp_model = FSDP( 2025-12-04T09:28:45.1028556Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:275: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.1029739Z fsdp_model = FSDP( 2025-12-04T09:28:45.1030863Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:275: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:28:45.1032047Z fsdp_model = FSDP( 2025-12-04T09:28:45.1033229Z /var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_clip_grad_norm.py:275: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:28:45.1034278Z fsdp_model = FSDP( 2025-12-04T09:28:45.1034834Z [rank0]:E1204 09:27:17.706000 39252 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1035828Z [rank0]:E1204 09:27:17.706000 39252 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1037299Z [rank0]:E1204 09:27:17.706000 39252 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1038756Z [rank0]:E1204 09:27:17.706000 39252 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1040270Z [rank0]:E1204 09:27:17.706000 39252 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1041622Z [rank0]:E1204 09:27:17.706000 39252 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1042953Z [rank0]:E1204 09:27:17.706000 39252 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1044346Z [rank0]:E1204 09:27:17.706000 39252 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1045758Z [rank0]:E1204 09:27:17.706000 39252 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1047173Z [rank0]:E1204 09:27:17.706000 39252 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1048588Z [rank0]:E1204 09:27:17.706000 39252 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1049969Z [rank0]:E1204 09:27:17.706000 39252 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1051338Z [rank0]:E1204 09:27:17.706000 39252 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1052791Z [rank0]:E1204 09:27:17.706000 39252 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1054753Z [rank0]:E1204 09:27:17.706000 39252 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_low_precision_grads_cuda! Caching allocator allocated memory was 512 and is now reported as 92672 on device 0. CUDA driver allocated memory was 714014720 and is now 762249216. 
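The FutureWarning repeated above notes that the NO_SHARD sharding strategy is deprecated and suggests DistributedDataParallel instead. A hedged sketch of that suggested replacement follows; model and rank are illustrative placeholders and an already-initialized NCCL process group is assumed.

import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_without_sharding(model: nn.Module, rank: int) -> DDP:
    # Assumes an NCCL process group is already initialized for this rank.
    model = model.to(torch.device("cuda", rank))
    return DDP(model, device_ids=[rank])  # full replication with gradient all-reduce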
2025-12-04T09:28:45.1056832Z [rank0]:E1204 09:27:17.706000 39252 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1057995Z [rank0]:E1204 09:27:17.706000 39252 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1059842Z [rank0]:E1204 09:27:17.706000 39252 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_low_precision_grads_cuda 2025-12-04T09:28:45.1061406Z [rank0]:E1204 09:27:17.706000 39252 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1062630Z [rank0]:E1204 09:27:17.706000 39252 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1064038Z [rank0]:E1204 09:27:17.706000 39252 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:45.1065169Z [rank1]:E1204 09:27:17.706000 39253 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1066288Z [rank1]:E1204 09:27:17.706000 39253 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1067958Z [rank1]:E1204 09:27:17.706000 39253 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1069629Z [rank1]:E1204 09:27:17.706000 39253 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1071131Z [rank1]:E1204 09:27:17.706000 39253 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1072486Z [rank1]:E1204 09:27:17.706000 39253 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1073818Z [rank1]:E1204 09:27:17.706000 39253 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1075226Z [rank1]:E1204 09:27:17.706000 39253 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1076637Z [rank1]:E1204 09:27:17.706000 39253 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1078050Z [rank1]:E1204 09:27:17.706000 39253 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1079450Z [rank1]:E1204 09:27:17.706000 39253 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1080820Z [rank1]:E1204 09:27:17.706000 39253 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1082194Z [rank1]:E1204 09:27:17.706000 39253 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1083632Z [rank1]:E1204 09:27:17.706000 39253 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1085577Z [rank1]:E1204 09:27:17.706000 39253 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_low_precision_grads_cuda! Caching allocator allocated memory was 512 and is now reported as 92672 on device 1. CUDA driver allocated memory was 602865664 and is now 653197312. 2025-12-04T09:28:45.1087424Z [rank1]:E1204 09:27:17.706000 39253 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1088453Z [rank1]:E1204 09:27:17.706000 39253 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1090101Z [rank1]:E1204 09:27:17.706000 39253 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_low_precision_grads_cuda 2025-12-04T09:28:45.1091490Z [rank1]:E1204 09:27:17.706000 39253 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1092576Z [rank1]:E1204 09:27:17.706000 39253 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1093810Z [rank1]:E1204 09:27:17.706000 39253 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:45.1094814Z [rank2]:E1204 09:27:17.707000 39254 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1095808Z [rank2]:E1204 09:27:17.707000 39254 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1097670Z [rank2]:E1204 09:27:17.707000 39254 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1099310Z [rank2]:E1204 09:27:17.707000 39254 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1100949Z [rank2]:E1204 09:27:17.707000 39254 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1102466Z [rank2]:E1204 09:27:17.707000 39254 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1103966Z [rank2]:E1204 09:27:17.707000 39254 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1105553Z [rank2]:E1204 09:27:17.707000 39254 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1107138Z [rank2]:E1204 09:27:17.707000 39254 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1108722Z [rank2]:E1204 09:27:17.707000 39254 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1110245Z [rank2]:E1204 09:27:17.707000 39254 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1111790Z [rank2]:E1204 09:27:17.707000 39254 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1113281Z [rank2]:E1204 09:27:17.707000 39254 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1114799Z [rank2]:E1204 09:27:17.707000 39254 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1116864Z [rank2]:E1204 09:27:17.707000 39254 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_low_precision_grads_cuda! Caching allocator allocated memory was 512 and is now reported as 92672 on device 2. CUDA driver allocated memory was 607059968 and is now 653197312. 2025-12-04T09:28:45.1118797Z [rank2]:E1204 09:27:17.707000 39254 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1120061Z [rank2]:E1204 09:27:17.707000 39254 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1122191Z [rank2]:E1204 09:27:17.707000 39254 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_low_precision_grads_cuda 2025-12-04T09:28:45.1123744Z [rank2]:E1204 09:27:17.707000 39254 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1124967Z [rank2]:E1204 09:27:17.707000 39254 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1126367Z [rank2]:E1204 09:27:17.707000 39254 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:45.1127501Z [rank3]:E1204 09:27:17.708000 39255 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1128618Z [rank3]:E1204 09:27:17.708000 39255 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1130386Z [rank3]:E1204 09:27:17.708000 39255 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1132039Z [rank3]:E1204 09:27:17.708000 39255 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1133811Z [rank3]:E1204 09:27:17.708000 39255 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in 
wrapper 2025-12-04T09:28:45.1135377Z [rank3]:E1204 09:27:17.708000 39255 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1137033Z [rank3]:E1204 09:27:17.708000 39255 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1138642Z [rank3]:E1204 09:27:17.708000 39255 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1140235Z [rank3]:E1204 09:27:17.708000 39255 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1141820Z [rank3]:E1204 09:27:17.708000 39255 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1143397Z [rank3]:E1204 09:27:17.708000 39255 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1144979Z [rank3]:E1204 09:27:17.708000 39255 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1146530Z [rank3]:E1204 09:27:17.708000 39255 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1148163Z [rank3]:E1204 09:27:17.708000 39255 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1150434Z [rank3]:E1204 09:27:17.708000 39255 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_low_precision_grads_cuda! Caching allocator allocated memory was 512 and is now reported as 92672 on device 3. CUDA driver allocated memory was 516882432 and is now 653197312. 
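Each rank prints the same repro instruction: run the single failing test from the repo root with PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1, optionally silencing the repro banner with PYTORCH_PRINT_REPRO_ON_FAILURE=0. A small sketch that drives that command from Python, assuming the working directory is a PyTorch checkout:

```python
import os
import subprocess
import sys

# Reproduce the failing test with the CUDA mem-leak check enabled,
# exactly as the log suggests. Assumes cwd is the PyTorch repo root.
env = dict(os.environ)
env["PYTORCH_TEST_CUDA_MEM_LEAK_CHECK"] = "1"
# Uncomment to suppress the "To execute this test..." banner on failure:
# env["PYTORCH_PRINT_REPRO_ON_FAILURE"] = "0"

cmd = [
    sys.executable,
    "test/distributed/fsdp/test_fsdp_clip_grad_norm.py",
    "TestClipGradNormCUDA.test_low_precision_grads_cuda",
]
result = subprocess.run(cmd, env=env)
print("exit code:", result.returncode)
```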
2025-12-04T09:28:45.1152258Z [rank3]:E1204 09:27:17.708000 39255 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1153288Z [rank3]:E1204 09:27:17.708000 39255 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1154939Z [rank3]:E1204 09:27:17.708000 39255 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_low_precision_grads_cuda 2025-12-04T09:28:45.1156318Z [rank3]:E1204 09:27:17.708000 39255 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1157399Z [rank3]:E1204 09:27:17.708000 39255 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1158647Z [rank3]:E1204 09:27:17.708000 39255 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:45.1159330Z dist init r=0, world=4 2025-12-04T09:28:45.1159573Z dist init r=3, world=4 2025-12-04T09:28:45.1159816Z dist init r=1, world=4 2025-12-04T09:28:45.1160110Z dist init r=2, world=4 2025-12-04T09:28:45.1161287Z [rank0]:[W1204 09:27:18.722161313 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:28:45.1162507Z FAILED [9.1019s] [100%] 2025-12-04T09:28:45.1162658Z 2025-12-04T09:28:45.1162796Z =================================== FAILURES =================================== 2025-12-04T09:28:45.1163290Z ______________ TestClipGradNormCUDA.test_low_precision_grads_cuda ______________ 2025-12-04T09:28:45.1163763Z Traceback (most recent call last): 2025-12-04T09:28:45.1164460Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:45.1165155Z self._join_processes(fn) 2025-12-04T09:28:45.1165866Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:45.1166635Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:45.1167415Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:45.1168161Z raise RuntimeError(error) 2025-12-04T09:28:45.1168555Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:45.1168985Z Traceback (most recent call last): 2025-12-04T09:28:45.1169677Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1170399Z getattr(self, test_name)() 2025-12-04T09:28:45.1171058Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1171746Z fn() 2025-12-04T09:28:45.1172310Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1173004Z method(*args, **kwargs) 2025-12-04T09:28:45.1173632Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T09:28:45.1174300Z method(*args, **kwargs) 2025-12-04T09:28:45.1174917Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1175567Z with policy(): 2025-12-04T09:28:45.1176156Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1177106Z raise RuntimeError(msg) 2025-12-04T09:28:45.1178479Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_low_precision_grads_cuda! Caching allocator allocated memory was 512 and is now reported as 92672 on device 0. CUDA driver allocated memory was 714014720 and is now 762249216. 2025-12-04T09:28:45.1179734Z 2025-12-04T09:28:45.1179946Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1180891Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_low_precision_grads_cuda 2025-12-04T09:28:45.1181635Z 2025-12-04T09:28:45.1181907Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1182308Z 2025-12-04T09:28:45.1182313Z 2025-12-04T09:28:45.1182532Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:45.1183145Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:28:45.1184445Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-c42bc725a7562377.xml - 2025-12-04T09:28:45.1185700Z =========================== short test summary info ============================ 2025-12-04T09:28:45.1186792Z FAILED [9.1019s] distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_low_precision_grads_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:45.1187820Z Traceback (most recent call last): 2025-12-04T09:28:45.1188603Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1189444Z getattr(self, test_name)() 2025-12-04T09:28:45.1190098Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1190780Z fn() 2025-12-04T09:28:45.1191353Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1192010Z method(*args, **kwargs) 2025-12-04T09:28:45.1192642Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1193310Z method(*args, **kwargs) 2025-12-04T09:28:45.1193931Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1194578Z with policy(): 2025-12-04T09:28:45.1195173Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1195843Z raise RuntimeError(msg) 2025-12-04T09:28:45.1196994Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_low_precision_grads_cuda! Caching allocator allocated memory was 512 and is now reported as 92672 on device 0. 
CUDA driver allocated memory was 714014720 and is now 762249216. 2025-12-04T09:28:45.1198122Z 2025-12-04T09:28:45.1198311Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1199159Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_low_precision_grads_cuda 2025-12-04T09:28:45.1199843Z 2025-12-04T09:28:45.1200082Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1200597Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:45.1201029Z ======================= 1 failed, 3 deselected in 9.12s ======================== 2025-12-04T09:28:45.1201391Z Got exit code 1 2025-12-04T09:28:45.1201999Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_low_precision_grads_cuda 2025-12-04T09:28:45.1202937Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:28:45.1204054Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-4818210284e31d5e.xml 2025-12-04T09:28:45.1204963Z ============================= test session starts ============================== 2025-12-04T09:28:45.1205543Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:45.1206061Z cachedir: .pytest_cache 2025-12-04T09:28:45.1206683Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:45.1207367Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:45.1207665Z configfile: pytest.ini 2025-12-04T09:28:45.1208302Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:45.1209083Z collecting ... collected 4 items / 2 deselected / 2 selected 2025-12-04T09:28:45.1209505Z stepcurrent: skipping 2 already run items. 2025-12-04T09:28:45.1209835Z Running 2 items in this shard 2025-12-04T09:28:45.1210028Z 2025-12-04T09:28:45.1210932Z distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_no_gradients_cuda I1204 09:27:24.383000 39537 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 39589 2025-12-04T09:28:45.1212333Z I1204 09:27:24.384000 39537 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 39590 2025-12-04T09:28:45.1213341Z I1204 09:27:24.385000 39537 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 39591 2025-12-04T09:28:45.1214332Z I1204 09:27:24.386000 39537 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 39592 2025-12-04T09:28:45.1216486Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
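The UserWarning from _init_utils.py is advisory: the test passes `device_id` as the bare string "cuda" without an index, so FSDP falls back to the current device. The two remedies the warning itself suggests look roughly like this (a sketch with a toy module; `rank` would come from the launcher, and a default process group is assumed to be initialized already):

```python
import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_model(rank: int) -> FSDP:
    # Assumes torch.distributed.init_process_group() was already called.
    # Remedy 1: make the current device explicit before FSDP initialization.
    torch.cuda.set_device(rank)

    model = nn.Linear(8, 8)

    # Remedy 2: pass an explicit device index (or a torch.device with an index)
    # instead of the bare "cuda" string that triggers the warning.
    return FSDP(model, device_id=torch.device("cuda", rank))
```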
2025-12-04T09:28:45.1218660Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1220671Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.1222848Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1224855Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.1226952Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1228959Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.1230953Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1232243Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:28:45.1233528Z return func(*args, **kwargs) 2025-12-04T09:28:45.1235233Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.1237222Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1239103Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.1240993Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1242942Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
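The barrier() warning here and the earlier ProcessGroupNCCL shutdown warning both point at process-group lifecycle hygiene: bind the group to a device at init time and tear it down explicitly before exit. A minimal per-rank sketch, assuming the usual torchrun environment variables (LOCAL_RANK, RANK, WORLD_SIZE, MASTER_ADDR, MASTER_PORT) are set, and noting that init_process_group's device_id argument only exists in newer PyTorch releases:

```python
import os
import torch
import torch.distributed as dist

def main() -> None:
    local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun (single-node assumption)
    torch.cuda.set_device(local_rank)

    # Binding the group to an explicit device avoids the
    # "barrier(): using the device under current context" warning
    # (device_id is available in recent PyTorch versions).
    dist.init_process_group(
        backend="nccl",
        device_id=torch.device("cuda", local_rank),
    )
    try:
        dist.barrier()
        # ... training / test body ...
    finally:
        # Explicit teardown avoids the ProcessGroupNCCL
        # "destroy_process_group() was not called before program exit" warning.
        dist.destroy_process_group()

if __name__ == "__main__":
    main()
```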
2025-12-04T09:28:45.1243100Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1244710Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.1244862Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1245801Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:28:45.1245918Z return func(*args, **kwargs) 2025-12-04T09:28:45.1246353Z [rank0]:E1204 09:27:31.326000 39589 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1246870Z [rank0]:E1204 09:27:31.326000 39589 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1247816Z [rank0]:E1204 09:27:31.326000 39589 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1248325Z [rank0]:E1204 09:27:31.326000 39589 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1249264Z [rank0]:E1204 09:27:31.326000 39589 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1249770Z [rank0]:E1204 09:27:31.326000 39589 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1250634Z [rank0]:E1204 09:27:31.326000 39589 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1251064Z [rank0]:E1204 09:27:31.326000 39589 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1251927Z [rank0]:E1204 09:27:31.326000 39589 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1252360Z [rank0]:E1204 09:27:31.326000 39589 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1253211Z [rank0]:E1204 09:27:31.326000 39589 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1253613Z [rank0]:E1204 09:27:31.326000 39589 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1254463Z [rank0]:E1204 09:27:31.326000 39589 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1254908Z [rank0]:E1204 09:27:31.326000 
39589 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1256395Z [rank0]:E1204 09:27:31.326000 39589 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_no_gradients_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 0. CUDA driver allocated memory was 714014720 and is now 732889088. 2025-12-04T09:28:45.1256910Z [rank0]:E1204 09:27:31.326000 39589 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1257566Z [rank0]:E1204 09:27:31.326000 39589 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1258604Z [rank0]:E1204 09:27:31.326000 39589 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_no_gradients_cuda 2025-12-04T09:28:45.1258976Z [rank0]:E1204 09:27:31.326000 39589 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1259694Z [rank0]:E1204 09:27:31.326000 39589 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1260248Z [rank0]:E1204 09:27:31.326000 39589 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:45.1260698Z [rank1]:E1204 09:27:31.327000 39590 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1261232Z [rank1]:E1204 09:27:31.327000 39590 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1262261Z [rank1]:E1204 09:27:31.327000 39590 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1262769Z [rank1]:E1204 09:27:31.327000 39590 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1263794Z [rank1]:E1204 09:27:31.327000 39590 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1264189Z [rank1]:E1204 09:27:31.327000 39590 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1265151Z [rank1]:E1204 09:27:31.327000 39590 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1265640Z [rank1]:E1204 09:27:31.327000 39590 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1266610Z [rank1]:E1204 09:27:31.327000 39590 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1267095Z [rank1]:E1204 09:27:31.327000 39590 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1268054Z [rank1]:E1204 
09:27:31.327000 39590 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1268507Z [rank1]:E1204 09:27:31.327000 39590 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1269511Z [rank1]:E1204 09:27:31.327000 39590 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1270005Z [rank1]:E1204 09:27:31.327000 39590 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1271362Z [rank1]:E1204 09:27:31.327000 39590 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_no_gradients_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 1. CUDA driver allocated memory was 602865664 and is now 623837184. 2025-12-04T09:28:45.1271691Z [rank1]:E1204 09:27:31.327000 39590 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1272273Z [rank1]:E1204 09:27:31.327000 39590 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1273185Z [rank1]:E1204 09:27:31.327000 39590 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_no_gradients_cuda 2025-12-04T09:28:45.1273518Z [rank1]:E1204 09:27:31.327000 39590 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1274151Z [rank1]:E1204 09:27:31.327000 39590 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1274645Z [rank1]:E1204 09:27:31.327000 39590 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:45.1275045Z [rank3]:E1204 09:27:31.328000 39592 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1275579Z [rank3]:E1204 09:27:31.328000 39592 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1276469Z [rank3]:E1204 09:27:31.328000 39592 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1276942Z [rank3]:E1204 09:27:31.328000 39592 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1277831Z [rank3]:E1204 09:27:31.328000 39592 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1278183Z [rank3]:E1204 09:27:31.328000 39592 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1279051Z [rank3]:E1204 09:27:31.328000 39592 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1279488Z [rank3]:E1204 09:27:31.328000 39592 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1280353Z [rank3]:E1204 09:27:31.328000 39592 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1280787Z [rank3]:E1204 09:27:31.328000 39592 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1281639Z [rank3]:E1204 09:27:31.328000 39592 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1282046Z [rank3]:E1204 09:27:31.328000 39592 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1282949Z [rank3]:E1204 09:27:31.328000 39592 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1283396Z [rank3]:E1204 09:27:31.328000 39592 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1284747Z [rank3]:E1204 09:27:31.328000 39592 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_no_gradients_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 3. CUDA driver allocated memory was 477036544 and is now 623837184. 
2025-12-04T09:28:45.1285079Z [rank3]:E1204 09:27:31.328000 39592 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1285666Z [rank3]:E1204 09:27:31.328000 39592 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1286576Z [rank3]:E1204 09:27:31.328000 39592 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_no_gradients_cuda 2025-12-04T09:28:45.1286906Z [rank3]:E1204 09:27:31.328000 39592 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1287538Z [rank3]:E1204 09:27:31.328000 39592 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1288028Z [rank3]:E1204 09:27:31.328000 39592 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:45.1288454Z [rank2]:E1204 09:27:31.329000 39591 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1288937Z [rank2]:E1204 09:27:31.329000 39591 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1289847Z [rank2]:E1204 09:27:31.329000 39591 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1290298Z [rank2]:E1204 09:27:31.329000 39591 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1291183Z [rank2]:E1204 09:27:31.329000 39591 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1291536Z [rank2]:E1204 09:27:31.329000 39591 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1292399Z [rank2]:E1204 09:27:31.329000 39591 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1292832Z [rank2]:E1204 09:27:31.329000 39591 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1293688Z [rank2]:E1204 09:27:31.329000 39591 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1294117Z [rank2]:E1204 09:27:31.329000 39591 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1294968Z [rank2]:E1204 09:27:31.329000 39591 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1295423Z [rank2]:E1204 09:27:31.329000 39591 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1296335Z [rank2]:E1204 09:27:31.329000 39591 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1296964Z [rank2]:E1204 09:27:31.329000 39591 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1298482Z [rank2]:E1204 09:27:31.329000 39591 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_no_gradients_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 2. CUDA driver allocated memory was 604962816 and is now 623837184. 2025-12-04T09:28:45.1298859Z [rank2]:E1204 09:27:31.329000 39591 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1299517Z [rank2]:E1204 09:27:31.329000 39591 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1300539Z [rank2]:E1204 09:27:31.329000 39591 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_no_gradients_cuda 2025-12-04T09:28:45.1300909Z [rank2]:E1204 09:27:31.329000 39591 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1301622Z [rank2]:E1204 09:27:31.329000 39591 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1302215Z [rank2]:E1204 09:27:31.329000 39591 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:45.1302316Z dist init r=3, world=4 2025-12-04T09:28:45.1302414Z dist init r=0, world=4 2025-12-04T09:28:45.1302527Z dist init r=1, world=4 2025-12-04T09:28:45.1302662Z dist init r=2, world=4 2025-12-04T09:28:45.1302757Z FAILED [8.6116s] [ 50%] 2025-12-04T09:28:45.1302775Z 2025-12-04T09:28:45.1302922Z =================================== FAILURES =================================== 2025-12-04T09:28:45.1303188Z _________________ TestClipGradNormCUDA.test_no_gradients_cuda __________________ 2025-12-04T09:28:45.1303323Z Traceback (most recent call last): 2025-12-04T09:28:45.1303868Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:45.1303977Z self._join_processes(fn) 2025-12-04T09:28:45.1304570Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:45.1304713Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:45.1305329Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:45.1305444Z raise RuntimeError(error) 2025-12-04T09:28:45.1305678Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:45.1305805Z Traceback (most recent call last): 2025-12-04T09:28:45.1306343Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1306455Z getattr(self, test_name)() 2025-12-04T09:28:45.1306998Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1307091Z fn() 2025-12-04T09:28:45.1307610Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1307712Z method(*args, **kwargs) 2025-12-04T09:28:45.1308778Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1308995Z method(*args, **kwargs) 2025-12-04T09:28:45.1309444Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1309531Z with policy(): 2025-12-04T09:28:45.1309991Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1310089Z raise RuntimeError(msg) 2025-12-04T09:28:45.1311046Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_no_gradients_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 0. CUDA driver allocated memory was 714014720 and is now 732889088. 2025-12-04T09:28:45.1311055Z 2025-12-04T09:28:45.1311247Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1311762Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_no_gradients_cuda 2025-12-04T09:28:45.1311778Z 2025-12-04T09:28:45.1312011Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1312016Z 2025-12-04T09:28:45.1312020Z 2025-12-04T09:28:45.1312216Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:45.1312459Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:28:45.1313251Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-4818210284e31d5e.xml - 2025-12-04T09:28:45.1313438Z =========================== short test summary info ============================ 2025-12-04T09:28:45.1314100Z FAILED [8.6116s] distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_no_gradients_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:45.1314208Z Traceback (most recent call last): 2025-12-04T09:28:45.1314733Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1314828Z getattr(self, test_name)() 2025-12-04T09:28:45.1315301Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1315386Z fn() 2025-12-04T09:28:45.1315843Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1315945Z method(*args, **kwargs) 2025-12-04T09:28:45.1316393Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1316490Z method(*args, **kwargs) 2025-12-04T09:28:45.1316946Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1317040Z with policy(): 2025-12-04T09:28:45.1317501Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1317600Z raise RuntimeError(msg) 2025-12-04T09:28:45.1318552Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_no_gradients_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 0. CUDA driver allocated memory was 714014720 and is now 732889088. 2025-12-04T09:28:45.1318557Z 2025-12-04T09:28:45.1318756Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1319267Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_no_gradients_cuda 2025-12-04T09:28:45.1319273Z 2025-12-04T09:28:45.1319516Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1319729Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:45.1319892Z ======================= 1 failed, 2 deselected in 8.63s ======================== 2025-12-04T09:28:45.1319986Z Got exit code 1 2025-12-04T09:28:45.1320081Z Retrying single test... 
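After the first failure the harness reruns the single failing test; because the job runs with continue-through-error, a test that fails again is recorded as "FAILED CONSISTENTLY" and the rest of the shard keeps running instead of aborting. A rough sketch of that retry-then-continue control flow (the helper names here are illustrative, not the real run_test.py functions):

```python
import subprocess
import sys

def run_pytest(node_id: str) -> int:
    """Run a single pytest node id and return its exit code."""
    return subprocess.run([sys.executable, "-m", "pytest", "-x", node_id]).returncode

def run_with_retry(node_id: str, retries: int = 1) -> bool:
    """Hypothetical helper mirroring the log's behaviour: retry a failed test,
    mark it as consistently failing if it still fails, and keep going."""
    for attempt in range(retries + 1):
        if run_pytest(node_id) == 0:
            return True
        print(f"Got exit code 1 (attempt {attempt + 1})")
    print(f"FAILED CONSISTENTLY: {node_id}")
    # continue-through-error: report the failure but let the rest of the shard run.
    return False
```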
2025-12-04T09:28:45.1320711Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-1b5186457c75b3fb.xml 2025-12-04T09:28:45.1321000Z ============================= test session starts ============================== 2025-12-04T09:28:45.1321495Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:45.1321615Z cachedir: .pytest_cache 2025-12-04T09:28:45.1322130Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:45.1322250Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:45.1322365Z configfile: pytest.ini 2025-12-04T09:28:45.1322906Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:45.1323111Z collecting ... collected 4 items / 3 deselected / 1 selected 2025-12-04T09:28:45.1323784Z stepcurrent: skipping 2 already run items. Running only test/distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_no_gradients_cuda 2025-12-04T09:28:45.1323898Z Running 1 items in this shard 2025-12-04T09:28:45.1323903Z 2025-12-04T09:28:45.1324865Z distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_no_gradients_cuda I1204 09:27:37.803000 39850 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 39902 2025-12-04T09:28:45.1325426Z I1204 09:27:37.804000 39850 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 39903 2025-12-04T09:28:45.1325941Z I1204 09:27:37.805000 39850 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 39904 2025-12-04T09:28:45.1326488Z I1204 09:27:37.806000 39850 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 39905 2025-12-04T09:28:45.1328223Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.1328405Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1330124Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.1330302Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1332020Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:28:45.1332196Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1334058Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.1334227Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1335165Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:28:45.1335273Z return func(*args, **kwargs) 2025-12-04T09:28:45.1337147Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.1337316Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1339044Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.1339207Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1340917Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.1341114Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1342839Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.1343085Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1344081Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T09:28:45.1344206Z return func(*args, **kwargs) 2025-12-04T09:28:45.1344666Z [rank1]:E1204 09:27:44.748000 39903 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1345219Z [rank1]:E1204 09:27:44.748000 39903 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1346225Z [rank1]:E1204 09:27:44.748000 39903 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1346736Z [rank1]:E1204 09:27:44.748000 39903 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1347741Z [rank1]:E1204 09:27:44.748000 39903 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1348137Z [rank1]:E1204 09:27:44.748000 39903 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1349242Z [rank1]:E1204 09:27:44.748000 39903 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1349709Z [rank1]:E1204 09:27:44.748000 39903 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1350623Z [rank1]:E1204 09:27:44.748000 39903 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1351082Z [rank1]:E1204 09:27:44.748000 39903 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1351993Z [rank1]:E1204 09:27:44.748000 39903 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1352421Z [rank1]:E1204 09:27:44.748000 39903 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1353425Z [rank1]:E1204 09:27:44.748000 39903 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1353871Z [rank1]:E1204 09:27:44.748000 39903 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1355220Z [rank1]:E1204 09:27:44.748000 39903 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_no_gradients_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 1. CUDA driver allocated memory was 607059968 and is now 623837184. 
2025-12-04T09:28:45.1355604Z [rank1]:E1204 09:27:44.748000 39903 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1356212Z [rank1]:E1204 09:27:44.748000 39903 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1357130Z [rank1]:E1204 09:27:44.748000 39903 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_no_gradients_cuda 2025-12-04T09:28:45.1357450Z [rank1]:E1204 09:27:44.748000 39903 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1358089Z [rank1]:E1204 09:27:44.748000 39903 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1358585Z [rank1]:E1204 09:27:44.748000 39903 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:45.1358990Z [rank0]:E1204 09:27:44.750000 39902 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1359478Z [rank0]:E1204 09:27:44.750000 39902 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1360367Z [rank0]:E1204 09:27:44.750000 39902 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1360816Z [rank0]:E1204 09:27:44.750000 39902 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1361701Z [rank0]:E1204 09:27:44.750000 39902 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1362099Z [rank0]:E1204 09:27:44.750000 39902 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1362962Z [rank0]:E1204 09:27:44.750000 39902 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1363397Z [rank0]:E1204 09:27:44.750000 39902 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1364257Z [rank0]:E1204 09:27:44.750000 39902 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1364693Z [rank0]:E1204 09:27:44.750000 39902 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1365551Z [rank0]:E1204 09:27:44.750000 39902 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1365959Z [rank0]:E1204 09:27:44.750000 39902 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1366817Z [rank0]:E1204 09:27:44.750000 39902 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1367263Z [rank0]:E1204 09:27:44.750000 39902 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1368637Z [rank0]:E1204 09:27:44.750000 39902 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_no_gradients_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 0. CUDA driver allocated memory was 714014720 and is now 732889088. 2025-12-04T09:28:45.1368993Z [rank0]:E1204 09:27:44.750000 39902 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1369578Z [rank0]:E1204 09:27:44.750000 39902 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1370485Z [rank0]:E1204 09:27:44.750000 39902 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_no_gradients_cuda 2025-12-04T09:28:45.1370820Z [rank0]:E1204 09:27:44.750000 39902 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1371456Z [rank0]:E1204 09:27:44.750000 39902 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1371952Z [rank0]:E1204 09:27:44.750000 39902 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:45.1372352Z [rank2]:E1204 09:27:44.750000 39904 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1372834Z [rank2]:E1204 09:27:44.750000 39904 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1373721Z [rank2]:E1204 09:27:44.750000 39904 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1374171Z [rank2]:E1204 09:27:44.750000 39904 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1375105Z [rank2]:E1204 09:27:44.750000 39904 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1375458Z [rank2]:E1204 09:27:44.750000 39904 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1376382Z [rank2]:E1204 09:27:44.750000 39904 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1377001Z [rank2]:E1204 09:27:44.750000 39904 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1377979Z [rank2]:E1204 09:27:44.750000 39904 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1378470Z [rank2]:E1204 09:27:44.750000 39904 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1379436Z [rank2]:E1204 09:27:44.750000 39904 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1379893Z [rank2]:E1204 09:27:44.750000 39904 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1380857Z [rank2]:E1204 09:27:44.750000 39904 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1381396Z [rank2]:E1204 09:27:44.750000 39904 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1382932Z [rank2]:E1204 09:27:44.750000 39904 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_no_gradients_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 2. CUDA driver allocated memory was 604962816 and is now 623837184. 2025-12-04T09:28:45.1383336Z [rank2]:E1204 09:27:44.750000 39904 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1383995Z [rank2]:E1204 09:27:44.750000 39904 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1385016Z [rank2]:E1204 09:27:44.750000 39904 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_no_gradients_cuda 2025-12-04T09:28:45.1385392Z [rank2]:E1204 09:27:44.750000 39904 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1386112Z [rank2]:E1204 09:27:44.750000 39904 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1386669Z [rank2]:E1204 09:27:44.750000 39904 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:45.1387118Z [rank3]:E1204 09:27:44.751000 39905 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1387647Z [rank3]:E1204 09:27:44.751000 39905 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1388658Z [rank3]:E1204 09:27:44.751000 39905 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1389291Z [rank3]:E1204 09:27:44.751000 39905 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1390180Z [rank3]:E1204 09:27:44.751000 39905 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 
2025-12-04T09:28:45.1390535Z [rank3]:E1204 09:27:44.751000 39905 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1391396Z [rank3]:E1204 09:27:44.751000 39905 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1391831Z [rank3]:E1204 09:27:44.751000 39905 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1392687Z [rank3]:E1204 09:27:44.751000 39905 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1393130Z [rank3]:E1204 09:27:44.751000 39905 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1393978Z [rank3]:E1204 09:27:44.751000 39905 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1394383Z [rank3]:E1204 09:27:44.751000 39905 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1395234Z [rank3]:E1204 09:27:44.751000 39905 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1395709Z [rank3]:E1204 09:27:44.751000 39905 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1397081Z [rank3]:E1204 09:27:44.751000 39905 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_no_gradients_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 3. CUDA driver allocated memory was 581894144 and is now 623837184. 
2025-12-04T09:28:45.1397412Z [rank3]:E1204 09:27:44.751000 39905 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1397997Z [rank3]:E1204 09:27:44.751000 39905 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1398907Z [rank3]:E1204 09:27:44.751000 39905 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_no_gradients_cuda 2025-12-04T09:28:45.1399240Z [rank3]:E1204 09:27:44.751000 39905 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1399879Z [rank3]:E1204 09:27:44.751000 39905 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1400368Z [rank3]:E1204 09:27:44.751000 39905 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:45.1400460Z dist init r=0, world=4 2025-12-04T09:28:45.1400549Z dist init r=1, world=4 2025-12-04T09:28:45.1400643Z dist init r=2, world=4 2025-12-04T09:28:45.1400730Z dist init r=3, world=4 2025-12-04T09:28:45.1400819Z FAILED [9.0692s] [100%] 2025-12-04T09:28:45.1400824Z 2025-12-04T09:28:45.1400966Z =================================== FAILURES =================================== 2025-12-04T09:28:45.1401201Z _________________ TestClipGradNormCUDA.test_no_gradients_cuda __________________ 2025-12-04T09:28:45.1401377Z Traceback (most recent call last): 2025-12-04T09:28:45.1401870Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:45.1401970Z self._join_processes(fn) 2025-12-04T09:28:45.1402497Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:45.1402626Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:45.1403172Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:45.1403274Z raise RuntimeError(error) 2025-12-04T09:28:45.1403485Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:28:45.1403604Z Traceback (most recent call last): 2025-12-04T09:28:45.1404083Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1404187Z getattr(self, test_name)() 2025-12-04T09:28:45.1404671Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1404754Z fn() 2025-12-04T09:28:45.1405217Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1405312Z method(*args, **kwargs) 2025-12-04T09:28:45.1405760Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1405863Z method(*args, **kwargs) 2025-12-04T09:28:45.1406338Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1406428Z with policy(): 2025-12-04T09:28:45.1406889Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1407012Z raise RuntimeError(msg) 2025-12-04T09:28:45.1407969Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_no_gradients_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 1. CUDA driver allocated memory was 607059968 and is now 623837184. 2025-12-04T09:28:45.1407974Z 2025-12-04T09:28:45.1408167Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1408676Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_no_gradients_cuda 2025-12-04T09:28:45.1408690Z 2025-12-04T09:28:45.1408931Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1408936Z 2025-12-04T09:28:45.1409084Z Process 2 exited with error code 10 and exception: 2025-12-04T09:28:45.1409198Z Traceback (most recent call last): 2025-12-04T09:28:45.1409685Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1409787Z getattr(self, test_name)() 2025-12-04T09:28:45.1410274Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1410355Z fn() 2025-12-04T09:28:45.1410812Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1410906Z method(*args, **kwargs) 2025-12-04T09:28:45.1411353Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1411458Z method(*args, **kwargs) 2025-12-04T09:28:45.1411905Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1411991Z with policy(): 2025-12-04T09:28:45.1412502Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1412604Z raise RuntimeError(msg) 2025-12-04T09:28:45.1413560Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_no_gradients_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 2. CUDA driver allocated memory was 604962816 and is now 623837184. 2025-12-04T09:28:45.1413565Z 2025-12-04T09:28:45.1413918Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1414455Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_no_gradients_cuda 2025-12-04T09:28:45.1414472Z 2025-12-04T09:28:45.1414723Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1414728Z 2025-12-04T09:28:45.1414732Z 2025-12-04T09:28:45.1414947Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:45.1415202Z Process 1 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:28:45.1416041Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-1b5186457c75b3fb.xml - 2025-12-04T09:28:45.1416213Z =========================== short test summary info ============================ 2025-12-04T09:28:45.1417171Z FAILED [9.0692s] distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_no_gradients_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:28:45.1417294Z Traceback (most recent call last): 2025-12-04T09:28:45.1417895Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1418007Z getattr(self, test_name)() 2025-12-04T09:28:45.1418550Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1418681Z fn() 2025-12-04T09:28:45.1419192Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1419305Z method(*args, **kwargs) 2025-12-04T09:28:45.1419805Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1419912Z method(*args, **kwargs) 2025-12-04T09:28:45.1420423Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1420522Z with policy(): 2025-12-04T09:28:45.1421227Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1421340Z raise RuntimeError(msg) 2025-12-04T09:28:45.1422414Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_no_gradients_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 1. CUDA driver allocated memory was 607059968 and is now 623837184. 
2025-12-04T09:28:45.1422422Z 2025-12-04T09:28:45.1422646Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1423221Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_no_gradients_cuda 2025-12-04T09:28:45.1423226Z 2025-12-04T09:28:45.1423502Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1423507Z 2025-12-04T09:28:45.1423672Z Process 2 exited with error code 10 and exception: 2025-12-04T09:28:45.1423795Z Traceback (most recent call last): 2025-12-04T09:28:45.1424349Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1424557Z getattr(self, test_name)() 2025-12-04T09:28:45.1425110Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1425200Z fn() 2025-12-04T09:28:45.1425711Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1425826Z method(*args, **kwargs) 2025-12-04T09:28:45.1426330Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1426435Z method(*args, **kwargs) 2025-12-04T09:28:45.1426950Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1427051Z with policy(): 2025-12-04T09:28:45.1427568Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1427686Z raise RuntimeError(msg) 2025-12-04T09:28:45.1428755Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_no_gradients_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 2. CUDA driver allocated memory was 604962816 and is now 623837184. 2025-12-04T09:28:45.1428760Z 2025-12-04T09:28:45.1428983Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1429555Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_no_gradients_cuda 2025-12-04T09:28:45.1429560Z 2025-12-04T09:28:45.1429829Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1430051Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:45.1430230Z ======================= 1 failed, 3 deselected in 9.09s ======================== 2025-12-04T09:28:45.1430337Z Got exit code 1 2025-12-04T09:28:45.1430447Z Retrying single test... 
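[editor's note] The UserWarnings repeated throughout this run ("FSDP got the argument `device_id` cuda ... which does not have an explicit index" and "barrier(): using the device under current context") spell out their own remedy: bind each rank to an indexed CUDA device before FSDP construction and pass that device to the process group. The sketch below is illustrative only, not the test's own code; the module, rank handling, and the availability of the `device_id` keyword on `init_process_group` (present in recent PyTorch releases) are assumptions.

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def init_fsdp_model(rank: int, world_size: int) -> FSDP:
    # An indexed device (cuda:<rank>) instead of the bare string "cuda" avoids the
    # "does not have an explicit index" warning from _init_utils.py.
    device = torch.device("cuda", rank)

    # Passing device_id here also silences the barrier() "using the device under
    # current context" warning from c10d_logger.py. MASTER_ADDR/MASTER_PORT are
    # assumed to be set by the launcher.
    dist.init_process_group("nccl", rank=rank, world_size=world_size, device_id=device)

    # Bind this process to its GPU before constructing FSDP, as the warning suggests.
    torch.cuda.set_device(device)

    model = nn.Linear(8, 8)  # stand-in module; the real test wraps its own model
    return FSDP(model, device_id=device)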
2025-12-04T09:28:45.1431203Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-74e02afb5846363a.xml 2025-12-04T09:28:45.1431377Z ============================= test session starts ============================== 2025-12-04T09:28:45.1431726Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:45.1431840Z cachedir: .pytest_cache 2025-12-04T09:28:45.1432357Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:45.1432591Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:45.1432705Z configfile: pytest.ini 2025-12-04T09:28:45.1433228Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:45.1433438Z collecting ... collected 4 items / 3 deselected / 1 selected 2025-12-04T09:28:45.1434085Z stepcurrent: skipping 2 already run items. Running only test/distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_no_gradients_cuda 2025-12-04T09:28:45.1434193Z Running 1 items in this shard 2025-12-04T09:28:45.1434198Z 2025-12-04T09:28:45.1446545Z distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_no_gradients_cuda I1204 09:27:51.374000 40163 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 40215 2025-12-04T09:28:45.1447020Z I1204 09:27:51.375000 40163 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 40216 2025-12-04T09:28:45.1447499Z I1204 09:27:51.376000 40163 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 40217 2025-12-04T09:28:45.1447959Z I1204 09:27:51.377000 40163 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 40218 2025-12-04T09:28:45.1449702Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.1449862Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1451463Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.1451624Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1453231Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:28:45.1453387Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1454980Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.1455169Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1456103Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:28:45.1456338Z return func(*args, **kwargs) 2025-12-04T09:28:45.1458205Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.1458366Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1460069Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.1460230Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1461933Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.1462093Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1463910Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:28:45.1464073Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:28:45.1465066Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T09:28:45.1465175Z return func(*args, **kwargs) 2025-12-04T09:28:45.1465632Z [rank1]:E1204 09:27:58.224000 40216 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1466169Z [rank1]:E1204 09:27:58.224000 40216 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1467169Z [rank1]:E1204 09:27:58.224000 40216 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1467681Z [rank1]:E1204 09:27:58.224000 40216 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1468772Z [rank1]:E1204 09:27:58.224000 40216 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1469243Z [rank1]:E1204 09:27:58.224000 40216 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1470132Z [rank1]:E1204 09:27:58.224000 40216 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1470563Z [rank1]:E1204 09:27:58.224000 40216 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1471439Z [rank1]:E1204 09:27:58.224000 40216 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1471865Z [rank1]:E1204 09:27:58.224000 40216 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1472712Z [rank1]:E1204 09:27:58.224000 40216 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1473106Z [rank1]:E1204 09:27:58.224000 40216 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1473961Z [rank1]:E1204 09:27:58.224000 40216 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1474397Z [rank1]:E1204 09:27:58.224000 40216 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1475746Z [rank1]:E1204 09:27:58.224000 40216 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_no_gradients_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 1. CUDA driver allocated memory was 602865664 and is now 623837184. 
2025-12-04T09:28:45.1476074Z [rank1]:E1204 09:27:58.224000 40216 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1476649Z [rank1]:E1204 09:27:58.224000 40216 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1477609Z [rank1]:E1204 09:27:58.224000 40216 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_no_gradients_cuda 2025-12-04T09:28:45.1477930Z [rank1]:E1204 09:27:58.224000 40216 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1478564Z [rank1]:E1204 09:27:58.224000 40216 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1479048Z [rank1]:E1204 09:27:58.224000 40216 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:45.1479446Z [rank0]:E1204 09:27:58.224000 40215 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1479919Z [rank0]:E1204 09:27:58.224000 40215 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1480801Z [rank0]:E1204 09:27:58.224000 40215 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1481256Z [rank0]:E1204 09:27:58.224000 40215 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1482132Z [rank0]:E1204 09:27:58.224000 40215 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1482508Z [rank0]:E1204 09:27:58.224000 40215 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1483364Z [rank0]:E1204 09:27:58.224000 40215 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1483817Z [rank0]:E1204 09:27:58.224000 40215 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1484669Z [rank0]:E1204 09:27:58.224000 40215 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1485097Z [rank0]:E1204 09:27:58.224000 40215 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1485949Z [rank0]:E1204 09:27:58.224000 40215 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1486343Z [rank0]:E1204 09:27:58.224000 40215 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1487198Z [rank0]:E1204 09:27:58.224000 40215 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1487636Z [rank0]:E1204 09:27:58.224000 40215 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1488984Z [rank0]:E1204 09:27:58.224000 40215 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_no_gradients_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 0. CUDA driver allocated memory was 714014720 and is now 732889088. 2025-12-04T09:28:45.1489311Z [rank0]:E1204 09:27:58.224000 40215 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1489937Z [rank0]:E1204 09:27:58.224000 40215 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1490843Z [rank0]:E1204 09:27:58.224000 40215 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_no_gradients_cuda 2025-12-04T09:28:45.1491162Z [rank0]:E1204 09:27:58.224000 40215 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1491795Z [rank0]:E1204 09:27:58.224000 40215 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1494509Z [rank0]:E1204 09:27:58.224000 40215 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:45.1494940Z [rank2]:E1204 09:27:58.225000 40217 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1495411Z [rank2]:E1204 09:27:58.225000 40217 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1496394Z [rank2]:E1204 09:27:58.225000 40217 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1497038Z [rank2]:E1204 09:27:58.225000 40217 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1498028Z [rank2]:E1204 09:27:58.225000 40217 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1498504Z [rank2]:E1204 09:27:58.225000 40217 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1499498Z [rank2]:E1204 09:27:58.225000 40217 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1499985Z [rank2]:E1204 09:27:58.225000 40217 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1500946Z [rank2]:E1204 09:27:58.225000 40217 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1501434Z [rank2]:E1204 09:27:58.225000 40217 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1502414Z [rank2]:E1204 09:27:58.225000 40217 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1502864Z [rank2]:E1204 09:27:58.225000 40217 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1503841Z [rank2]:E1204 09:27:58.225000 40217 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1504332Z [rank2]:E1204 09:27:58.225000 40217 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1505859Z [rank2]:E1204 09:27:58.225000 40217 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_no_gradients_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 2. CUDA driver allocated memory was 607059968 and is now 623837184. 2025-12-04T09:28:45.1506265Z [rank2]:E1204 09:27:58.225000 40217 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1506927Z [rank2]:E1204 09:27:58.225000 40217 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1507956Z [rank2]:E1204 09:27:58.225000 40217 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_no_gradients_cuda 2025-12-04T09:28:45.1508315Z [rank2]:E1204 09:27:58.225000 40217 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1509130Z [rank2]:E1204 09:27:58.225000 40217 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1509663Z [rank2]:E1204 09:27:58.225000 40217 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:45.1510069Z [rank3]:E1204 09:27:58.226000 40218 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1510547Z [rank3]:E1204 09:27:58.226000 40218 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1511436Z [rank3]:E1204 09:27:58.226000 40218 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1511892Z [rank3]:E1204 09:27:58.226000 40218 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1512815Z [rank3]:E1204 09:27:58.226000 40218 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 
2025-12-04T09:28:45.1513194Z [rank3]:E1204 09:27:58.226000 40218 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1514058Z [rank3]:E1204 09:27:58.226000 40218 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1514489Z [rank3]:E1204 09:27:58.226000 40218 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1515351Z [rank3]:E1204 09:27:58.226000 40218 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1515784Z [rank3]:E1204 09:27:58.226000 40218 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1516641Z [rank3]:E1204 09:27:58.226000 40218 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1517041Z [rank3]:E1204 09:27:58.226000 40218 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1517892Z [rank3]:E1204 09:27:58.226000 40218 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1518339Z [rank3]:E1204 09:27:58.226000 40218 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1519717Z [rank3]:E1204 09:27:58.226000 40218 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_no_gradients_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 3. CUDA driver allocated memory was 581894144 and is now 623837184. 
2025-12-04T09:28:45.1520051Z [rank3]:E1204 09:27:58.226000 40218 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1520633Z [rank3]:E1204 09:27:58.226000 40218 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1521942Z [rank3]:E1204 09:27:58.226000 40218 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_no_gradients_cuda 2025-12-04T09:28:45.1522310Z [rank3]:E1204 09:27:58.226000 40218 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1523105Z [rank3]:E1204 09:27:58.226000 40218 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1523652Z [rank3]:E1204 09:27:58.226000 40218 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:45.1523755Z dist init r=2, world=4 2025-12-04T09:28:45.1523861Z dist init r=0, world=4 2025-12-04T09:28:45.1523959Z dist init r=1, world=4 2025-12-04T09:28:45.1524056Z dist init r=3, world=4 2025-12-04T09:28:45.1524165Z FAILED [8.6485s] [100%] 2025-12-04T09:28:45.1524173Z 2025-12-04T09:28:45.1524321Z =================================== FAILURES =================================== 2025-12-04T09:28:45.1524599Z _________________ TestClipGradNormCUDA.test_no_gradients_cuda __________________ 2025-12-04T09:28:45.1524758Z Traceback (most recent call last): 2025-12-04T09:28:45.1525310Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:45.1525432Z self._join_processes(fn) 2025-12-04T09:28:45.1526055Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:45.1526203Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:45.1526808Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:45.1526920Z raise RuntimeError(error) 2025-12-04T09:28:45.1527155Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:28:45.1527274Z Traceback (most recent call last): 2025-12-04T09:28:45.1527810Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1527930Z getattr(self, test_name)() 2025-12-04T09:28:45.1528469Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1528568Z fn() 2025-12-04T09:28:45.1529078Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1529183Z method(*args, **kwargs) 2025-12-04T09:28:45.1529696Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1529801Z method(*args, **kwargs) 2025-12-04T09:28:45.1530308Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1530403Z with policy(): 2025-12-04T09:28:45.1530910Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1531026Z raise RuntimeError(msg) 2025-12-04T09:28:45.1532140Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_no_gradients_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 1. CUDA driver allocated memory was 602865664 and is now 623837184. 2025-12-04T09:28:45.1532150Z 2025-12-04T09:28:45.1532374Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1532948Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_no_gradients_cuda 2025-12-04T09:28:45.1532954Z 2025-12-04T09:28:45.1533217Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1533223Z 2025-12-04T09:28:45.1533228Z 2025-12-04T09:28:45.1533460Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:45.1533805Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:28:45.1534635Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-74e02afb5846363a.xml - 2025-12-04T09:28:45.1534788Z =========================== short test summary info ============================ 2025-12-04T09:28:45.1535456Z FAILED [8.6485s] distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_no_gradients_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:28:45.1535739Z Traceback (most recent call last): 2025-12-04T09:28:45.1536330Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1536446Z getattr(self, test_name)() 2025-12-04T09:28:45.1537134Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1537261Z fn() 2025-12-04T09:28:45.1537779Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1537885Z method(*args, **kwargs) 2025-12-04T09:28:45.1538421Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1538535Z method(*args, **kwargs) 2025-12-04T09:28:45.1539036Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1539137Z with policy(): 2025-12-04T09:28:45.1539644Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1539753Z raise RuntimeError(msg) 2025-12-04T09:28:45.1540833Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_no_gradients_cuda! Caching allocator allocated memory was 512 and is now reported as 6656 on device 1. CUDA driver allocated memory was 602865664 and is now 623837184. 
2025-12-04T09:28:45.1540845Z 2025-12-04T09:28:45.1541059Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1541643Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_no_gradients_cuda 2025-12-04T09:28:45.1541648Z 2025-12-04T09:28:45.1541914Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1542089Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:45.1542280Z ======================= 1 failed, 3 deselected in 8.67s ======================== 2025-12-04T09:28:45.1542374Z Got exit code 1 2025-12-04T09:28:45.1542881Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_no_gradients_cuda 2025-12-04T09:28:45.1543290Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:28:45.1544035Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-39202840e4782b07.xml 2025-12-04T09:28:45.1544209Z ============================= test session starts ============================== 2025-12-04T09:28:45.1544558Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:45.1544663Z cachedir: .pytest_cache 2025-12-04T09:28:45.1545186Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:45.1545304Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:45.1545416Z configfile: pytest.ini 2025-12-04T09:28:45.1545952Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:45.1546155Z collecting ... collected 4 items / 3 deselected / 1 selected 2025-12-04T09:28:45.1546299Z stepcurrent: skipping 3 already run items. 
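[editor's note] The "FAILED CONSISTENTLY" verdict above comes from the CUDA memory-leak check enabled by PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1: the error text compares caching-allocator bytes and driver-level bytes on each device before and after the test body. The snippet below is a standalone sketch of that style of before/after comparison, assuming a single process and one visible GPU; it is not PyTorch's internal CudaMemoryLeakCheck, which is more careful (it retries, empties the cache, and runs per rank).

import gc
import torch

def check_for_cuda_leak(fn, device: int = 0) -> None:
    torch.cuda.synchronize(device)
    alloc_before = torch.cuda.memory_allocated(device)   # caching-allocator bytes
    free_before, total = torch.cuda.mem_get_info(device)
    driver_before = total - free_before                   # bytes held at the driver level

    fn()

    gc.collect()
    torch.cuda.synchronize(device)
    alloc_after = torch.cuda.memory_allocated(device)
    free_after, _ = torch.cuda.mem_get_info(device)
    driver_after = total - free_after

    if alloc_after > alloc_before and driver_after > driver_before:
        raise RuntimeError(
            f"possible CUDA leak: caching allocator {alloc_before} -> {alloc_after}, "
            f"driver {driver_before} -> {driver_after}"
        )

# Usage: a callable that stashes a tensor in a global keeps memory alive past the
# check and trips the comparison, roughly what the failing test is accused of.
_leaked = []
try:
    check_for_cuda_leak(lambda: _leaked.append(torch.ones(1, device="cuda")))
except RuntimeError as e:
    print(e)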
2025-12-04T09:28:45.1546447Z Running 1 items in this shard 2025-12-04T09:28:45.1546453Z 2025-12-04T09:28:45.1547403Z distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_non_root_cuda I1204 09:28:04.754000 40476 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 40528 2025-12-04T09:28:45.1547907Z I1204 09:28:04.755000 40476 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 40529 2025-12-04T09:28:45.1548393Z I1204 09:28:04.756000 40476 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 40530 2025-12-04T09:28:45.1549094Z I1204 09:28:04.757000 40476 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 40531 2025-12-04T09:28:45.1549806Z [rank1]:E1204 09:28:11.475000 40529 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1550291Z [rank1]:E1204 09:28:11.475000 40529 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1551183Z [rank1]:E1204 09:28:11.475000 40529 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1551661Z [rank1]:E1204 09:28:11.475000 40529 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1552542Z [rank1]:E1204 09:28:11.475000 40529 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1552898Z [rank1]:E1204 09:28:11.475000 40529 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1553761Z [rank1]:E1204 09:28:11.475000 40529 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1554196Z [rank1]:E1204 09:28:11.475000 40529 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1555056Z [rank1]:E1204 09:28:11.475000 40529 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1555483Z [rank1]:E1204 09:28:11.475000 40529 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1556332Z [rank1]:E1204 09:28:11.475000 40529 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1556743Z [rank1]:E1204 09:28:11.475000 40529 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1557625Z [rank1]:E1204 09:28:11.475000 40529 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1558071Z [rank1]:E1204 09:28:11.475000 40529 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 
2025-12-04T09:28:45.1559406Z [rank1]:E1204 09:28:11.475000 40529 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 602865664 and is now 630128640. 2025-12-04T09:28:45.1559738Z [rank1]:E1204 09:28:11.475000 40529 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1560410Z [rank1]:E1204 09:28:11.475000 40529 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1561302Z [rank1]:E1204 09:28:11.475000 40529 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1561634Z [rank1]:E1204 09:28:11.475000 40529 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1562267Z [rank1]:E1204 09:28:11.475000 40529 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1562784Z [rank1]:E1204 09:28:11.475000 40529 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:45.1563188Z [rank3]:E1204 09:28:11.475000 40531 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1563665Z [rank3]:E1204 09:28:11.475000 40531 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1564581Z [rank3]:E1204 09:28:11.475000 40531 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1565030Z [rank3]:E1204 09:28:11.475000 40531 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1565916Z [rank3]:E1204 09:28:11.475000 40531 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1566275Z [rank3]:E1204 09:28:11.475000 40531 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1567145Z [rank3]:E1204 09:28:11.475000 40531 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1567581Z [rank3]:E1204 09:28:11.475000 40531 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1568440Z [rank3]:E1204 09:28:11.475000 40531 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1568868Z [rank3]:E1204 09:28:11.475000 40531 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1569719Z [rank3]:E1204 09:28:11.475000 40531 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1570152Z [rank3]:E1204 09:28:11.475000 40531 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1571013Z [rank3]:E1204 09:28:11.475000 40531 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1571452Z [rank3]:E1204 09:28:11.475000 40531 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1572802Z [rank3]:E1204 09:28:11.475000 40531 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 3. CUDA driver allocated memory was 489619456 and is now 630128640. 2025-12-04T09:28:45.1573136Z [rank3]:E1204 09:28:11.475000 40531 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1573721Z [rank3]:E1204 09:28:11.475000 40531 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1574612Z [rank3]:E1204 09:28:11.475000 40531 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1574940Z [rank3]:E1204 09:28:11.475000 40531 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1575576Z [rank3]:E1204 09:28:11.475000 40531 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1576094Z [rank3]:E1204 09:28:11.475000 40531 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:45.1576578Z [rank2]:E1204 09:28:11.476000 40530 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1577309Z [rank2]:E1204 09:28:11.476000 40530 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1578305Z [rank2]:E1204 09:28:11.476000 40530 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1578812Z [rank2]:E1204 09:28:11.476000 40530 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1579808Z [rank2]:E1204 09:28:11.476000 40530 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1580207Z [rank2]:E1204 09:28:11.476000 40530 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1581182Z [rank2]:E1204 09:28:11.476000 40530 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1581666Z 
[rank2]:E1204 09:28:11.476000 40530 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1582632Z [rank2]:E1204 09:28:11.476000 40530 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1583120Z [rank2]:E1204 09:28:11.476000 40530 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1584101Z [rank2]:E1204 09:28:11.476000 40530 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1584561Z [rank2]:E1204 09:28:11.476000 40530 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1585520Z [rank2]:E1204 09:28:11.476000 40530 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1586023Z [rank2]:E1204 09:28:11.476000 40530 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1587556Z [rank2]:E1204 09:28:11.476000 40530 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 2. CUDA driver allocated memory was 602865664 and is now 630128640. 2025-12-04T09:28:45.1587935Z [rank2]:E1204 09:28:11.476000 40530 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1588694Z [rank2]:E1204 09:28:11.476000 40530 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1589754Z [rank2]:E1204 09:28:11.476000 40530 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1590136Z [rank2]:E1204 09:28:11.476000 40530 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1590912Z [rank2]:E1204 09:28:11.476000 40530 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1591433Z [rank2]:E1204 09:28:11.476000 40530 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:45.1591890Z [rank0]:E1204 09:28:11.478000 40528 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1592385Z [rank0]:E1204 09:28:11.478000 40528 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1593333Z [rank0]:E1204 09:28:11.478000 40528 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1593808Z [rank0]:E1204 09:28:11.478000 40528 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 
2025-12-04T09:28:45.1594749Z [rank0]:E1204 09:28:11.478000 40528 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1595126Z [rank0]:E1204 09:28:11.478000 40528 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1596038Z [rank0]:E1204 09:28:11.478000 40528 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1596491Z [rank0]:E1204 09:28:11.478000 40528 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1597484Z [rank0]:E1204 09:28:11.478000 40528 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1597948Z [rank0]:E1204 09:28:11.478000 40528 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1598798Z [rank0]:E1204 09:28:11.478000 40528 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1599201Z [rank0]:E1204 09:28:11.478000 40528 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1600050Z [rank0]:E1204 09:28:11.478000 40528 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1600496Z [rank0]:E1204 09:28:11.478000 40528 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1601851Z [rank0]:E1204 09:28:11.478000 40528 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 
2025-12-04T09:28:45.1602187Z [rank0]:E1204 09:28:11.478000 40528 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1602769Z [rank0]:E1204 09:28:11.478000 40528 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1603655Z [rank0]:E1204 09:28:11.478000 40528 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1604017Z [rank0]:E1204 09:28:11.478000 40528 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1604650Z [rank0]:E1204 09:28:11.478000 40528 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1605166Z [rank0]:E1204 09:28:11.478000 40528 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:45.1605256Z dist init r=0, world=4 2025-12-04T09:28:45.1605343Z dist init r=1, world=4 2025-12-04T09:28:45.1605441Z dist init r=3, world=4 2025-12-04T09:28:45.1605526Z dist init r=2, world=4 2025-12-04T09:28:45.1606553Z [rank0]:[W1204 09:28:11.492306598 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:28:45.1606649Z FAILED [8.9305s] [100%] 2025-12-04T09:28:45.1606655Z 2025-12-04T09:28:45.1606785Z =================================== FAILURES =================================== 2025-12-04T09:28:45.1607028Z ___________________ TestClipGradNormCUDA.test_non_root_cuda ____________________ 2025-12-04T09:28:45.1607136Z Traceback (most recent call last): 2025-12-04T09:28:45.1607622Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:45.1607733Z self._join_processes(fn) 2025-12-04T09:28:45.1608248Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:45.1608384Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:45.1608917Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:45.1609020Z raise RuntimeError(error) 2025-12-04T09:28:45.1609241Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:45.1609345Z Traceback (most recent call last): 2025-12-04T09:28:45.1609865Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1609976Z getattr(self, test_name)() 2025-12-04T09:28:45.1610448Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1610539Z fn() 2025-12-04T09:28:45.1610985Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1611079Z method(*args, **kwargs) 2025-12-04T09:28:45.1611534Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in 
wrapper 2025-12-04T09:28:45.1611628Z method(*args, **kwargs) 2025-12-04T09:28:45.1612099Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1612195Z with policy(): 2025-12-04T09:28:45.1612644Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1612748Z raise RuntimeError(msg) 2025-12-04T09:28:45.1613675Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 2025-12-04T09:28:45.1613681Z 2025-12-04T09:28:45.1613876Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1614371Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1614403Z 2025-12-04T09:28:45.1614640Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1614647Z 2025-12-04T09:28:45.1614804Z Process 1 exited with error code 10 and exception: 2025-12-04T09:28:45.1614937Z Traceback (most recent call last): 2025-12-04T09:28:45.1615421Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1615526Z getattr(self, test_name)() 2025-12-04T09:28:45.1616000Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1616089Z fn() 2025-12-04T09:28:45.1616610Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1616877Z method(*args, **kwargs) 2025-12-04T09:28:45.1617396Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1617498Z method(*args, **kwargs) 2025-12-04T09:28:45.1618006Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1618110Z with policy(): 2025-12-04T09:28:45.1618619Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1618725Z raise RuntimeError(msg) 2025-12-04T09:28:45.1619780Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 602865664 and is now 630128640. 
2025-12-04T09:28:45.1619786Z 2025-12-04T09:28:45.1619999Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1620564Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1620569Z 2025-12-04T09:28:45.1621023Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1621096Z 2025-12-04T09:28:45.1621276Z Process 3 exited with error code 10 and exception: 2025-12-04T09:28:45.1621395Z Traceback (most recent call last): 2025-12-04T09:28:45.1621944Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1622063Z getattr(self, test_name)() 2025-12-04T09:28:45.1622598Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1622687Z fn() 2025-12-04T09:28:45.1623198Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1623304Z method(*args, **kwargs) 2025-12-04T09:28:45.1623859Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1623964Z method(*args, **kwargs) 2025-12-04T09:28:45.1624467Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1624574Z with policy(): 2025-12-04T09:28:45.1625083Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1625192Z raise RuntimeError(msg) 2025-12-04T09:28:45.1626247Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 3. CUDA driver allocated memory was 489619456 and is now 630128640. 2025-12-04T09:28:45.1626289Z 2025-12-04T09:28:45.1626508Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1627069Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1627074Z 2025-12-04T09:28:45.1627336Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1627380Z 2025-12-04T09:28:45.1627385Z 2025-12-04T09:28:45.1627607Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:45.1627869Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:28:45.1628766Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-39202840e4782b07.xml - 2025-12-04T09:28:45.1628945Z =========================== short test summary info ============================ 2025-12-04T09:28:45.1629678Z FAILED [8.9305s] distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_non_root_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:45.1629802Z Traceback (most recent call last): 2025-12-04T09:28:45.1630359Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1630472Z getattr(self, test_name)() 2025-12-04T09:28:45.1631017Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1631106Z fn() 2025-12-04T09:28:45.1631621Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1631723Z method(*args, **kwargs) 2025-12-04T09:28:45.1632226Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1632341Z method(*args, **kwargs) 2025-12-04T09:28:45.1632944Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1633031Z with policy(): 2025-12-04T09:28:45.1633519Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1633619Z raise RuntimeError(msg) 2025-12-04T09:28:45.1634561Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 
2025-12-04T09:28:45.1634566Z 2025-12-04T09:28:45.1634756Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1635247Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1635254Z 2025-12-04T09:28:45.1635492Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1635497Z 2025-12-04T09:28:45.1635671Z Process 1 exited with error code 10 and exception: 2025-12-04T09:28:45.1635781Z Traceback (most recent call last): 2025-12-04T09:28:45.1636263Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1636363Z getattr(self, test_name)() 2025-12-04T09:28:45.1636845Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1636926Z fn() 2025-12-04T09:28:45.1637379Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1637469Z method(*args, **kwargs) 2025-12-04T09:28:45.1637914Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1638042Z method(*args, **kwargs) 2025-12-04T09:28:45.1638487Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1638573Z with policy(): 2025-12-04T09:28:45.1639031Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1639155Z raise RuntimeError(msg) 2025-12-04T09:28:45.1640083Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 602865664 and is now 630128640. 
2025-12-04T09:28:45.1640088Z 2025-12-04T09:28:45.1640279Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1640772Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1640785Z 2025-12-04T09:28:45.1641019Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1641026Z 2025-12-04T09:28:45.1641170Z Process 3 exited with error code 10 and exception: 2025-12-04T09:28:45.1641287Z Traceback (most recent call last): 2025-12-04T09:28:45.1641771Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1641871Z getattr(self, test_name)() 2025-12-04T09:28:45.1642351Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1642429Z fn() 2025-12-04T09:28:45.1642885Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1642975Z method(*args, **kwargs) 2025-12-04T09:28:45.1643424Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1643521Z method(*args, **kwargs) 2025-12-04T09:28:45.1643990Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1644079Z with policy(): 2025-12-04T09:28:45.1644536Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1644634Z raise RuntimeError(msg) 2025-12-04T09:28:45.1645568Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 3. CUDA driver allocated memory was 489619456 and is now 630128640. 2025-12-04T09:28:45.1645572Z 2025-12-04T09:28:45.1645761Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1646254Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1646268Z 2025-12-04T09:28:45.1646528Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1646690Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:45.1646851Z ======================= 1 failed, 3 deselected in 8.95s ======================== 2025-12-04T09:28:45.1646935Z Got exit code 1 2025-12-04T09:28:45.1647030Z Retrying single test... 
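With PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1, the harness snapshots per-device CUDA memory before each test and compares it afterwards at two levels: bytes held by PyTorch's caching allocator and bytes the CUDA driver has handed to the process. Both levels moved on every rank here: the allocator went from 512 to 2560 bytes (2048 bytes never freed), and on device 0 the driver-allocated figure grew from 714014720 to 739180544 bytes, a delta of 25165824 bytes (24 MiB); devices 1-3 report the same allocator delta with driver deltas of similar magnitude. The driver-side confirmation is what produces the wording "CUDA driver API confirmed a leak" rather than a false positive from allocator caching. Below is a minimal sketch of that before/after idea in Python, assuming only the public torch.cuda introspection APIs; MemLeakCheck is an illustrative name and this is not PyTorch's actual CudaMemoryLeakCheck from common_utils.py.

import gc
import torch

class MemLeakCheck:
    """Illustrative before/after CUDA memory comparison (hypothetical sketch,
    not the harness's real leak checker)."""

    def __enter__(self):
        gc.collect()
        n = torch.cuda.device_count()
        for i in range(n):
            torch.cuda.synchronize(i)
        # Bytes currently held by the caching allocator, per device.
        self.alloc_before = [torch.cuda.memory_allocated(i) for i in range(n)]
        # mem_get_info returns (free, total) from the driver; total - free
        # approximates the "CUDA driver allocated memory" figure in the log.
        self.driver_before = [
            total - free
            for free, total in (torch.cuda.mem_get_info(i) for i in range(n))
        ]
        return self

    def __exit__(self, *exc):
        gc.collect()
        for i in range(torch.cuda.device_count()):
            torch.cuda.synchronize(i)
            alloc_after = torch.cuda.memory_allocated(i)
            free, total = torch.cuda.mem_get_info(i)
            driver_after = total - free
            # Flag only when the driver-side number confirms allocator growth,
            # mirroring the "driver API confirmed a leak" message above.
            if alloc_after > self.alloc_before[i] and driver_after > self.driver_before[i]:
                raise RuntimeError(
                    f"possible CUDA leak on device {i}: allocator "
                    f"{self.alloc_before[i]} -> {alloc_after} bytes, "
                    f"driver {self.driver_before[i]} -> {driver_after} bytes"
                )

A harness would wrap each test body as "with MemLeakCheck(): ...", which is also why the printed repro command works: re-running the single test with the same environment variable set repeats the comparison in isolation.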
2025-12-04T09:28:45.1647672Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-067163aa862fde85.xml 2025-12-04T09:28:45.1647819Z ============================= test session starts ============================== 2025-12-04T09:28:45.1648138Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:45.1648690Z cachedir: .pytest_cache 2025-12-04T09:28:45.1649151Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:45.1649267Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:45.1649363Z configfile: pytest.ini 2025-12-04T09:28:45.1649879Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:45.1650074Z collecting ... collected 4 items / 3 deselected / 1 selected 2025-12-04T09:28:45.1650642Z stepcurrent: skipping 3 already run items. Running only test/distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_non_root_cuda 2025-12-04T09:28:45.1650751Z Running 1 items in this shard 2025-12-04T09:28:45.1650756Z 2025-12-04T09:28:45.1651583Z distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_non_root_cuda I1204 09:28:17.893000 40813 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 40865 2025-12-04T09:28:45.1652033Z I1204 09:28:17.894000 40813 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 40866 2025-12-04T09:28:45.1652482Z I1204 09:28:17.895000 40813 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 40867 2025-12-04T09:28:45.1652919Z I1204 09:28:17.896000 40813 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 40868 2025-12-04T09:28:45.1653332Z [rank1]:E1204 09:28:24.616000 40866 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1653807Z [rank1]:E1204 09:28:24.616000 40866 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1654701Z [rank1]:E1204 09:28:24.616000 40866 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1655161Z [rank1]:E1204 09:28:24.616000 40866 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1656062Z [rank1]:E1204 09:28:24.616000 40866 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1656511Z [rank1]:E1204 09:28:24.616000 40866 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1657625Z [rank1]:E1204 09:28:24.616000 40866 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1658124Z [rank1]:E1204 09:28:24.616000 40866 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1659128Z 
[rank1]:E1204 09:28:24.616000 40866 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1659623Z [rank1]:E1204 09:28:24.616000 40866 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1660594Z [rank1]:E1204 09:28:24.616000 40866 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1661041Z [rank1]:E1204 09:28:24.616000 40866 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1662017Z [rank1]:E1204 09:28:24.616000 40866 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1662540Z [rank1]:E1204 09:28:24.616000 40866 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1664054Z [rank1]:E1204 09:28:24.616000 40866 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 607059968 and is now 630128640. 2025-12-04T09:28:45.1664447Z [rank1]:E1204 09:28:24.616000 40866 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1665119Z [rank1]:E1204 09:28:24.616000 40866 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1666129Z [rank1]:E1204 09:28:24.616000 40866 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1666496Z [rank1]:E1204 09:28:24.616000 40866 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1667224Z [rank1]:E1204 09:28:24.616000 40866 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1667772Z [rank1]:E1204 09:28:24.616000 40866 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:45.1668236Z [rank2]:E1204 09:28:24.616000 40867 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1668870Z [rank2]:E1204 09:28:24.616000 40867 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1669878Z [rank2]:E1204 09:28:24.616000 40867 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1670362Z [rank2]:E1204 09:28:24.616000 40867 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1671245Z [rank2]:E1204 09:28:24.616000 40867 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1671605Z [rank2]:E1204 09:28:24.616000 40867 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1672455Z [rank2]:E1204 09:28:24.616000 40867 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1672902Z [rank2]:E1204 09:28:24.616000 40867 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1673785Z [rank2]:E1204 09:28:24.616000 40867 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1674219Z [rank2]:E1204 09:28:24.616000 40867 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1675072Z [rank2]:E1204 09:28:24.616000 40867 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1675469Z [rank2]:E1204 09:28:24.616000 40867 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1676363Z [rank2]:E1204 09:28:24.616000 40867 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1676802Z [rank2]:E1204 09:28:24.616000 40867 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1678167Z [rank2]:E1204 09:28:24.616000 40867 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 2. CUDA driver allocated memory was 602865664 and is now 630128640. 
2025-12-04T09:28:45.1678487Z [rank2]:E1204 09:28:24.616000 40867 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1679071Z [rank2]:E1204 09:28:24.616000 40867 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1679973Z [rank2]:E1204 09:28:24.616000 40867 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1680296Z [rank2]:E1204 09:28:24.616000 40867 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1680937Z [rank2]:E1204 09:28:24.616000 40867 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1681420Z [rank2]:E1204 09:28:24.616000 40867 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:45.1681828Z [rank3]:E1204 09:28:24.616000 40868 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1682302Z [rank3]:E1204 09:28:24.616000 40868 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1683211Z [rank3]:E1204 09:28:24.616000 40868 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1683672Z [rank3]:E1204 09:28:24.616000 40868 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1684551Z [rank3]:E1204 09:28:24.616000 40868 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1684906Z [rank3]:E1204 09:28:24.616000 40868 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1685758Z [rank3]:E1204 09:28:24.616000 40868 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1686228Z [rank3]:E1204 09:28:24.616000 40868 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1687090Z [rank3]:E1204 09:28:24.616000 40868 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1687521Z [rank3]:E1204 09:28:24.616000 40868 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1688379Z [rank3]:E1204 09:28:24.616000 40868 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1688801Z [rank3]:E1204 09:28:24.616000 40868 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1689659Z [rank3]:E1204 09:28:24.616000 40868 site-packages/torch/testing/_internal/common_distributed.py:935] 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1690116Z [rank3]:E1204 09:28:24.616000 40868 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1691448Z [rank3]:E1204 09:28:24.616000 40868 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 3. CUDA driver allocated memory was 489619456 and is now 630128640. 2025-12-04T09:28:45.1691766Z [rank3]:E1204 09:28:24.616000 40868 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1692350Z [rank3]:E1204 09:28:24.616000 40868 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1693250Z [rank3]:E1204 09:28:24.616000 40868 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1693570Z [rank3]:E1204 09:28:24.616000 40868 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1694210Z [rank3]:E1204 09:28:24.616000 40868 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1694691Z [rank3]:E1204 09:28:24.616000 40868 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:45.1695100Z [rank0]:E1204 09:28:24.618000 40865 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1695592Z [rank0]:E1204 09:28:24.618000 40865 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1696552Z [rank0]:E1204 09:28:24.618000 40865 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1697224Z [rank0]:E1204 09:28:24.618000 40865 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1698211Z [rank0]:E1204 09:28:24.618000 40865 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1698616Z [rank0]:E1204 09:28:24.618000 40865 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1699619Z [rank0]:E1204 09:28:24.618000 40865 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1700112Z [rank0]:E1204 09:28:24.618000 40865 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1701083Z [rank0]:E1204 09:28:24.618000 40865 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T09:28:45.1701573Z [rank0]:E1204 09:28:24.618000 40865 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1702550Z [rank0]:E1204 09:28:24.618000 40865 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1703041Z [rank0]:E1204 09:28:24.618000 40865 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1704048Z [rank0]:E1204 09:28:24.618000 40865 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1704540Z [rank0]:E1204 09:28:24.618000 40865 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1706043Z [rank0]:E1204 09:28:24.618000 40865 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 2025-12-04T09:28:45.1706409Z [rank0]:E1204 09:28:24.618000 40865 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1707068Z [rank0]:E1204 09:28:24.618000 40865 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1708079Z [rank0]:E1204 09:28:24.618000 40865 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1708440Z [rank0]:E1204 09:28:24.618000 40865 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1709372Z [rank0]:E1204 09:28:24.618000 40865 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1709901Z [rank0]:E1204 09:28:24.618000 40865 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:45.1709998Z dist init r=1, world=4 2025-12-04T09:28:45.1710107Z dist init r=0, world=4 2025-12-04T09:28:45.1710227Z dist init r=2, world=4 2025-12-04T09:28:45.1710332Z dist init r=3, world=4 2025-12-04T09:28:45.1711451Z [rank0]:[W1204 09:28:25.634887694 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:28:45.1711545Z FAILED [8.8750s] [100%] 2025-12-04T09:28:45.1711551Z 2025-12-04T09:28:45.1711701Z =================================== FAILURES =================================== 2025-12-04T09:28:45.1711948Z ___________________ TestClipGradNormCUDA.test_non_root_cuda ____________________ 2025-12-04T09:28:45.1712074Z Traceback (most recent call last): 2025-12-04T09:28:45.1712605Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:45.1712758Z self._join_processes(fn) 2025-12-04T09:28:45.1713338Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:45.1713475Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:45.1714062Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:45.1714182Z raise RuntimeError(error) 2025-12-04T09:28:45.1714407Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:45.1714538Z Traceback (most recent call last): 2025-12-04T09:28:45.1715058Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1715193Z getattr(self, test_name)() 2025-12-04T09:28:45.1715725Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1715811Z fn() 2025-12-04T09:28:45.1716311Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1716451Z method(*args, **kwargs) 2025-12-04T09:28:45.1716938Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1717050Z method(*args, **kwargs) 2025-12-04T09:28:45.1717533Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1717628Z with policy(): 2025-12-04T09:28:45.1718133Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1718237Z raise RuntimeError(msg) 2025-12-04T09:28:45.1719254Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 
2025-12-04T09:28:45.1719273Z 2025-12-04T09:28:45.1719479Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1720015Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1720020Z 2025-12-04T09:28:45.1720288Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1720293Z 2025-12-04T09:28:45.1720447Z Process 1 exited with error code 10 and exception: 2025-12-04T09:28:45.1720579Z Traceback (most recent call last): 2025-12-04T09:28:45.1721420Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1721535Z getattr(self, test_name)() 2025-12-04T09:28:45.1722173Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1722327Z fn() 2025-12-04T09:28:45.1722835Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1722947Z method(*args, **kwargs) 2025-12-04T09:28:45.1723447Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1723560Z method(*args, **kwargs) 2025-12-04T09:28:45.1724064Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1724159Z with policy(): 2025-12-04T09:28:45.1724679Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1724792Z raise RuntimeError(msg) 2025-12-04T09:28:45.1725894Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 607059968 and is now 630128640. 
2025-12-04T09:28:45.1725903Z 2025-12-04T09:28:45.1726122Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1726675Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1726681Z 2025-12-04T09:28:45.1726951Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1726957Z 2025-12-04T09:28:45.1727123Z Process 2 exited with error code 10 and exception: 2025-12-04T09:28:45.1727251Z Traceback (most recent call last): 2025-12-04T09:28:45.1727835Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1727953Z getattr(self, test_name)() 2025-12-04T09:28:45.1728500Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1728635Z fn() 2025-12-04T09:28:45.1729142Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1729254Z method(*args, **kwargs) 2025-12-04T09:28:45.1729758Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1729860Z method(*args, **kwargs) 2025-12-04T09:28:45.1730372Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1730471Z with policy(): 2025-12-04T09:28:45.1730988Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1731097Z raise RuntimeError(msg) 2025-12-04T09:28:45.1732145Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 2. CUDA driver allocated memory was 602865664 and is now 630128640. 2025-12-04T09:28:45.1732162Z 2025-12-04T09:28:45.1732375Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1733030Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1733035Z 2025-12-04T09:28:45.1733298Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1733303Z 2025-12-04T09:28:45.1733307Z 2025-12-04T09:28:45.1733520Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:45.1733881Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:28:45.1734751Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-067163aa862fde85.xml - 2025-12-04T09:28:45.1734910Z =========================== short test summary info ============================ 2025-12-04T09:28:45.1735600Z FAILED [8.8750s] distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_non_root_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:45.1735713Z Traceback (most recent call last): 2025-12-04T09:28:45.1736230Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1736396Z getattr(self, test_name)() 2025-12-04T09:28:45.1737086Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1737181Z fn() 2025-12-04T09:28:45.1737721Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1737829Z method(*args, **kwargs) 2025-12-04T09:28:45.1738343Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1738448Z method(*args, **kwargs) 2025-12-04T09:28:45.1738958Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1739054Z with policy(): 2025-12-04T09:28:45.1739563Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1739677Z raise RuntimeError(msg) 2025-12-04T09:28:45.1740750Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 
2025-12-04T09:28:45.1740757Z 2025-12-04T09:28:45.1740979Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1741562Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1741567Z 2025-12-04T09:28:45.1741831Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1741836Z 2025-12-04T09:28:45.1742004Z Process 1 exited with error code 10 and exception: 2025-12-04T09:28:45.1742127Z Traceback (most recent call last): 2025-12-04T09:28:45.1742680Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1742791Z getattr(self, test_name)() 2025-12-04T09:28:45.1743327Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1743423Z fn() 2025-12-04T09:28:45.1743928Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1744035Z method(*args, **kwargs) 2025-12-04T09:28:45.1744546Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1744649Z method(*args, **kwargs) 2025-12-04T09:28:45.1745163Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1745257Z with policy(): 2025-12-04T09:28:45.1745766Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1745882Z raise RuntimeError(msg) 2025-12-04T09:28:45.1746955Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 607059968 and is now 630128640. 
2025-12-04T09:28:45.1746963Z 2025-12-04T09:28:45.1747183Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1747734Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1747739Z 2025-12-04T09:28:45.1748002Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1748007Z 2025-12-04T09:28:45.1748176Z Process 2 exited with error code 10 and exception: 2025-12-04T09:28:45.1748292Z Traceback (most recent call last): 2025-12-04T09:28:45.1748945Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1749042Z getattr(self, test_name)() 2025-12-04T09:28:45.1749555Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1749643Z fn() 2025-12-04T09:28:45.1750092Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1750183Z method(*args, **kwargs) 2025-12-04T09:28:45.1750640Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1750729Z method(*args, **kwargs) 2025-12-04T09:28:45.1751179Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1751263Z with policy(): 2025-12-04T09:28:45.1751707Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1751834Z raise RuntimeError(msg) 2025-12-04T09:28:45.1752758Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 2. CUDA driver allocated memory was 602865664 and is now 630128640. 2025-12-04T09:28:45.1752789Z 2025-12-04T09:28:45.1752989Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1753474Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1753479Z 2025-12-04T09:28:45.1753709Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1753869Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:28:45.1754023Z ======================= 1 failed, 3 deselected in 8.90s ======================== 2025-12-04T09:28:45.1754118Z Got exit code 1 2025-12-04T09:28:45.1754213Z Retrying single test... 
2025-12-04T09:28:45.1754855Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-adf2403f35f3c235.xml 2025-12-04T09:28:45.1755011Z ============================= test session starts ============================== 2025-12-04T09:28:45.1755316Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:45.1755408Z cachedir: .pytest_cache 2025-12-04T09:28:45.1755871Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:45.1755976Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:45.1756076Z configfile: pytest.ini 2025-12-04T09:28:45.1756544Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:45.1756725Z collecting ... collected 4 items / 3 deselected / 1 selected 2025-12-04T09:28:45.1757297Z stepcurrent: skipping 3 already run items. Running only test/distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_non_root_cuda 2025-12-04T09:28:45.1757478Z Running 1 items in this shard 2025-12-04T09:28:45.1757486Z 2025-12-04T09:28:45.1758317Z distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_non_root_cuda I1204 09:28:31.074000 41150 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 41202 2025-12-04T09:28:45.1758754Z I1204 09:28:31.075000 41150 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 41203 2025-12-04T09:28:45.1759190Z I1204 09:28:31.076000 41150 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 41204 2025-12-04T09:28:45.1759632Z I1204 09:28:31.077000 41150 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 41205 2025-12-04T09:28:45.1760059Z [rank2]:E1204 09:28:37.808000 41204 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1760540Z [rank2]:E1204 09:28:37.808000 41204 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1761433Z [rank2]:E1204 09:28:37.808000 41204 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1761884Z [rank2]:E1204 09:28:37.808000 41204 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1762771Z [rank2]:E1204 09:28:37.808000 41204 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1763147Z [rank2]:E1204 09:28:37.808000 41204 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1764005Z [rank2]:E1204 09:28:37.808000 41204 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1764460Z [rank2]:E1204 09:28:37.808000 41204 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1765313Z 
[rank2]:E1204 09:28:37.808000 41204 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1765740Z [rank2]:E1204 09:28:37.808000 41204 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1766588Z [rank2]:E1204 09:28:37.808000 41204 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1766991Z [rank2]:E1204 09:28:37.808000 41204 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1767841Z [rank2]:E1204 09:28:37.808000 41204 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1768278Z [rank2]:E1204 09:28:37.808000 41204 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1769599Z [rank2]:E1204 09:28:37.808000 41204 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 2. CUDA driver allocated memory was 604962816 and is now 630128640. 2025-12-04T09:28:45.1769928Z [rank2]:E1204 09:28:37.808000 41204 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1770539Z [rank2]:E1204 09:28:37.808000 41204 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1771434Z [rank2]:E1204 09:28:37.808000 41204 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1771755Z [rank2]:E1204 09:28:37.808000 41204 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1772383Z [rank2]:E1204 09:28:37.808000 41204 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1772895Z [rank2]:E1204 09:28:37.808000 41204 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:28:45.1773298Z [rank1]:E1204 09:28:37.809000 41203 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1773773Z [rank1]:E1204 09:28:37.809000 41203 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1774659Z [rank1]:E1204 09:28:37.809000 41203 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1775106Z [rank1]:E1204 09:28:37.809000 41203 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1776005Z [rank1]:E1204 09:28:37.809000 41203 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1776444Z [rank1]:E1204 09:28:37.809000 41203 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1777593Z [rank1]:E1204 09:28:37.809000 41203 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1778075Z [rank1]:E1204 09:28:37.809000 41203 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1779034Z [rank1]:E1204 09:28:37.809000 41203 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1779517Z [rank1]:E1204 09:28:37.809000 41203 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1780476Z [rank1]:E1204 09:28:37.809000 41203 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1780928Z [rank1]:E1204 09:28:37.809000 41203 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1781890Z [rank1]:E1204 09:28:37.809000 41203 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1782383Z [rank1]:E1204 09:28:37.809000 41203 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1783911Z [rank1]:E1204 09:28:37.809000 41203 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 607059968 and is now 630128640. 
2025-12-04T09:28:45.1784279Z [rank1]:E1204 09:28:37.809000 41203 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1784930Z [rank1]:E1204 09:28:37.809000 41203 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1785933Z [rank1]:E1204 09:28:37.809000 41203 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1786298Z [rank1]:E1204 09:28:37.809000 41203 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1787033Z [rank1]:E1204 09:28:37.809000 41203 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1787585Z [rank1]:E1204 09:28:37.809000 41203 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:28:45.1788033Z [rank3]:E1204 09:28:37.809000 41205 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1788567Z [rank3]:E1204 09:28:37.809000 41205 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1789590Z [rank3]:E1204 09:28:37.809000 41205 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1790062Z [rank3]:E1204 09:28:37.809000 41205 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1790944Z [rank3]:E1204 09:28:37.809000 41205 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1791318Z [rank3]:E1204 09:28:37.809000 41205 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1792174Z [rank3]:E1204 09:28:37.809000 41205 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1792602Z [rank3]:E1204 09:28:37.809000 41205 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1793460Z [rank3]:E1204 09:28:37.809000 41205 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1793895Z [rank3]:E1204 09:28:37.809000 41205 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1794744Z [rank3]:E1204 09:28:37.809000 41205 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1795142Z [rank3]:E1204 09:28:37.809000 41205 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1795991Z [rank3]:E1204 09:28:37.809000 41205 site-packages/torch/testing/_internal/common_distributed.py:935] 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1796433Z [rank3]:E1204 09:28:37.809000 41205 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1797794Z [rank3]:E1204 09:28:37.809000 41205 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 3. CUDA driver allocated memory was 487522304 and is now 630128640. 2025-12-04T09:28:45.1798122Z [rank3]:E1204 09:28:37.809000 41205 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1798703Z [rank3]:E1204 09:28:37.809000 41205 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1799585Z [rank3]:E1204 09:28:37.809000 41205 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1799914Z [rank3]:E1204 09:28:37.809000 41205 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1800577Z [rank3]:E1204 09:28:37.809000 41205 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1801069Z [rank3]:E1204 09:28:37.809000 41205 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:28:45.1801467Z [rank0]:E1204 09:28:37.811000 41202 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:28:45.1801935Z [rank0]:E1204 09:28:37.811000 41202 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:28:45.1802823Z [rank0]:E1204 09:28:37.811000 41202 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1803303Z [rank0]:E1204 09:28:37.811000 41202 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:28:45.1804188Z [rank0]:E1204 09:28:37.811000 41202 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1804564Z [rank0]:E1204 09:28:37.811000 41202 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:28:45.1805411Z [rank0]:E1204 09:28:37.811000 41202 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1805838Z [rank0]:E1204 09:28:37.811000 41202 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1806687Z [rank0]:E1204 09:28:37.811000 41202 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T09:28:45.1807122Z [rank0]:E1204 09:28:37.811000 41202 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:28:45.1807969Z [rank0]:E1204 09:28:37.811000 41202 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1808364Z [rank0]:E1204 09:28:37.811000 41202 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:28:45.1809214Z [rank0]:E1204 09:28:37.811000 41202 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1809656Z [rank0]:E1204 09:28:37.811000 41202 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:28:45.1811014Z [rank0]:E1204 09:28:37.811000 41202 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 2025-12-04T09:28:45.1811339Z [rank0]:E1204 09:28:37.811000 41202 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1811925Z [rank0]:E1204 09:28:37.811000 41202 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1812835Z [rank0]:E1204 09:28:37.811000 41202 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1813156Z [rank0]:E1204 09:28:37.811000 41202 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:28:45.1813792Z [rank0]:E1204 09:28:37.811000 41202 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1814271Z [rank0]:E1204 09:28:37.811000 41202 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:28:45.1814362Z dist init r=3, world=4 2025-12-04T09:28:45.1814448Z dist init r=1, world=4 2025-12-04T09:28:45.1814534Z dist init r=0, world=4 2025-12-04T09:28:45.1814626Z dist init r=2, world=4 2025-12-04T09:28:45.1815653Z [rank0]:[W1204 09:28:38.818750524 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:28:45.1815775Z FAILED [8.9108s] [100%] 2025-12-04T09:28:45.1815780Z 2025-12-04T09:28:45.1815912Z =================================== FAILURES =================================== 2025-12-04T09:28:45.1816166Z ___________________ TestClipGradNormCUDA.test_non_root_cuda ____________________ 2025-12-04T09:28:45.1816345Z Traceback (most recent call last): 2025-12-04T09:28:45.1817024Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:28:45.1817136Z self._join_processes(fn) 2025-12-04T09:28:45.1817726Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:28:45.1817865Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:28:45.1818478Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:28:45.1818593Z raise RuntimeError(error) 2025-12-04T09:28:45.1818826Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:45.1818954Z Traceback (most recent call last): 2025-12-04T09:28:45.1819494Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1819607Z getattr(self, test_name)() 2025-12-04T09:28:45.1820137Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1820224Z fn() 2025-12-04T09:28:45.1820907Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1821019Z method(*args, **kwargs) 2025-12-04T09:28:45.1821527Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1821631Z method(*args, **kwargs) 2025-12-04T09:28:45.1822199Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1822304Z with policy(): 2025-12-04T09:28:45.1822814Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1822923Z raise RuntimeError(msg) 2025-12-04T09:28:45.1823979Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 
2025-12-04T09:28:45.1823986Z 2025-12-04T09:28:45.1824200Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1824764Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1824770Z 2025-12-04T09:28:45.1825080Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1825089Z 2025-12-04T09:28:45.1825251Z Process 1 exited with error code 10 and exception: 2025-12-04T09:28:45.1825381Z Traceback (most recent call last): 2025-12-04T09:28:45.1825922Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1826037Z getattr(self, test_name)() 2025-12-04T09:28:45.1826570Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1826654Z fn() 2025-12-04T09:28:45.1827164Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1827307Z method(*args, **kwargs) 2025-12-04T09:28:45.1827812Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1827918Z method(*args, **kwargs) 2025-12-04T09:28:45.1828421Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1828560Z with policy(): 2025-12-04T09:28:45.1829065Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1829169Z raise RuntimeError(msg) 2025-12-04T09:28:45.1830216Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 607059968 and is now 630128640. 
2025-12-04T09:28:45.1830224Z 2025-12-04T09:28:45.1830434Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1830990Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1830996Z 2025-12-04T09:28:45.1831260Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1831267Z 2025-12-04T09:28:45.1831425Z Process 3 exited with error code 10 and exception: 2025-12-04T09:28:45.1831547Z Traceback (most recent call last): 2025-12-04T09:28:45.1832090Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1832202Z getattr(self, test_name)() 2025-12-04T09:28:45.1832735Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1832927Z fn() 2025-12-04T09:28:45.1833422Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1833519Z method(*args, **kwargs) 2025-12-04T09:28:45.1834032Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1834137Z method(*args, **kwargs) 2025-12-04T09:28:45.1834622Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1834719Z with policy(): 2025-12-04T09:28:45.1835208Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1835310Z raise RuntimeError(msg) 2025-12-04T09:28:45.1836329Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 3. CUDA driver allocated memory was 487522304 and is now 630128640. 2025-12-04T09:28:45.1836336Z 2025-12-04T09:28:45.1836540Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1837107Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1837115Z 2025-12-04T09:28:45.1837371Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1837376Z 2025-12-04T09:28:45.1837381Z 2025-12-04T09:28:45.1837590Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:28:45.1837848Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:28:45.1838714Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-adf2403f35f3c235.xml - 2025-12-04T09:28:45.1838881Z =========================== short test summary info ============================ 2025-12-04T09:28:45.1839617Z FAILED [8.9108s] distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_non_root_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:28:45.1839731Z Traceback (most recent call last): 2025-12-04T09:28:45.1840266Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1840413Z getattr(self, test_name)() 2025-12-04T09:28:45.1840933Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1841016Z fn() 2025-12-04T09:28:45.1841501Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1841603Z method(*args, **kwargs) 2025-12-04T09:28:45.1842091Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1842190Z method(*args, **kwargs) 2025-12-04T09:28:45.1842688Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1842784Z with policy(): 2025-12-04T09:28:45.1843375Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1843475Z raise RuntimeError(msg) 2025-12-04T09:28:45.1844462Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 0. CUDA driver allocated memory was 714014720 and is now 739180544. 
2025-12-04T09:28:45.1844475Z 2025-12-04T09:28:45.1844675Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1845192Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1845199Z 2025-12-04T09:28:45.1845450Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1845455Z 2025-12-04T09:28:45.1845631Z Process 1 exited with error code 10 and exception: 2025-12-04T09:28:45.1845745Z Traceback (most recent call last): 2025-12-04T09:28:45.1846260Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1846361Z getattr(self, test_name)() 2025-12-04T09:28:45.1846870Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1846950Z fn() 2025-12-04T09:28:45.1847423Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1847523Z method(*args, **kwargs) 2025-12-04T09:28:45.1847994Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1848093Z method(*args, **kwargs) 2025-12-04T09:28:45.1848594Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1848686Z with policy(): 2025-12-04T09:28:45.1849171Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1849272Z raise RuntimeError(msg) 2025-12-04T09:28:45.1850248Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 1. CUDA driver allocated memory was 607059968 and is now 630128640. 
2025-12-04T09:28:45.1850260Z 2025-12-04T09:28:45.1850462Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1851007Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1851013Z 2025-12-04T09:28:45.1851266Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1851273Z 2025-12-04T09:28:45.1851449Z Process 3 exited with error code 10 and exception: 2025-12-04T09:28:45.1851558Z Traceback (most recent call last): 2025-12-04T09:28:45.1852076Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:28:45.1852176Z getattr(self, test_name)() 2025-12-04T09:28:45.1852683Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:28:45.1852762Z fn() 2025-12-04T09:28:45.1853234Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1853340Z method(*args, **kwargs) 2025-12-04T09:28:45.1853810Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:28:45.1853915Z method(*args, **kwargs) 2025-12-04T09:28:45.1854385Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:28:45.1854477Z with policy(): 2025-12-04T09:28:45.1854962Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:28:45.1855061Z raise RuntimeError(msg) 2025-12-04T09:28:45.1856037Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestClipGradNormCUDA.test_non_root_cuda! Caching allocator allocated memory was 512 and is now reported as 2560 on device 3. CUDA driver allocated memory was 487522304 and is now 630128640. 2025-12-04T09:28:45.1856052Z 2025-12-04T09:28:45.1856321Z To execute this test, run the following from the base repo dir: 2025-12-04T09:28:45.1857022Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_clip_grad_norm.py TestClipGradNormCUDA.test_non_root_cuda 2025-12-04T09:28:45.1857028Z 2025-12-04T09:28:45.1857333Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:28:45.1857514Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T09:28:45.1857690Z ======================= 1 failed, 3 deselected in 8.93s ======================== 2025-12-04T09:28:45.1857797Z Got exit code 1 2025-12-04T09:28:45.1858271Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_non_root_cuda 2025-12-04T09:28:45.1858682Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:28:45.1859396Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-36b91fd354097cab.xml 2025-12-04T09:28:45.1859556Z ============================= test session starts ============================== 2025-12-04T09:28:45.1859936Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:28:45.1860046Z cachedir: .pytest_cache 2025-12-04T09:28:45.1860564Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:28:45.1860683Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:28:45.1860788Z configfile: pytest.ini 2025-12-04T09:28:45.1861333Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:28:45.1861537Z collecting ... collected 4 items / 4 deselected / 0 selected 2025-12-04T09:28:45.1861675Z stepcurrent: skipping 4 already run items. 2025-12-04T09:28:45.1861819Z Running 0 items in this shard 2025-12-04T09:28:45.1861825Z 2025-12-04T09:28:45.1862722Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-36b91fd354097cab.xml - 2025-12-04T09:28:45.1862891Z ============================ 4 deselected in 0.01s ============================= 2025-12-04T09:28:45.1864946Z The following tests failed consistently: ['test/distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_ddp_parity_cuda', 'test/distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_low_precision_grads_cuda', 'test/distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_no_gradients_cuda', 'test/distributed/fsdp/test_fsdp_clip_grad_norm.py::TestClipGradNormCUDA::test_non_root_cuda'] 2025-12-04T09:28:45.1864953Z 2025-12-04T09:28:45.1865643Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_clip_grad_norm 1/1 (test/test-reports/distributed.fsdp.test_fsdp_clip_grad_norm_1.1_4959fae61140b3a8_.log) 2025-12-04T09:28:45.1865651Z 2025-12-04T09:28:45.1866073Z Finished distributed/fsdp/test_fsdp_clip_grad_norm 1/1 ... 
[2025-12-04 09:28:44.964469][2156.572384067], took 3.40min 2025-12-04T09:28:45.1867025Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-a06a4188d644524d.xml 2025-12-04T09:28:45.1867987Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-03186403898f3bbb.xml 2025-12-04T09:28:45.1869113Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-a3dc994784795bc1.xml 2025-12-04T09:28:45.1869946Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-b1d6139c1033a518.xml 2025-12-04T09:28:45.1870803Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-ebdc3db326996caa.xml 2025-12-04T09:28:45.1871641Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-c42bc725a7562377.xml 2025-12-04T09:28:45.2060962Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-4818210284e31d5e.xml 2025-12-04T09:28:45.2366091Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-1b5186457c75b3fb.xml 2025-12-04T09:28:45.2644813Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-74e02afb5846363a.xml 2025-12-04T09:28:45.2932739Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-39202840e4782b07.xml 2025-12-04T09:28:45.3243529Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-067163aa862fde85.xml 2025-12-04T09:28:45.3554153Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-adf2403f35f3c235.xml 2025-12-04T09:28:45.3843133Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-36b91fd354097cab.xml 2025-12-04T09:28:45.7098722Z Uploading logs for 57116084904 to S3 2025-12-04T09:28:45.7669412Z Uploading artifacts took 0.36 seconds 2025-12-04T09:28:45.7669911Z distributed/fsdp/test_fsdp_clip_grad_norm 1/1 failed! 2025-12-04T09:28:45.7675845Z Running distributed/fsdp/test_fsdp_core 2/2 ... 
[2025-12-04 09:28:45.766985][2157.374901957] 2025-12-04T09:28:45.7676435Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T09:28:45.7677689Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_fsdp_core.py', '--shard-id=2', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 09:28:45.767316] 2025-12-04T09:59:12.9994212Z 2025-12-04T09:59:12.9995161Z PRINTING LOG FILE of distributed/fsdp/test_fsdp_core 2/2 (test/test-reports/distributed.fsdp.test_fsdp_core_2.2_6137898c6891d430_.log) 2025-12-04T09:59:12.9996538Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-90a070d9a0caeaa7.xml 2025-12-04T09:59:12.9997499Z ============================= test session starts ============================== 2025-12-04T09:59:12.9998170Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:12.9998772Z cachedir: .pytest_cache 2025-12-04T09:59:12.9999479Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.0000259Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.0000604Z configfile: pytest.ini 2025-12-04T09:59:13.0001331Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.0002126Z collecting ... collected 60 items 2025-12-04T09:59:13.0002555Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T09:59:13.0019650Z Running 27 items in this shard: test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_after_state_dict_cuda, test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_cuda_first_False_cuda, test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_True_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_none_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_shard_grad_op_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_none_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_none_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_shard_grad_op_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_none_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_no_shard_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_none_cuda, 
test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_no_shard_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_none_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_no_shard_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_shard_grad_op_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_no_shard_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_none_cuda, test/distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_False_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParamInitCUDA::test_param_change_after_init_mixed_precision_False_cuda, test/distributed/fsdp/test_fsdp_core.py::TestParamInitCUDA::test_param_change_after_init_mixed_precision_True_cuda, test/distributed/fsdp/test_fsdp_core.py::TestAutogradCUDA::test_unshard_params_as_tensors_cuda 2025-12-04T09:59:13.0036710Z 2025-12-04T09:59:13.0037775Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_after_state_dict_cuda I1204 09:28:49.174000 41544 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 41596 2025-12-04T09:59:13.0039471Z I1204 09:28:49.175000 41544 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 41597 2025-12-04T09:59:13.0040619Z I1204 09:28:49.176000 41544 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 41598 2025-12-04T09:59:13.0041750Z I1204 09:28:49.176000 41544 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 41599 2025-12-04T09:59:13.0043740Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.0045275Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.0047260Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:59:13.0049289Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.0050843Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.0052414Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.0053916Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.0055428Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.0057496Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.0059580Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.0061614Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.0063689Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.0065235Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.0066752Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.0068728Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.0070760Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.0075608Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. 
If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.0080654Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.0085725Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.0090739Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.0095804Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.0101171Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.0106227Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. 
If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.0111250Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.0112264Z [rank3]:E1204 09:28:56.158000 41599 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.0113440Z [rank3]:E1204 09:28:56.158000 41599 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.0115148Z [rank3]:E1204 09:28:56.158000 41599 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0116828Z [rank3]:E1204 09:28:56.158000 41599 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.0118491Z [rank3]:E1204 09:28:56.158000 41599 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0120048Z [rank3]:E1204 09:28:56.158000 41599 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.0122255Z [rank3]:E1204 09:28:56.158000 41599 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0123890Z [rank3]:E1204 09:28:56.158000 41599 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0125511Z [rank3]:E1204 09:28:56.158000 41599 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0127124Z [rank3]:E1204 09:28:56.158000 41599 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0128739Z [rank3]:E1204 09:28:56.158000 41599 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0130346Z [rank3]:E1204 09:28:56.158000 41599 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.0131956Z [rank3]:E1204 09:28:56.158000 41599 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0133686Z [rank3]:E1204 09:28:56.158000 41599 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.0139159Z [rank3]:E1204 09:28:56.158000 41599 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 3. CUDA driver allocated memory was 611254272 and is now 634322944. 
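The FSDP warnings repeated above ("FSDP got the argument `device_id` cuda on rank N, which does not have an explicit index") point at two remedies: call torch.cuda.set_device() before constructing FSDP, or pass an indexed device as device_id. A minimal sketch of both, assuming a per-process rank variable, an already-initialized process group, and a placeholder module rather than the test's actual transformer:

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_model(rank: int) -> FSDP:
        # Bind this process to its GPU first, as the warning suggests...
        torch.cuda.set_device(rank)
        model = nn.Linear(8, 8).cuda()  # placeholder module, not the test's model
        # ...and/or hand FSDP an explicit, indexed device instead of the bare "cuda" string.
        return FSDP(model, device_id=torch.device("cuda", rank))

Either step removes the ambiguity that makes FSDP fall back to "the current device".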
2025-12-04T09:59:13.0141362Z [rank3]:E1204 09:28:56.158000 41599 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0142558Z [rank3]:E1204 09:28:56.158000 41599 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.0144498Z [rank3]:E1204 09:28:56.158000 41599 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0146134Z [rank3]:E1204 09:28:56.158000 41599 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0147374Z [rank3]:E1204 09:28:56.158000 41599 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.0148805Z [rank3]:E1204 09:28:56.158000 41599 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.0150045Z [rank1]:E1204 09:28:56.158000 41597 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.0151184Z [rank1]:E1204 09:28:56.158000 41597 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.0152884Z [rank1]:E1204 09:28:56.158000 41597 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0154553Z [rank1]:E1204 09:28:56.158000 41597 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.0156214Z [rank1]:E1204 09:28:56.158000 41597 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0157787Z [rank1]:E1204 09:28:56.158000 41597 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.0159305Z [rank1]:E1204 09:28:56.158000 41597 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0160917Z [rank1]:E1204 09:28:56.158000 41597 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0162524Z [rank1]:E1204 09:28:56.158000 41597 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0164167Z [rank1]:E1204 09:28:56.158000 41597 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0165766Z [rank1]:E1204 09:28:56.158000 41597 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0167459Z [rank1]:E1204 09:28:56.158000 41597 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.0168975Z [rank1]:E1204 09:28:56.158000 41597 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0170535Z [rank1]:E1204 09:28:56.158000 41597 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.0172784Z [rank1]:E1204 09:28:56.158000 41597 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 604962816 and is now 634322944. 2025-12-04T09:59:13.0174862Z [rank1]:E1204 09:28:56.158000 41597 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0176006Z [rank1]:E1204 09:28:56.158000 41597 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.0178192Z [rank1]:E1204 09:28:56.158000 41597 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0179823Z [rank1]:E1204 09:28:56.158000 41597 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0181064Z [rank1]:E1204 09:28:56.158000 41597 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.0182504Z [rank1]:E1204 09:28:56.158000 41597 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.0183661Z [rank2]:E1204 09:28:56.158000 41598 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.0184801Z [rank2]:E1204 09:28:56.158000 41598 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.0186492Z [rank2]:E1204 09:28:56.158000 41598 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0188149Z [rank2]:E1204 09:28:56.158000 41598 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.0189930Z [rank2]:E1204 09:28:56.158000 41598 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0191426Z [rank2]:E1204 09:28:56.158000 41598 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.0192896Z [rank2]:E1204 09:28:56.158000 41598 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0194451Z [rank2]:E1204 09:28:56.158000 41598 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0196028Z [rank2]:E1204 09:28:56.158000 41598 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0197586Z [rank2]:E1204 09:28:56.158000 41598 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0199178Z [rank2]:E1204 09:28:56.158000 41598 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0200683Z [rank2]:E1204 09:28:56.158000 41598 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.0202195Z [rank2]:E1204 09:28:56.158000 41598 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0203743Z [rank2]:E1204 09:28:56.158000 41598 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.0205965Z [rank2]:E1204 09:28:56.158000 41598 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 2. CUDA driver allocated memory was 607059968 and is now 634322944. 2025-12-04T09:59:13.0208059Z [rank2]:E1204 09:28:56.158000 41598 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0209205Z [rank2]:E1204 09:28:56.158000 41598 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.0211069Z [rank2]:E1204 09:28:56.158000 41598 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0212636Z [rank2]:E1204 09:28:56.158000 41598 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0213862Z [rank2]:E1204 09:28:56.158000 41598 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.0215228Z [rank2]:E1204 09:28:56.158000 41598 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.0216429Z [rank0]:E1204 09:28:56.159000 41596 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.0217753Z [rank0]:E1204 09:28:56.159000 41596 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.0219435Z [rank0]:E1204 09:28:56.159000 41596 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0221385Z [rank0]:E1204 09:28:56.159000 41596 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.0223064Z [rank0]:E1204 09:28:56.159000 41596 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0224607Z [rank0]:E1204 09:28:56.159000 41596 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.0226121Z [rank0]:E1204 09:28:56.159000 41596 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0227770Z [rank0]:E1204 09:28:56.159000 41596 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0229374Z [rank0]:E1204 09:28:56.159000 41596 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0231010Z [rank0]:E1204 09:28:56.159000 41596 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0232609Z [rank0]:E1204 09:28:56.159000 41596 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0234189Z [rank0]:E1204 09:28:56.159000 41596 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.0235665Z [rank0]:E1204 09:28:56.159000 41596 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0237192Z [rank0]:E1204 09:28:56.159000 41596 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.0239362Z [rank0]:E1204 09:28:56.159000 41596 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 718209024 and is now 743374848. 
2025-12-04T09:59:13.0241459Z [rank0]:E1204 09:28:56.159000 41596 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0242556Z [rank0]:E1204 09:28:56.159000 41596 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.0244419Z [rank0]:E1204 09:28:56.159000 41596 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0246245Z [rank0]:E1204 09:28:56.159000 41596 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0247437Z [rank0]:E1204 09:28:56.159000 41596 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.0248811Z [rank0]:E1204 09:28:56.159000 41596 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.0249637Z dist init r=0, world=4 2025-12-04T09:59:13.0249933Z dist init r=1, world=4 2025-12-04T09:59:13.0250216Z dist init r=2, world=4 2025-12-04T09:59:13.0250490Z dist init r=3, world=4 2025-12-04T09:59:13.0251844Z [rank0]:[W1204 09:28:56.178164705 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.0253206Z FAILED [9.2706s] [ 3%] 2025-12-04T09:59:13.0253394Z 2025-12-04T09:59:13.0253545Z =================================== FAILURES =================================== 2025-12-04T09:59:13.0254126Z ___ TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda ____ 2025-12-04T09:59:13.0254666Z Traceback (most recent call last): 2025-12-04T09:59:13.0255442Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.0256227Z self._join_processes(fn) 2025-12-04T09:59:13.0257104Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.0258192Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.0259089Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.0259989Z raise RuntimeError(error) 2025-12-04T09:59:13.0260445Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.0260928Z Traceback (most recent call last): 2025-12-04T09:59:13.0261716Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0262517Z getattr(self, test_name)() 2025-12-04T09:59:13.0263264Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0264042Z fn() 2025-12-04T09:59:13.0264702Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0265466Z method(*args, **kwargs) 2025-12-04T09:59:13.0266179Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3329, in wrapper 2025-12-04T09:59:13.0266945Z method(*args, **kwargs) 2025-12-04T09:59:13.0267662Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0268547Z with policy(): 2025-12-04T09:59:13.0269219Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0270066Z raise RuntimeError(msg) 2025-12-04T09:59:13.0271399Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 604962816 and is now 634322944. 2025-12-04T09:59:13.0272648Z 2025-12-04T09:59:13.0272855Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.0273863Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0274636Z 2025-12-04T09:59:13.0274891Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.0275270Z 2025-12-04T09:59:13.0275275Z 2025-12-04T09:59:13.0275505Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.0276090Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.0277411Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-90a070d9a0caeaa7.xml - 2025-12-04T09:59:13.0278504Z =========================== short test summary info ============================ 2025-12-04T09:59:13.0279661Z FAILED [9.2706s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_after_state_dict_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.0280721Z Traceback (most recent call last): 2025-12-04T09:59:13.0281483Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0282264Z getattr(self, test_name)() 2025-12-04T09:59:13.0282999Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0283738Z fn() 2025-12-04T09:59:13.0284370Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0285112Z method(*args, **kwargs) 2025-12-04T09:59:13.0285842Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0286566Z method(*args, **kwargs) 2025-12-04T09:59:13.0287375Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0288317Z with policy(): 2025-12-04T09:59:13.0289070Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0289798Z raise RuntimeError(msg) 2025-12-04T09:59:13.0291125Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda! 
Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 604962816 and is now 634322944. 2025-12-04T09:59:13.0292371Z 2025-12-04T09:59:13.0292590Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.0293554Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0294325Z 2025-12-04T09:59:13.0294581Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.0295144Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.0295598Z ============================== 1 failed in 9.49s =============================== 2025-12-04T09:59:13.0295963Z Got exit code 1 2025-12-04T09:59:13.0296221Z Retrying single test... 2025-12-04T09:59:13.0297264Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3b56b818e7dab969.xml 2025-12-04T09:59:13.0298196Z ============================= test session starts ============================== 2025-12-04T09:59:13.0298845Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.0299449Z cachedir: .pytest_cache 2025-12-04T09:59:13.0300161Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.0300931Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.0301330Z configfile: pytest.ini 2025-12-04T09:59:13.0302062Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.0302956Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.0304052Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0305047Z Running 1 items in this shard 2025-12-04T09:59:13.0305261Z 2025-12-04T09:59:13.0306308Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_after_state_dict_cuda I1204 09:29:03.213000 41881 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 41933 2025-12-04T09:59:13.0308012Z I1204 09:29:03.214000 41881 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 41934 2025-12-04T09:59:13.0309231Z I1204 09:29:03.215000 41881 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 41935 2025-12-04T09:59:13.0310302Z I1204 09:29:03.216000 41881 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 41936 2025-12-04T09:59:13.0312080Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.0313500Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.0315407Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.0317323Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.0318773Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.0320181Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.0322044Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.0323573Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.0325541Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.0327564Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.0329591Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.0331612Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.0333222Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.0334697Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.0336538Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.0338721Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.0343583Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. 
To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.0348633Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.0353244Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.0357716Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.0362221Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.0366667Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.0371166Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. 
To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.0375607Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.0376550Z [rank0]:E1204 09:29:10.179000 41933 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.0377863Z [rank0]:E1204 09:29:10.179000 41933 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.0379556Z [rank0]:E1204 09:29:10.179000 41933 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0381260Z [rank0]:E1204 09:29:10.179000 41933 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.0382896Z [rank0]:E1204 09:29:10.179000 41933 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0384473Z [rank0]:E1204 09:29:10.179000 41933 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.0385986Z [rank0]:E1204 09:29:10.179000 41933 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0387589Z [rank0]:E1204 09:29:10.179000 41933 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0389384Z [rank0]:E1204 09:29:10.179000 41933 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0390798Z [rank0]:E1204 09:29:10.179000 41933 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0392458Z [rank0]:E1204 09:29:10.179000 41933 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0393936Z [rank0]:E1204 09:29:10.179000 41933 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.0395411Z [rank0]:E1204 09:29:10.179000 41933 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0396938Z [rank0]:E1204 09:29:10.179000 41933 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.0399318Z [rank0]:E1204 09:29:10.179000 41933 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda! 
Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 718209024 and is now 743374848. 2025-12-04T09:59:13.0401419Z [rank0]:E1204 09:29:10.179000 41933 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0402568Z [rank0]:E1204 09:29:10.179000 41933 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.0404474Z [rank0]:E1204 09:29:10.179000 41933 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0406070Z [rank0]:E1204 09:29:10.179000 41933 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0407257Z [rank0]:E1204 09:29:10.179000 41933 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.0408629Z [rank0]:E1204 09:29:10.179000 41933 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.0409753Z [rank1]:E1204 09:29:10.180000 41934 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.0410858Z [rank1]:E1204 09:29:10.180000 41934 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.0412517Z [rank1]:E1204 09:29:10.180000 41934 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0414270Z [rank1]:E1204 09:29:10.180000 41934 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.0415835Z [rank1]:E1204 09:29:10.180000 41934 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0417583Z [rank1]:E1204 09:29:10.180000 41934 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.0419099Z [rank1]:E1204 09:29:10.180000 41934 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0420694Z [rank1]:E1204 09:29:10.180000 41934 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0422541Z [rank1]:E1204 09:29:10.180000 41934 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0424153Z [rank1]:E1204 09:29:10.180000 41934 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0425757Z [rank1]:E1204 09:29:10.180000 41934 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0427328Z [rank1]:E1204 09:29:10.180000 41934 
site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.0428954Z [rank1]:E1204 09:29:10.180000 41934 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0430571Z [rank1]:E1204 09:29:10.180000 41934 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.0432978Z [rank1]:E1204 09:29:10.180000 41934 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 611254272 and is now 634322944. 2025-12-04T09:59:13.0435134Z [rank1]:E1204 09:29:10.180000 41934 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0436286Z [rank1]:E1204 09:29:10.180000 41934 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.0438090Z [rank1]:E1204 09:29:10.180000 41934 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0439621Z [rank1]:E1204 09:29:10.180000 41934 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0440784Z [rank1]:E1204 09:29:10.180000 41934 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.0442106Z [rank1]:E1204 09:29:10.180000 41934 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.0443230Z [rank3]:E1204 09:29:10.181000 41936 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.0444368Z [rank3]:E1204 09:29:10.181000 41936 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.0445869Z [rank3]:E1204 09:29:10.181000 41936 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0447388Z [rank3]:E1204 09:29:10.181000 41936 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.0448859Z [rank3]:E1204 09:29:10.181000 41936 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0450216Z [rank3]:E1204 09:29:10.181000 41936 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.0451570Z [rank3]:E1204 09:29:10.181000 41936 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0452996Z [rank3]:E1204 09:29:10.181000 41936 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
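The AccumulateGrad stream-mismatch UserWarning that precedes each of these failures names its own opt-out. When the mismatch is known to be intentional, the warning text says it can be silenced with the call below (a minimal sketch; whether suppressing it is appropriate depends on the workload, since the same warning can also flag a real synchronization problem):

    import torch

    # Named directly in the warning text above; disables the
    # AccumulateGrad stream-mismatch warning process-wide.
    torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False)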
2025-12-04T09:59:13.0454422Z [rank3]:E1204 09:29:10.181000 41936 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0455841Z [rank3]:E1204 09:29:10.181000 41936 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0457580Z [rank3]:E1204 09:29:10.181000 41936 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0459195Z [rank3]:E1204 09:29:10.181000 41936 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.0460767Z [rank3]:E1204 09:29:10.181000 41936 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0462376Z [rank3]:E1204 09:29:10.181000 41936 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.0464669Z [rank3]:E1204 09:29:10.181000 41936 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 3. CUDA driver allocated memory was 604962816 and is now 634322944. 2025-12-04T09:59:13.0466839Z [rank3]:E1204 09:29:10.181000 41936 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0468018Z [rank3]:E1204 09:29:10.181000 41936 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.0470054Z [rank3]:E1204 09:29:10.181000 41936 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0471512Z [rank3]:E1204 09:29:10.181000 41936 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0472608Z [rank3]:E1204 09:29:10.181000 41936 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.0473876Z [rank3]:E1204 09:29:10.181000 41936 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.0474895Z [rank2]:E1204 09:29:10.181000 41935 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.0475941Z [rank2]:E1204 09:29:10.181000 41935 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.0477436Z [rank2]:E1204 09:29:10.181000 41935 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0478896Z [rank2]:E1204 09:29:10.181000 41935 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.0480367Z [rank2]:E1204 09:29:10.181000 41935 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0481737Z [rank2]:E1204 09:29:10.181000 41935 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.0483083Z [rank2]:E1204 09:29:10.181000 41935 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0484505Z [rank2]:E1204 09:29:10.181000 41935 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0485917Z [rank2]:E1204 09:29:10.181000 41935 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0487345Z [rank2]:E1204 09:29:10.181000 41935 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0488792Z [rank2]:E1204 09:29:10.181000 41935 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0490176Z [rank2]:E1204 09:29:10.181000 41935 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.0491567Z [rank2]:E1204 09:29:10.181000 41935 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0492981Z [rank2]:E1204 09:29:10.181000 41935 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.0495039Z [rank2]:E1204 09:29:10.181000 41935 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 2. CUDA driver allocated memory was 607059968 and is now 634322944. 
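The enable_nested_tensor UserWarning repeated in this rerun fires because the encoder layer was constructed without batch_first=True, so TransformerEncoder silently disables its nested-tensor fast path. A minimal sketch of a layer/encoder pair that keeps the fast path available (the dimensions are placeholders, not the test model's):

    import torch.nn as nn

    encoder_layer = nn.TransformerEncoderLayer(
        d_model=64,
        nhead=4,
        batch_first=True,  # required for use_nested_tensor to stay enabled
    )
    encoder = nn.TransformerEncoder(encoder_layer, num_layers=2, enable_nested_tensor=True)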
2025-12-04T09:59:13.0497239Z [rank2]:E1204 09:29:10.181000 41935 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0498424Z [rank2]:E1204 09:29:10.181000 41935 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.0500353Z [rank2]:E1204 09:29:10.181000 41935 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0501965Z [rank2]:E1204 09:29:10.181000 41935 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0503246Z [rank2]:E1204 09:29:10.181000 41935 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.0504663Z [rank2]:E1204 09:29:10.181000 41935 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.0505490Z dist init r=2, world=4 2025-12-04T09:59:13.0505766Z dist init r=0, world=4 2025-12-04T09:59:13.0506053Z dist init r=3, world=4 2025-12-04T09:59:13.0506332Z dist init r=1, world=4 2025-12-04T09:59:13.0507676Z [rank0]:[W1204 09:29:10.199158514 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.0509151Z FAILED [9.5626s] [100%] 2025-12-04T09:59:13.0509327Z 2025-12-04T09:59:13.0509467Z =================================== FAILURES =================================== 2025-12-04T09:59:13.0510003Z ___ TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda ____ 2025-12-04T09:59:13.0510503Z Traceback (most recent call last): 2025-12-04T09:59:13.0511212Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.0511930Z self._join_processes(fn) 2025-12-04T09:59:13.0512647Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.0513414Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.0514249Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.0515130Z raise RuntimeError(error) 2025-12-04T09:59:13.0515694Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.0516400Z Traceback (most recent call last): 2025-12-04T09:59:13.0517166Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0518032Z getattr(self, test_name)() 2025-12-04T09:59:13.0518865Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0568362Z fn() 2025-12-04T09:59:13.0569017Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0569718Z method(*args, **kwargs) 2025-12-04T09:59:13.0570358Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0571047Z method(*args, **kwargs) 2025-12-04T09:59:13.0571691Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0572378Z with policy(): 2025-12-04T09:59:13.0573101Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0573805Z raise RuntimeError(msg) 2025-12-04T09:59:13.0575071Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 718209024 and is now 743374848. 2025-12-04T09:59:13.0576266Z 2025-12-04T09:59:13.0576584Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.0577766Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0578587Z 2025-12-04T09:59:13.0578935Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.0579339Z 2025-12-04T09:59:13.0579522Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.0579959Z Traceback (most recent call last): 2025-12-04T09:59:13.0580751Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0581617Z getattr(self, test_name)() 2025-12-04T09:59:13.0582386Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0583151Z fn() 2025-12-04T09:59:13.0583807Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0584577Z method(*args, **kwargs) 2025-12-04T09:59:13.0585297Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0586047Z method(*args, **kwargs) 2025-12-04T09:59:13.0586765Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0587526Z with policy(): 2025-12-04T09:59:13.0588201Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0589079Z raise RuntimeError(msg) 2025-12-04T09:59:13.0590331Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 2. CUDA driver allocated memory was 607059968 and is now 634322944. 
2025-12-04T09:59:13.0591517Z 2025-12-04T09:59:13.0591725Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.0592627Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0593355Z 2025-12-04T09:59:13.0593593Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.0593964Z 2025-12-04T09:59:13.0593968Z 2025-12-04T09:59:13.0594219Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.0594790Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.0595870Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3b56b818e7dab969.xml - 2025-12-04T09:59:13.0596850Z =========================== short test summary info ============================ 2025-12-04T09:59:13.0597880Z FAILED [9.5626s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_after_state_dict_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.0598852Z Traceback (most recent call last): 2025-12-04T09:59:13.0599568Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0600300Z getattr(self, test_name)() 2025-12-04T09:59:13.0600987Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0601688Z fn() 2025-12-04T09:59:13.0602262Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0602949Z method(*args, **kwargs) 2025-12-04T09:59:13.0603594Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0604277Z method(*args, **kwargs) 2025-12-04T09:59:13.0604900Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0605623Z with policy(): 2025-12-04T09:59:13.0606240Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0606917Z raise RuntimeError(msg) 2025-12-04T09:59:13.0608187Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 718209024 and is now 743374848. 
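The failure block prints a ready-made shell repro (PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py ...). The same run can be driven from Python when scripting local triage; the environment-variable names and test path below are taken verbatim from the log, while the subprocess wrapper itself is only an illustration:

    import os
    import subprocess
    import sys

    env = dict(os.environ)
    env["PYTORCH_TEST_CUDA_MEM_LEAK_CHECK"] = "1"   # enable the leak check, as in the repro command
    env["PYTORCH_PRINT_REPRO_ON_FAILURE"] = "0"     # optional: silence the repro banner on failure
    subprocess.run(
        [
            sys.executable,
            "test/distributed/fsdp/test_fsdp_core.py",
            "TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda",
        ],
        env=env,
        check=False,  # inspect the exit code instead of raising
    )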
2025-12-04T09:59:13.0609409Z 2025-12-04T09:59:13.0609606Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.0610518Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0611229Z 2025-12-04T09:59:13.0611486Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.0611842Z 2025-12-04T09:59:13.0611991Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.0612374Z Traceback (most recent call last): 2025-12-04T09:59:13.0613091Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0613801Z getattr(self, test_name)() 2025-12-04T09:59:13.0614475Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0615170Z fn() 2025-12-04T09:59:13.0615757Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0616514Z method(*args, **kwargs) 2025-12-04T09:59:13.0617389Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0618154Z method(*args, **kwargs) 2025-12-04T09:59:13.0618861Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0619618Z with policy(): 2025-12-04T09:59:13.0620358Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0621411Z raise RuntimeError(msg) 2025-12-04T09:59:13.0622810Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 2. CUDA driver allocated memory was 607059968 and is now 634322944. 2025-12-04T09:59:13.0624151Z 2025-12-04T09:59:13.0624369Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.0625398Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0626207Z 2025-12-04T09:59:13.0626487Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.0627148Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.0627660Z ======================= 1 failed, 26 deselected in 9.78s ======================= 2025-12-04T09:59:13.0628088Z Got exit code 1 2025-12-04T09:59:13.0628362Z Retrying single test... 
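Editor's sketch: the leak report above compares two counters before and after the test, the caching-allocator bytes and the driver-level allocated bytes, and the check is enabled in the repro command via PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1. The snippet below is an illustration of that kind of comparison using public torch.cuda APIs only; it is not the actual checker in torch/testing/_internal/common_utils.py, and the variable names and the failure condition are assumptions.

    # Illustrative only: approximates the before/after comparison reported in the
    # leak message above; the real checker in common_utils.py differs in detail.
    import torch

    def cuda_mem_snapshot(device: int):
        allocator_bytes = torch.cuda.memory_allocated(device)     # caching-allocator view
        free_bytes, total_bytes = torch.cuda.mem_get_info(device)
        driver_bytes = total_bytes - free_bytes                    # driver-level view
        return allocator_bytes, driver_bytes

    device = 0
    before_alloc, before_driver = cuda_mem_snapshot(device)
    # ... the test body would run here ...
    torch.cuda.synchronize(device)
    after_alloc, after_driver = cuda_mem_snapshot(device)
    if after_alloc > before_alloc and after_driver > before_driver:
        raise RuntimeError(
            f"possible CUDA leak on device {device}: allocator "
            f"{before_alloc} -> {after_alloc}, driver {before_driver} -> {after_driver}"
        )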
2025-12-04T09:59:13.0629171Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2da5f79ab7711605.xml 2025-12-04T09:59:13.0630106Z ============================= test session starts ============================== 2025-12-04T09:59:13.0630776Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.0631366Z cachedir: .pytest_cache 2025-12-04T09:59:13.0632074Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.0633012Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.0633336Z configfile: pytest.ini 2025-12-04T09:59:13.0633978Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.0634825Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.0635815Z stepcurrent: skipping 0 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0636706Z Running 1 items in this shard 2025-12-04T09:59:13.0636896Z 2025-12-04T09:59:13.0637817Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_after_state_dict_cuda I1204 09:29:17.143000 42218 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 42270 2025-12-04T09:59:13.0639309Z I1204 09:29:17.144000 42218 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 42271 2025-12-04T09:59:13.0640332Z I1204 09:29:17.145000 42218 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 42272 2025-12-04T09:59:13.0641346Z I1204 09:29:17.146000 42218 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 42273 2025-12-04T09:59:13.0643020Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.0644358Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.0646113Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.0647907Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.0649327Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.0650661Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.0652421Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.0654211Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.0655617Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.0657237Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.0658705Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.0660184Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.0662137Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.0664191Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.0666225Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.0668227Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.0672774Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.0677212Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.0681715Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. 
This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.0686145Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.0690595Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.0695020Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.0700086Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T09:59:13.0705123Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.0706101Z [rank0]:E1204 09:29:24.133000 42270 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.0707221Z [rank0]:E1204 09:29:24.133000 42270 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.0709014Z [rank0]:E1204 09:29:24.133000 42270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0710599Z [rank0]:E1204 09:29:24.133000 42270 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.0712086Z [rank0]:E1204 09:29:24.133000 42270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0713429Z [rank0]:E1204 09:29:24.133000 42270 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.0714755Z [rank0]:E1204 09:29:24.133000 42270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0716162Z [rank0]:E1204 09:29:24.133000 42270 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0717567Z [rank0]:E1204 09:29:24.133000 42270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0718992Z [rank0]:E1204 09:29:24.133000 42270 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0720394Z [rank0]:E1204 09:29:24.133000 42270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0722179Z [rank0]:E1204 09:29:24.133000 42270 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.0723728Z [rank0]:E1204 09:29:24.133000 42270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0725390Z [rank0]:E1204 09:29:24.133000 42270 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.0727679Z [rank0]:E1204 09:29:24.133000 42270 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 709820416 and is now 743374848. 
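Editor's sketch: the AccumulateGrad stream-mismatch warnings above state that, if the mismatch is intentional, the warning can be turned off. The call below is taken verbatim from the warning text; the surrounding training loop and DDP/FSDP setup are omitted.

    # Only the suppression call named in the warning; everything else is omitted.
    import torch

    torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False)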
2025-12-04T09:59:13.0729867Z [rank0]:E1204 09:29:24.133000 42270 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0731021Z [rank0]:E1204 09:29:24.133000 42270 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.0732931Z [rank0]:E1204 09:29:24.133000 42270 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0734537Z [rank0]:E1204 09:29:24.133000 42270 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0735625Z [rank0]:E1204 09:29:24.133000 42270 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.0737147Z [rank0]:E1204 09:29:24.133000 42270 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.0738291Z [rank1]:E1204 09:29:24.133000 42271 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.0739416Z [rank1]:E1204 09:29:24.133000 42271 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.0741089Z [rank1]:E1204 09:29:24.133000 42271 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0742782Z [rank1]:E1204 09:29:24.133000 42271 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.0744406Z [rank1]:E1204 09:29:24.133000 42271 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0745934Z [rank1]:E1204 09:29:24.133000 42271 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.0747429Z [rank1]:E1204 09:29:24.133000 42271 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0749119Z [rank1]:E1204 09:29:24.133000 42271 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0750568Z [rank1]:E1204 09:29:24.133000 42271 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0751972Z [rank1]:E1204 09:29:24.133000 42271 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0753377Z [rank1]:E1204 09:29:24.133000 42271 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0754738Z [rank1]:E1204 09:29:24.133000 42271 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.0756107Z [rank1]:E1204 09:29:24.133000 42271 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0757547Z [rank1]:E1204 09:29:24.133000 42271 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.0759598Z [rank1]:E1204 09:29:24.133000 42271 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 602865664 and is now 634322944. 2025-12-04T09:59:13.0761495Z [rank1]:E1204 09:29:24.133000 42271 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0762525Z [rank1]:E1204 09:29:24.133000 42271 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.0764219Z [rank1]:E1204 09:29:24.133000 42271 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0765647Z [rank1]:E1204 09:29:24.133000 42271 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0766712Z [rank1]:E1204 09:29:24.133000 42271 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.0767946Z [rank1]:E1204 09:29:24.133000 42271 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.0768956Z [rank3]:E1204 09:29:24.134000 42273 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.0769947Z [rank3]:E1204 09:29:24.134000 42273 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.0771450Z [rank3]:E1204 09:29:24.134000 42273 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0772912Z [rank3]:E1204 09:29:24.134000 42273 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.0774363Z [rank3]:E1204 09:29:24.134000 42273 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0775703Z [rank3]:E1204 09:29:24.134000 42273 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.0777357Z [rank3]:E1204 09:29:24.134000 42273 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0779003Z [rank3]:E1204 09:29:24.134000 42273 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0780591Z [rank3]:E1204 09:29:24.134000 42273 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0782183Z [rank3]:E1204 09:29:24.134000 42273 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0783766Z [rank3]:E1204 09:29:24.134000 42273 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0785350Z [rank3]:E1204 09:29:24.134000 42273 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.0786905Z [rank3]:E1204 09:29:24.134000 42273 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0788529Z [rank3]:E1204 09:29:24.134000 42273 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.0790715Z [rank3]:E1204 09:29:24.134000 42273 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 3. CUDA driver allocated memory was 607059968 and is now 634322944. 2025-12-04T09:59:13.0792611Z [rank3]:E1204 09:29:24.134000 42273 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0793645Z [rank3]:E1204 09:29:24.134000 42273 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.0795332Z [rank3]:E1204 09:29:24.134000 42273 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0796756Z [rank3]:E1204 09:29:24.134000 42273 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0797832Z [rank3]:E1204 09:29:24.134000 42273 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.0799069Z [rank3]:E1204 09:29:24.134000 42273 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.0800068Z [rank2]:E1204 09:29:24.135000 42272 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.0801092Z [rank2]:E1204 09:29:24.135000 42272 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.0802580Z [rank2]:E1204 09:29:24.135000 42272 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0804028Z [rank2]:E1204 09:29:24.135000 42272 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.0805486Z [rank2]:E1204 09:29:24.135000 42272 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0806824Z [rank2]:E1204 09:29:24.135000 42272 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.0808183Z [rank2]:E1204 09:29:24.135000 42272 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0809594Z [rank2]:E1204 09:29:24.135000 42272 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0810991Z [rank2]:E1204 09:29:24.135000 42272 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0812396Z [rank2]:E1204 09:29:24.135000 42272 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0813864Z [rank2]:E1204 09:29:24.135000 42272 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0815244Z [rank2]:E1204 09:29:24.135000 42272 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.0816882Z [rank2]:E1204 09:29:24.135000 42272 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0818476Z [rank2]:E1204 09:29:24.135000 42272 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.0820933Z [rank2]:E1204 09:29:24.135000 42272 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 2. CUDA driver allocated memory was 604962816 and is now 634322944. 
2025-12-04T09:59:13.0823149Z [rank2]:E1204 09:29:24.135000 42272 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0824327Z [rank2]:E1204 09:29:24.135000 42272 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.0826257Z [rank2]:E1204 09:29:24.135000 42272 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0827876Z [rank2]:E1204 09:29:24.135000 42272 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0829089Z [rank2]:E1204 09:29:24.135000 42272 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.0830502Z [rank2]:E1204 09:29:24.135000 42272 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.0831297Z dist init r=2, world=4 2025-12-04T09:59:13.0831637Z dist init r=1, world=4 2025-12-04T09:59:13.0831917Z dist init r=0, world=4 2025-12-04T09:59:13.0832190Z dist init r=3, world=4 2025-12-04T09:59:13.0833599Z [rank0]:[W1204 09:29:24.260352374 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.0835082Z FAILED [9.3996s] [100%] 2025-12-04T09:59:13.0835261Z 2025-12-04T09:59:13.0835407Z =================================== FAILURES =================================== 2025-12-04T09:59:13.0835966Z ___ TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda ____ 2025-12-04T09:59:13.0836508Z Traceback (most recent call last): 2025-12-04T09:59:13.0837304Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.0838073Z self._join_processes(fn) 2025-12-04T09:59:13.0838847Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.0839686Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.0840535Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.0841478Z raise RuntimeError(error) 2025-12-04T09:59:13.0841897Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.0842346Z Traceback (most recent call last): 2025-12-04T09:59:13.0843079Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0843865Z getattr(self, test_name)() 2025-12-04T09:59:13.0844561Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0845283Z fn() 2025-12-04T09:59:13.0845930Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0846647Z method(*args, **kwargs) 2025-12-04T09:59:13.0847304Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0848015Z method(*args, **kwargs) 2025-12-04T09:59:13.0848680Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0849372Z with policy(): 2025-12-04T09:59:13.0850009Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0850734Z raise RuntimeError(msg) 2025-12-04T09:59:13.0852049Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 2. CUDA driver allocated memory was 604962816 and is now 634322944. 2025-12-04T09:59:13.0853293Z 2025-12-04T09:59:13.0853502Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.0854451Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0855217Z 2025-12-04T09:59:13.0855464Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.0855841Z 2025-12-04T09:59:13.0856002Z Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.0856469Z Traceback (most recent call last): 2025-12-04T09:59:13.0857421Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0858215Z getattr(self, test_name)() 2025-12-04T09:59:13.0859003Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0859764Z fn() 2025-12-04T09:59:13.0860406Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0861161Z method(*args, **kwargs) 2025-12-04T09:59:13.0861863Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0862605Z method(*args, **kwargs) 2025-12-04T09:59:13.0863311Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0864064Z with policy(): 2025-12-04T09:59:13.0864758Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0865518Z raise RuntimeError(msg) 2025-12-04T09:59:13.0866922Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 3. CUDA driver allocated memory was 607059968 and is now 634322944. 
2025-12-04T09:59:13.0868243Z 2025-12-04T09:59:13.0868462Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.0869626Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0870342Z 2025-12-04T09:59:13.0870576Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.0870974Z 2025-12-04T09:59:13.0870978Z 2025-12-04T09:59:13.0871176Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.0871728Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.0872778Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2da5f79ab7711605.xml - 2025-12-04T09:59:13.0873779Z =========================== short test summary info ============================ 2025-12-04T09:59:13.0874784Z FAILED [9.3996s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_after_state_dict_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.0875732Z Traceback (most recent call last): 2025-12-04T09:59:13.0876422Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0877136Z getattr(self, test_name)() 2025-12-04T09:59:13.0877803Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0878484Z fn() 2025-12-04T09:59:13.0879050Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0879729Z method(*args, **kwargs) 2025-12-04T09:59:13.0880353Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0881021Z method(*args, **kwargs) 2025-12-04T09:59:13.0881639Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0882306Z with policy(): 2025-12-04T09:59:13.0882908Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0883576Z raise RuntimeError(msg) 2025-12-04T09:59:13.0884855Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 2. CUDA driver allocated memory was 604962816 and is now 634322944. 
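Editor's sketch: the ProcessGroupNCCL warning earlier in this run (rank0, ProcessGroupNCCL.cpp:1553) notes that destroy_process_group() was not called before program exit, which can leak resources. A minimal teardown sketch, assuming a process group was initialized elsewhere by the test harness:

    import torch.distributed as dist

    # Guard so the call is a no-op when no process group was ever created.
    if dist.is_available() and dist.is_initialized():
        dist.destroy_process_group()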
2025-12-04T09:59:13.0886034Z 2025-12-04T09:59:13.0886224Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.0887116Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0887827Z 2025-12-04T09:59:13.0888065Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.0888417Z 2025-12-04T09:59:13.0888559Z Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.0888929Z Traceback (most recent call last): 2025-12-04T09:59:13.0889625Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0890349Z getattr(self, test_name)() 2025-12-04T09:59:13.0891020Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0891701Z fn() 2025-12-04T09:59:13.0892275Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0892932Z method(*args, **kwargs) 2025-12-04T09:59:13.0893564Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0894230Z method(*args, **kwargs) 2025-12-04T09:59:13.0894848Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0895545Z with policy(): 2025-12-04T09:59:13.0896150Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0897109Z raise RuntimeError(msg) 2025-12-04T09:59:13.0898501Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 3. CUDA driver allocated memory was 607059968 and is now 634322944. 2025-12-04T09:59:13.0899876Z 2025-12-04T09:59:13.0900092Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.0901105Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0901905Z 2025-12-04T09:59:13.0902179Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.0902755Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T09:59:13.0903251Z ======================= 1 failed, 26 deselected in 9.62s ======================= 2025-12-04T09:59:13.0903668Z Got exit code 1 2025-12-04T09:59:13.0904413Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_after_state_dict_cuda 2025-12-04T09:59:13.0905523Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.0906696Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a202ac92fafcf85d.xml 2025-12-04T09:59:13.0907616Z ============================= test session starts ============================== 2025-12-04T09:59:13.0908264Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.0908847Z cachedir: .pytest_cache 2025-12-04T09:59:13.0909630Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.0910356Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.0910673Z configfile: pytest.ini 2025-12-04T09:59:13.0911382Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.0912216Z collecting ... collected 60 items / 1 deselected / 59 selected 2025-12-04T09:59:13.0912669Z stepcurrent: skipping 1 already run items. 2025-12-04T09:59:13.0913014Z Running 26 items in this shard 2025-12-04T09:59:13.0913217Z 2025-12-04T09:59:13.0914197Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_cuda_first_False_cuda I1204 09:29:31.034000 42555 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 42607 2025-12-04T09:59:13.0915755Z I1204 09:29:31.035000 42555 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 42608 2025-12-04T09:59:13.0916846Z I1204 09:29:31.035000 42555 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 42609 2025-12-04T09:59:13.0917892Z I1204 09:29:31.036000 42555 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 42610 2025-12-04T09:59:13.0919654Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.0921488Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.0922964Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.0924523Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.0926490Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:59:13.0928539Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.0930547Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.0932550Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.0934221Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.0935546Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.0937617Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.0939619Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.0941157Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.0942710Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.0944666Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:59:13.0946666Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.0947425Z [rank0]:E1204 09:29:37.892000 42607 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.0948673Z [rank0]:E1204 09:29:37.892000 42607 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.0950422Z [rank0]:E1204 09:29:37.892000 42607 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0951982Z [rank0]:E1204 09:29:37.892000 42607 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.0953524Z [rank0]:E1204 09:29:37.892000 42607 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0954960Z [rank0]:E1204 09:29:37.892000 42607 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.0956540Z [rank0]:E1204 09:29:37.892000 42607 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0958123Z [rank0]:E1204 09:29:37.892000 42607 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0959706Z [rank0]:E1204 09:29:37.892000 42607 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0961248Z [rank0]:E1204 09:29:37.892000 42607 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0962871Z [rank0]:E1204 09:29:37.892000 42607 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0964334Z [rank0]:E1204 09:29:37.892000 42607 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.0965793Z [rank0]:E1204 09:29:37.892000 42607 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0967294Z [rank0]:E1204 09:29:37.892000 42607 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.0969445Z [rank0]:E1204 09:29:37.892000 42607 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 718209024 and is now 743374848. 
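Editor's sketch: the FSDP _init_utils warnings above suggest either calling torch.cuda.set_device() before FSDP initialization or passing an explicit device index instead of the bare "cuda" device_id. The single-process sketch below follows that advice; the toy nn.Linear module, the rendezvous port, and the world size of 1 are placeholders and are not taken from the test.

    import os
    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    # Single-process stand-in for the 4-rank harness used by test_fsdp_core.py.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=0, world_size=1)

    local_rank = 0
    torch.cuda.set_device(local_rank)               # option 1: fix the current device up front
    model = nn.Linear(8, 8).cuda(local_rank)
    fsdp_model = FSDP(model, device_id=local_rank)  # option 2: explicit index, not bare "cuda"

    dist.destroy_process_group()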
2025-12-04T09:59:13.0971565Z [rank0]:E1204 09:29:37.892000 42607 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0972602Z [rank0]:E1204 09:29:37.892000 42607 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.0974316Z [rank0]:E1204 09:29:37.892000 42607 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda 2025-12-04T09:59:13.0975751Z [rank0]:E1204 09:29:37.892000 42607 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.0977098Z [rank0]:E1204 09:29:37.892000 42607 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.0978507Z [rank0]:E1204 09:29:37.892000 42607 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.0979647Z [rank1]:E1204 09:29:37.892000 42608 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.0980800Z [rank1]:E1204 09:29:37.892000 42608 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.0982476Z [rank1]:E1204 09:29:37.892000 42608 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.0984116Z [rank1]:E1204 09:29:37.892000 42608 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.0985753Z [rank1]:E1204 09:29:37.892000 42608 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.0987298Z [rank1]:E1204 09:29:37.892000 42608 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.0988920Z [rank1]:E1204 09:29:37.892000 42608 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0990503Z [rank1]:E1204 09:29:37.892000 42608 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0991913Z [rank1]:E1204 09:29:37.892000 42608 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.0993326Z [rank1]:E1204 09:29:37.892000 42608 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.0994723Z [rank1]:E1204 09:29:37.892000 42608 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.0996095Z [rank1]:E1204 09:29:37.892000 42608 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.0997474Z [rank1]:E1204 09:29:37.892000 42608 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.0998892Z [rank1]:E1204 09:29:37.892000 42608 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.1000921Z [rank1]:E1204 09:29:37.892000 42608 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 611254272 and is now 634322944. 2025-12-04T09:59:13.1002827Z [rank1]:E1204 09:29:37.892000 42608 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1003898Z [rank1]:E1204 09:29:37.892000 42608 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1005601Z [rank1]:E1204 09:29:37.892000 42608 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda 2025-12-04T09:59:13.1007031Z [rank1]:E1204 09:29:37.892000 42608 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1008114Z [rank1]:E1204 09:29:37.892000 42608 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1009381Z [rank1]:E1204 09:29:37.892000 42608 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.1010390Z [rank2]:E1204 09:29:37.892000 42609 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.1011389Z [rank2]:E1204 09:29:37.892000 42609 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.1012876Z [rank2]:E1204 09:29:37.892000 42609 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1014330Z [rank2]:E1204 09:29:37.892000 42609 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.1015773Z [rank2]:E1204 09:29:37.892000 42609 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1017475Z [rank2]:E1204 09:29:37.892000 42609 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.1019009Z [rank2]:E1204 09:29:37.892000 42609 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1020594Z [rank2]:E1204 09:29:37.892000 42609 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1022402Z [rank2]:E1204 09:29:37.892000 42609 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1023992Z [rank2]:E1204 09:29:37.892000 42609 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1025587Z [rank2]:E1204 09:29:37.892000 42609 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1027142Z [rank2]:E1204 09:29:37.892000 42609 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.1028694Z [rank2]:E1204 09:29:37.892000 42609 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1030281Z [rank2]:E1204 09:29:37.892000 42609 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.1032633Z [rank2]:E1204 09:29:37.892000 42609 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 2. CUDA driver allocated memory was 607059968 and is now 634322944. 2025-12-04T09:59:13.1034680Z [rank2]:E1204 09:29:37.892000 42609 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1035707Z [rank2]:E1204 09:29:37.892000 42609 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1037403Z [rank2]:E1204 09:29:37.892000 42609 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda 2025-12-04T09:59:13.1038827Z [rank2]:E1204 09:29:37.892000 42609 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1039940Z [rank2]:E1204 09:29:37.892000 42609 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1041188Z [rank2]:E1204 09:29:37.892000 42609 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.1042192Z [rank3]:E1204 09:29:37.896000 42610 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.1043180Z [rank3]:E1204 09:29:37.896000 42610 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.1044648Z [rank3]:E1204 09:29:37.896000 42610 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1046143Z [rank3]:E1204 09:29:37.896000 42610 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.1047599Z [rank3]:E1204 09:29:37.896000 42610 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1048980Z [rank3]:E1204 09:29:37.896000 42610 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.1050311Z [rank3]:E1204 09:29:37.896000 42610 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1051711Z [rank3]:E1204 09:29:37.896000 42610 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1053116Z [rank3]:E1204 09:29:37.896000 42610 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1054526Z [rank3]:E1204 09:29:37.896000 42610 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1055934Z [rank3]:E1204 09:29:37.896000 42610 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1057633Z [rank3]:E1204 09:29:37.896000 42610 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.1059179Z [rank3]:E1204 09:29:37.896000 42610 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1060774Z [rank3]:E1204 09:29:37.896000 42610 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.1063092Z [rank3]:E1204 09:29:37.896000 42610 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 3. CUDA driver allocated memory was 604962816 and is now 634322944. 
2025-12-04T09:59:13.1065229Z [rank3]:E1204 09:29:37.896000 42610 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1066383Z [rank3]:E1204 09:29:37.896000 42610 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1068292Z [rank3]:E1204 09:29:37.896000 42610 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda 2025-12-04T09:59:13.1069933Z [rank3]:E1204 09:29:37.896000 42610 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1071021Z [rank3]:E1204 09:29:37.896000 42610 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1072258Z [rank3]:E1204 09:29:37.896000 42610 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.1072939Z dist init r=1, world=4 2025-12-04T09:59:13.1073179Z dist init r=0, world=4 2025-12-04T09:59:13.1073412Z dist init r=2, world=4 2025-12-04T09:59:13.1073636Z dist init r=3, world=4 2025-12-04T09:59:13.1074807Z [rank0]:[W1204 09:29:38.912336790 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.1076068Z FAILED [9.1848s] [ 3%] 2025-12-04T09:59:13.1076221Z 2025-12-04T09:59:13.1076358Z =================================== FAILURES =================================== 2025-12-04T09:59:13.1077464Z ___ TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda ____ 2025-12-04T09:59:13.1077954Z Traceback (most recent call last): 2025-12-04T09:59:13.1078649Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.1079349Z self._join_processes(fn) 2025-12-04T09:59:13.1080039Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.1080794Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.1081570Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.1082319Z raise RuntimeError(error) 2025-12-04T09:59:13.1082700Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.1083126Z Traceback (most recent call last): 2025-12-04T09:59:13.1083814Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1084506Z getattr(self, test_name)() 2025-12-04T09:59:13.1085156Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1085830Z fn() 2025-12-04T09:59:13.1086394Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1087047Z method(*args, **kwargs) 2025-12-04T09:59:13.1087667Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3329, in wrapper 2025-12-04T09:59:13.1088328Z method(*args, **kwargs) 2025-12-04T09:59:13.1088939Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1089626Z with policy(): 2025-12-04T09:59:13.1090230Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1090901Z raise RuntimeError(msg) 2025-12-04T09:59:13.1092137Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 2. CUDA driver allocated memory was 607059968 and is now 634322944. 2025-12-04T09:59:13.1093316Z 2025-12-04T09:59:13.1093504Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1094398Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda 2025-12-04T09:59:13.1095099Z 2025-12-04T09:59:13.1095385Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1095738Z 2025-12-04T09:59:13.1095745Z 2025-12-04T09:59:13.1095942Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.1096567Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.1097927Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a202ac92fafcf85d.xml - 2025-12-04T09:59:13.1099030Z =========================== short test summary info ============================ 2025-12-04T09:59:13.1100156Z FAILED [9.1848s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_cuda_first_False_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.1101287Z Traceback (most recent call last): 2025-12-04T09:59:13.1102069Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1102865Z getattr(self, test_name)() 2025-12-04T09:59:13.1103629Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1104386Z fn() 2025-12-04T09:59:13.1105019Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1105769Z method(*args, **kwargs) 2025-12-04T09:59:13.1106466Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1107211Z method(*args, **kwargs) 2025-12-04T09:59:13.1107915Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1108647Z with policy(): 2025-12-04T09:59:13.1109388Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1110063Z raise RuntimeError(msg) 2025-12-04T09:59:13.1111299Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda! 
Caching allocator allocated memory was 512 and is now reported as 19456 on device 2. CUDA driver allocated memory was 607059968 and is now 634322944. 2025-12-04T09:59:13.1112472Z 2025-12-04T09:59:13.1112658Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1113549Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda 2025-12-04T09:59:13.1114264Z 2025-12-04T09:59:13.1114498Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1115012Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.1115472Z ======================= 1 failed, 1 deselected in 9.40s ======================== 2025-12-04T09:59:13.1115841Z Got exit code 1 2025-12-04T09:59:13.1116068Z Retrying single test... 2025-12-04T09:59:13.1116774Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bacdfd4e137b31c0.xml 2025-12-04T09:59:13.1117592Z ============================= test session starts ============================== 2025-12-04T09:59:13.1118160Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.1118680Z cachedir: .pytest_cache 2025-12-04T09:59:13.1119290Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.1119962Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.1120262Z configfile: pytest.ini 2025-12-04T09:59:13.1121081Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.1122152Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.1123242Z stepcurrent: skipping 1 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_cuda_first_False_cuda 2025-12-04T09:59:13.1124213Z Running 1 items in this shard 2025-12-04T09:59:13.1124422Z 2025-12-04T09:59:13.1125455Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_cuda_first_False_cuda I1204 09:29:44.923000 42892 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 42944 2025-12-04T09:59:13.1127094Z I1204 09:29:44.924000 42892 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 42945 2025-12-04T09:59:13.1128281Z I1204 09:29:44.925000 42892 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 42946 2025-12-04T09:59:13.1129397Z I1204 09:29:44.926000 42892 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 42947 2025-12-04T09:59:13.1131313Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.1132801Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.1134734Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.1136586Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.1138295Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.1139781Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.1141727Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.1143726Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.1145304Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.1146791Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.1148847Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.1150734Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.1152126Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.1153439Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.1155165Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
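The repeated `_init_utils.py:571` UserWarning fires because FSDP receives `device_id` as the bare string "cuda" with no index, so it falls back to the current device on each rank. The warning text itself names the fix; the snippet below is a minimal sketch of both options it suggests, assuming `rank` is the local rank and the default process group has already been initialized on each worker.

# Illustrative sketch of silencing the FSDP `device_id` warning, per its own advice.
# Assumes torch.distributed.init_process_group(...) has already run on this rank.
import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_model(rank: int) -> FSDP:
    # Option 1: make the current device explicit before constructing FSDP.
    torch.cuda.set_device(rank)
    model = nn.Linear(16, 16)
    # Option 2: pass a device_id with an explicit index instead of the bare "cuda".
    return FSDP(model, device_id=torch.device("cuda", rank))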
2025-12-04T09:59:13.1156933Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.1157589Z [rank0]:E1204 09:29:51.852000 42944 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.1158617Z [rank0]:E1204 09:29:51.852000 42944 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.1160100Z [rank0]:E1204 09:29:51.852000 42944 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1161587Z [rank0]:E1204 09:29:51.852000 42944 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.1163035Z [rank0]:E1204 09:29:51.852000 42944 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1164381Z [rank0]:E1204 09:29:51.852000 42944 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.1165707Z [rank0]:E1204 09:29:51.852000 42944 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1167113Z [rank0]:E1204 09:29:51.852000 42944 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1168529Z [rank0]:E1204 09:29:51.852000 42944 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1169938Z [rank0]:E1204 09:29:51.852000 42944 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1171347Z [rank0]:E1204 09:29:51.852000 42944 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1172709Z [rank0]:E1204 09:29:51.852000 42944 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.1174112Z [rank0]:E1204 09:29:51.852000 42944 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1175540Z [rank0]:E1204 09:29:51.852000 42944 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.1177918Z [rank0]:E1204 09:29:51.852000 42944 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 711917568 and is now 743374848. 
2025-12-04T09:59:13.1180055Z [rank0]:E1204 09:29:51.852000 42944 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1181251Z [rank0]:E1204 09:29:51.852000 42944 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1183178Z [rank0]:E1204 09:29:51.852000 42944 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda 2025-12-04T09:59:13.1184787Z [rank0]:E1204 09:29:51.852000 42944 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1185999Z [rank0]:E1204 09:29:51.852000 42944 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1187383Z [rank0]:E1204 09:29:51.852000 42944 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.1188553Z [rank1]:E1204 09:29:51.856000 42945 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.1189704Z [rank1]:E1204 09:29:51.856000 42945 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.1191213Z [rank1]:E1204 09:29:51.856000 42945 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1192658Z [rank1]:E1204 09:29:51.856000 42945 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.1194113Z [rank1]:E1204 09:29:51.856000 42945 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1195463Z [rank1]:E1204 09:29:51.856000 42945 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.1196796Z [rank1]:E1204 09:29:51.856000 42945 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1198197Z [rank1]:E1204 09:29:51.856000 42945 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1199591Z [rank1]:E1204 09:29:51.856000 42945 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1201005Z [rank1]:E1204 09:29:51.856000 42945 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1202408Z [rank1]:E1204 09:29:51.856000 42945 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1203801Z [rank1]:E1204 09:29:51.856000 42945 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.1205177Z [rank1]:E1204 09:29:51.856000 42945 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1206582Z [rank1]:E1204 09:29:51.856000 42945 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.1208597Z [rank1]:E1204 09:29:51.856000 42945 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 600768512 and is now 634322944. 2025-12-04T09:59:13.1210530Z [rank1]:E1204 09:29:51.856000 42945 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1211568Z [rank1]:E1204 09:29:51.856000 42945 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1213260Z [rank1]:E1204 09:29:51.856000 42945 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda 2025-12-04T09:59:13.1214684Z [rank1]:E1204 09:29:51.856000 42945 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1215759Z [rank1]:E1204 09:29:51.856000 42945 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1217402Z [rank1]:E1204 09:29:51.856000 42945 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.1218535Z [rank3]:E1204 09:29:51.856000 42947 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.1219689Z [rank3]:E1204 09:29:51.856000 42947 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.1221608Z [rank3]:E1204 09:29:51.856000 42947 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1223253Z [rank3]:E1204 09:29:51.856000 42947 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.1224898Z [rank3]:E1204 09:29:51.856000 42947 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1226423Z [rank3]:E1204 09:29:51.856000 42947 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.1227912Z [rank3]:E1204 09:29:51.856000 42947 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1229496Z [rank3]:E1204 09:29:51.856000 42947 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1231082Z [rank3]:E1204 09:29:51.856000 42947 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1232659Z [rank3]:E1204 09:29:51.856000 42947 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1234260Z [rank3]:E1204 09:29:51.856000 42947 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1235626Z [rank3]:E1204 09:29:51.856000 42947 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.1237009Z [rank3]:E1204 09:29:51.856000 42947 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1238430Z [rank3]:E1204 09:29:51.856000 42947 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.1240495Z [rank3]:E1204 09:29:51.856000 42947 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 3. CUDA driver allocated memory was 607059968 and is now 634322944. 2025-12-04T09:59:13.1242396Z [rank3]:E1204 09:29:51.856000 42947 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1243423Z [rank3]:E1204 09:29:51.856000 42947 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1245126Z [rank3]:E1204 09:29:51.856000 42947 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda 2025-12-04T09:59:13.1246598Z [rank3]:E1204 09:29:51.856000 42947 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1247689Z [rank3]:E1204 09:29:51.856000 42947 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1248936Z [rank3]:E1204 09:29:51.856000 42947 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.1249977Z [rank2]:E1204 09:29:51.857000 42946 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.1250975Z [rank2]:E1204 09:29:51.857000 42946 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.1252462Z [rank2]:E1204 09:29:51.857000 42946 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1253924Z [rank2]:E1204 09:29:51.857000 42946 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.1255373Z [rank2]:E1204 09:29:51.857000 42946 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1256985Z [rank2]:E1204 09:29:51.857000 42946 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.1258489Z [rank2]:E1204 09:29:51.857000 42946 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1260071Z [rank2]:E1204 09:29:51.857000 42946 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1261661Z [rank2]:E1204 09:29:51.857000 42946 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1263272Z [rank2]:E1204 09:29:51.857000 42946 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1264875Z [rank2]:E1204 09:29:51.857000 42946 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1266421Z [rank2]:E1204 09:29:51.857000 42946 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.1267973Z [rank2]:E1204 09:29:51.857000 42946 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1269601Z [rank2]:E1204 09:29:51.857000 42946 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.1271695Z [rank2]:E1204 09:29:51.857000 42946 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 2. CUDA driver allocated memory was 604962816 and is now 634322944. 
2025-12-04T09:59:13.1273594Z [rank2]:E1204 09:29:51.857000 42946 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1274622Z [rank2]:E1204 09:29:51.857000 42946 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1276321Z [rank2]:E1204 09:29:51.857000 42946 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda 2025-12-04T09:59:13.1277787Z [rank2]:E1204 09:29:51.857000 42946 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1278865Z [rank2]:E1204 09:29:51.857000 42946 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1280138Z [rank2]:E1204 09:29:51.857000 42946 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.1280834Z dist init r=2, world=4 2025-12-04T09:59:13.1281079Z dist init r=3, world=4 2025-12-04T09:59:13.1281312Z dist init r=1, world=4 2025-12-04T09:59:13.1281551Z dist init r=0, world=4 2025-12-04T09:59:13.1282721Z [rank0]:[W1204 09:29:52.879508683 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.1283936Z FAILED [9.4296s] [100%] 2025-12-04T09:59:13.1284098Z 2025-12-04T09:59:13.1284230Z =================================== FAILURES =================================== 2025-12-04T09:59:13.1284933Z ___ TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda ____ 2025-12-04T09:59:13.1285454Z Traceback (most recent call last): 2025-12-04T09:59:13.1286187Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.1286928Z self._join_processes(fn) 2025-12-04T09:59:13.1287671Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.1288476Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.1289291Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.1290094Z raise RuntimeError(error) 2025-12-04T09:59:13.1290517Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.1290995Z Traceback (most recent call last): 2025-12-04T09:59:13.1291729Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1292470Z getattr(self, test_name)() 2025-12-04T09:59:13.1293167Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1293876Z fn() 2025-12-04T09:59:13.1294476Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1295362Z method(*args, **kwargs) 2025-12-04T09:59:13.1296040Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1296862Z method(*args, **kwargs) 2025-12-04T09:59:13.1297786Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1298539Z with policy(): 2025-12-04T09:59:13.1299208Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1299968Z raise RuntimeError(msg) 2025-12-04T09:59:13.1301367Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 2. CUDA driver allocated memory was 604962816 and is now 634322944. 2025-12-04T09:59:13.1302699Z 2025-12-04T09:59:13.1302921Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1303954Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda 2025-12-04T09:59:13.1304767Z 2025-12-04T09:59:13.1305033Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1305472Z 2025-12-04T09:59:13.1305637Z Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.1306050Z Traceback (most recent call last): 2025-12-04T09:59:13.1306820Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1307616Z getattr(self, test_name)() 2025-12-04T09:59:13.1308361Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1309243Z fn() 2025-12-04T09:59:13.1309862Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1310708Z method(*args, **kwargs) 2025-12-04T09:59:13.1311382Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1312278Z method(*args, **kwargs) 2025-12-04T09:59:13.1312967Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1313685Z with policy(): 2025-12-04T09:59:13.1314343Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1315079Z raise RuntimeError(msg) 2025-12-04T09:59:13.1316432Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 3. CUDA driver allocated memory was 607059968 and is now 634322944. 
2025-12-04T09:59:13.1317710Z 2025-12-04T09:59:13.1317929Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1318967Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda 2025-12-04T09:59:13.1319745Z 2025-12-04T09:59:13.1319996Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1320392Z 2025-12-04T09:59:13.1320397Z 2025-12-04T09:59:13.1320611Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.1321677Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.1322887Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bacdfd4e137b31c0.xml - 2025-12-04T09:59:13.1323992Z =========================== short test summary info ============================ 2025-12-04T09:59:13.1325207Z FAILED [9.4296s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_cuda_first_False_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.1326280Z Traceback (most recent call last): 2025-12-04T09:59:13.1327065Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1327847Z getattr(self, test_name)() 2025-12-04T09:59:13.1328593Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1329360Z fn() 2025-12-04T09:59:13.1329993Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1330788Z method(*args, **kwargs) 2025-12-04T09:59:13.1331625Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1332495Z method(*args, **kwargs) 2025-12-04T09:59:13.1333577Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1334619Z with policy(): 2025-12-04T09:59:13.1335317Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1336173Z raise RuntimeError(msg) 2025-12-04T09:59:13.1337914Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 2. CUDA driver allocated memory was 604962816 and is now 634322944. 
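The parent-side traceback (wrapper -> _join_processes -> _check_return_codes) shows the pattern that turns a child's exit code 10 into the pytest failure: every rank runs in its own process, the parent joins them, and any non-zero exit code is re-raised as a RuntimeError. The sketch below reproduces only that shape with the standard library; it is not the common_distributed harness, and _run_rank / run_world are made-up names.

# Illustrative sketch of the spawn/join/exit-code pattern visible in the traceback.
import multiprocessing as mp

def _run_rank(rank: int) -> None:
    # Stand-in for the per-rank test body; exit code 10 signals a detected leak here.
    raise SystemExit(10)

def run_world(world_size: int = 4) -> None:
    ctx = mp.get_context("spawn")
    procs = [ctx.Process(target=_run_rank, args=(r,)) for r in range(world_size)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    for rank, p in enumerate(procs):
        if p.exitcode != 0:
            raise RuntimeError(f"Process {rank} exited with error code {p.exitcode}")

if __name__ == "__main__":
    run_world()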
2025-12-04T09:59:13.1339301Z 2025-12-04T09:59:13.1339644Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1340756Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda 2025-12-04T09:59:13.1341695Z 2025-12-04T09:59:13.1342004Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1342479Z 2025-12-04T09:59:13.1342718Z Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.1343276Z Traceback (most recent call last): 2025-12-04T09:59:13.1344146Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1345069Z getattr(self, test_name)() 2025-12-04T09:59:13.1345959Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1346857Z fn() 2025-12-04T09:59:13.1347584Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1348483Z method(*args, **kwargs) 2025-12-04T09:59:13.1349396Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1350213Z method(*args, **kwargs) 2025-12-04T09:59:13.1351012Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1351795Z with policy(): 2025-12-04T09:59:13.1352538Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1353286Z raise RuntimeError(msg) 2025-12-04T09:59:13.1354635Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 3. CUDA driver allocated memory was 607059968 and is now 634322944. 2025-12-04T09:59:13.1355906Z 2025-12-04T09:59:13.1356148Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1357201Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda 2025-12-04T09:59:13.1357948Z 2025-12-04T09:59:13.1358244Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1358885Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.1359440Z ======================= 1 failed, 26 deselected in 9.65s ======================= 2025-12-04T09:59:13.1359898Z Got exit code 1 2025-12-04T09:59:13.1360349Z Retrying single test... 
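After a failed session the runner reports "Got exit code 1" and retries only the failed test, which is why a fresh pytest session follows that collects the same 60 items but runs a single node id. The loop below sketches that retry-the-single-test behaviour with plain pytest; it is an approximation of what the log shows, not PyTorch's actual test runner, and NODE_ID / max_attempts are illustrative.

# Illustrative sketch of the "Got exit code 1" -> "Retrying single test..." loop above.
import subprocess
import sys

NODE_ID = ("test/distributed/fsdp/test_fsdp_core.py"
           "::TestHooksCUDA::test_pre_backward_hook_registration_cuda_first_False_cuda")

def retry_single_test(max_attempts: int = 3) -> int:
    code = 1
    for attempt in range(max_attempts):
        code = subprocess.run([sys.executable, "-m", "pytest", "-x", NODE_ID]).returncode
        if code == 0:
            break
        print(f"Got exit code {code}")
        if attempt < max_attempts - 1:
            print("Retrying single test...")
    return code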
2025-12-04T09:59:13.1361139Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2f84fddbafa0e0f3.xml 2025-12-04T09:59:13.1362051Z ============================= test session starts ============================== 2025-12-04T09:59:13.1362843Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.1363430Z cachedir: .pytest_cache 2025-12-04T09:59:13.1364156Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.1365047Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.1365462Z configfile: pytest.ini 2025-12-04T09:59:13.1366152Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.1367103Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.1368193Z stepcurrent: skipping 1 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_cuda_first_False_cuda 2025-12-04T09:59:13.1369196Z Running 1 items in this shard 2025-12-04T09:59:13.1369480Z 2025-12-04T09:59:13.1370441Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_cuda_first_False_cuda I1204 09:29:58.863000 43229 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 43281 2025-12-04T09:59:13.1372050Z I1204 09:29:58.864000 43229 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 43282 2025-12-04T09:59:13.1373181Z I1204 09:29:58.865000 43229 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 43283 2025-12-04T09:59:13.1374308Z I1204 09:29:58.866000 43229 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 43284 2025-12-04T09:59:13.1376099Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.1377912Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.1380082Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:59:13.1382240Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.1383940Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.1385573Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.1387123Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.1388863Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.1390857Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.1392731Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.1394267Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.1395689Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.1397518Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.1399498Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.1401382Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:59:13.1403289Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.1404089Z [rank0]:E1204 09:30:05.740000 43281 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.1405420Z [rank0]:E1204 09:30:05.740000 43281 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.1407139Z [rank0]:E1204 09:30:05.740000 43281 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1409048Z [rank0]:E1204 09:30:05.740000 43281 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.1410753Z [rank0]:E1204 09:30:05.740000 43281 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1412397Z [rank0]:E1204 09:30:05.740000 43281 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.1413958Z [rank0]:E1204 09:30:05.740000 43281 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1415633Z [rank0]:E1204 09:30:05.740000 43281 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1417610Z [rank0]:E1204 09:30:05.740000 43281 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1419325Z [rank0]:E1204 09:30:05.740000 43281 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1421341Z [rank0]:E1204 09:30:05.740000 43281 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1423049Z [rank0]:E1204 09:30:05.740000 43281 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.1424790Z [rank0]:E1204 09:30:05.740000 43281 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1426493Z [rank0]:E1204 09:30:05.740000 43281 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.1428971Z [rank0]:E1204 09:30:05.740000 43281 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 0. CUDA driver allocated memory was 707723264 and is now 743374848. 
2025-12-04T09:59:13.1431305Z [rank0]:E1204 09:30:05.740000 43281 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1432678Z [rank0]:E1204 09:30:05.740000 43281 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1434704Z [rank0]:E1204 09:30:05.740000 43281 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda 2025-12-04T09:59:13.1436254Z [rank0]:E1204 09:30:05.740000 43281 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1437473Z [rank0]:E1204 09:30:05.740000 43281 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1438862Z [rank0]:E1204 09:30:05.740000 43281 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.1439939Z [rank2]:E1204 09:30:05.742000 43283 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.1441060Z [rank2]:E1204 09:30:05.742000 43283 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.1442734Z [rank2]:E1204 09:30:05.742000 43283 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1444308Z [rank2]:E1204 09:30:05.742000 43283 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.1445916Z [rank2]:E1204 09:30:05.742000 43283 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1447375Z [rank2]:E1204 09:30:05.742000 43283 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.1448827Z [rank2]:E1204 09:30:05.742000 43283 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1450415Z [rank2]:E1204 09:30:05.742000 43283 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1451934Z [rank2]:E1204 09:30:05.742000 43283 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1453481Z [rank2]:E1204 09:30:05.742000 43283 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1455030Z [rank2]:E1204 09:30:05.742000 43283 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1456611Z [rank2]:E1204 09:30:05.742000 43283 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.1458470Z [rank2]:E1204 09:30:05.742000 43283 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1460309Z [rank2]:E1204 09:30:05.742000 43283 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.1462770Z [rank2]:E1204 09:30:05.742000 43283 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 2. CUDA driver allocated memory was 607059968 and is now 634322944. 2025-12-04T09:59:13.1465082Z [rank2]:E1204 09:30:05.742000 43283 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1466341Z [rank2]:E1204 09:30:05.742000 43283 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1468386Z [rank2]:E1204 09:30:05.742000 43283 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda 2025-12-04T09:59:13.1470229Z [rank2]:E1204 09:30:05.742000 43283 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1471452Z [rank2]:E1204 09:30:05.742000 43283 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1472767Z [rank2]:E1204 09:30:05.742000 43283 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.1473891Z [rank3]:E1204 09:30:05.742000 43284 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.1475021Z [rank3]:E1204 09:30:05.742000 43284 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.1476620Z [rank3]:E1204 09:30:05.742000 43284 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1478277Z [rank3]:E1204 09:30:05.742000 43284 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.1479838Z [rank3]:E1204 09:30:05.742000 43284 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1481312Z [rank3]:E1204 09:30:05.742000 43284 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.1482787Z [rank3]:E1204 09:30:05.742000 43284 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1484322Z [rank3]:E1204 09:30:05.742000 43284 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1485853Z [rank3]:E1204 09:30:05.742000 43284 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1487384Z [rank3]:E1204 09:30:05.742000 43284 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1488910Z [rank3]:E1204 09:30:05.742000 43284 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1490370Z [rank3]:E1204 09:30:05.742000 43284 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.1491913Z [rank3]:E1204 09:30:05.742000 43284 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1493524Z [rank3]:E1204 09:30:05.742000 43284 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.1495687Z [rank3]:E1204 09:30:05.742000 43284 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 3. CUDA driver allocated memory was 609157120 and is now 634322944. 2025-12-04T09:59:13.1498113Z [rank3]:E1204 09:30:05.742000 43284 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1499398Z [rank3]:E1204 09:30:05.742000 43284 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1501482Z [rank3]:E1204 09:30:05.742000 43284 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda 2025-12-04T09:59:13.1503258Z [rank3]:E1204 09:30:05.742000 43284 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1504620Z [rank3]:E1204 09:30:05.742000 43284 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1506115Z [rank3]:E1204 09:30:05.742000 43284 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.1507389Z [rank1]:E1204 09:30:05.744000 43282 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.1508757Z [rank1]:E1204 09:30:05.744000 43282 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.1510527Z [rank1]:E1204 09:30:05.744000 43282 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1512203Z [rank1]:E1204 09:30:05.744000 43282 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.1513731Z [rank1]:E1204 09:30:05.744000 43282 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1515235Z [rank1]:E1204 09:30:05.744000 43282 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.1516685Z [rank1]:E1204 09:30:05.744000 43282 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1518221Z [rank1]:E1204 09:30:05.744000 43282 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1519823Z [rank1]:E1204 09:30:05.744000 43282 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1521672Z [rank1]:E1204 09:30:05.744000 43282 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1523463Z [rank1]:E1204 09:30:05.744000 43282 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1525201Z [rank1]:E1204 09:30:05.744000 43282 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.1526955Z [rank1]:E1204 09:30:05.744000 43282 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1528687Z [rank1]:E1204 09:30:05.744000 43282 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.1531177Z [rank1]:E1204 09:30:05.744000 43282 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 604962816 and is now 634322944. 
2025-12-04T09:59:13.1533445Z [rank1]:E1204 09:30:05.744000 43282 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1534722Z [rank1]:E1204 09:30:05.744000 43282 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1536665Z [rank1]:E1204 09:30:05.744000 43282 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda 2025-12-04T09:59:13.1538585Z [rank1]:E1204 09:30:05.744000 43282 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1539903Z [rank1]:E1204 09:30:05.744000 43282 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1541447Z [rank1]:E1204 09:30:05.744000 43282 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.1542364Z dist init r=0, world=4 2025-12-04T09:59:13.1542811Z dist init r=1, world=4 2025-12-04T09:59:13.1543154Z dist init r=3, world=4 2025-12-04T09:59:13.1543544Z dist init r=2, world=4 2025-12-04T09:59:13.1545160Z [rank0]:[W1204 09:30:06.755426709 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.1546680Z FAILED [9.2234s] [100%] 2025-12-04T09:59:13.1546897Z 2025-12-04T09:59:13.1547101Z =================================== FAILURES =================================== 2025-12-04T09:59:13.1547848Z ___ TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda ____ 2025-12-04T09:59:13.1548519Z Traceback (most recent call last): 2025-12-04T09:59:13.1549558Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.1550421Z self._join_processes(fn) 2025-12-04T09:59:13.1551241Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.1552149Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.1553050Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.1553939Z raise RuntimeError(error) 2025-12-04T09:59:13.1554426Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.1555066Z Traceback (most recent call last): 2025-12-04T09:59:13.1555836Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1556633Z getattr(self, test_name)() 2025-12-04T09:59:13.1557453Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1558288Z fn() 2025-12-04T09:59:13.1558902Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1559739Z method(*args, **kwargs) 2025-12-04T09:59:13.1560479Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1561223Z method(*args, **kwargs) 2025-12-04T09:59:13.1562014Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1562187Z with policy(): 2025-12-04T09:59:13.1562680Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1562867Z raise RuntimeError(msg) 2025-12-04T09:59:13.1564020Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 604962816 and is now 634322944. 2025-12-04T09:59:13.1564029Z 2025-12-04T09:59:13.1564288Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1564969Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda 2025-12-04T09:59:13.1564977Z 2025-12-04T09:59:13.1565252Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1565257Z 2025-12-04T09:59:13.1565262Z 2025-12-04T09:59:13.1565541Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.1565791Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.1618870Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2f84fddbafa0e0f3.xml - 2025-12-04T09:59:13.1619172Z =========================== short test summary info ============================ 2025-12-04T09:59:13.1620136Z FAILED [9.2234s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_cuda_first_False_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.1620269Z Traceback (most recent call last): 2025-12-04T09:59:13.1621066Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1621189Z getattr(self, test_name)() 2025-12-04T09:59:13.1621751Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1621844Z fn() 2025-12-04T09:59:13.1622357Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1622481Z method(*args, **kwargs) 2025-12-04T09:59:13.1622988Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1623194Z method(*args, **kwargs) 2025-12-04T09:59:13.1623713Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1623815Z with policy(): 2025-12-04T09:59:13.1624334Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1624444Z raise RuntimeError(msg) 2025-12-04T09:59:13.1625642Z RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda! Caching allocator allocated memory was 512 and is now reported as 19456 on device 1. CUDA driver allocated memory was 604962816 and is now 634322944. 2025-12-04T09:59:13.1625703Z 2025-12-04T09:59:13.1625920Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1626586Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_pre_backward_hook_registration_cuda_first_False_cuda 2025-12-04T09:59:13.1626595Z 2025-12-04T09:59:13.1626915Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1627099Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.1627281Z ======================= 1 failed, 26 deselected in 9.44s ======================= 2025-12-04T09:59:13.1627376Z Got exit code 1 2025-12-04T09:59:13.1627969Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_cuda_first_False_cuda 2025-12-04T09:59:13.1628380Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.1628997Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8511307d41418b77.xml 2025-12-04T09:59:13.1629162Z ============================= test session starts ============================== 2025-12-04T09:59:13.1629530Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.1629642Z cachedir: .pytest_cache 2025-12-04T09:59:13.1630164Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.1630289Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.1630396Z configfile: pytest.ini 2025-12-04T09:59:13.1630939Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.1631154Z collecting ... collected 60 items / 2 deselected / 58 selected 2025-12-04T09:59:13.1631294Z stepcurrent: skipping 2 already run items. 
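[editor's note] The ProcessGroupNCCL warning above ("destroy_process_group() was not called before program exit") points at the usual explicit-teardown pattern. A minimal sketch, assuming a process group has already been initialized elsewhere (rank/world-size wiring omitted):

import torch.distributed as dist

def teardown_process_group():
    # Tear the group down explicitly instead of relying on program exit,
    # as the warning recommends.
    if dist.is_initialized():
        dist.barrier()                 # optional: let all ranks finish outstanding work first
        dist.destroy_process_group()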
2025-12-04T09:59:13.1631416Z Running 25 items in this shard 2025-12-04T09:59:13.1631422Z 2025-12-04T09:59:13.1632632Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_True_cuda I1204 09:30:12.774000 43566 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 43618 2025-12-04T09:59:13.1633219Z I1204 09:30:12.774000 43566 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 43619 2025-12-04T09:59:13.1633661Z I1204 09:30:12.775000 43566 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 43620 2025-12-04T09:59:13.1634104Z I1204 09:30:12.776000 43566 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 43621 2025-12-04T09:59:13.1635207Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.1635324Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.1636247Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:13.1636408Z {} 2025-12-04T09:59:13.1636704Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T09:59:13.1636894Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:13.1638412Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.1638605Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.1639697Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.1639853Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.1640958Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.1641078Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.1641959Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:13.1642118Z {} 2025-12-04T09:59:13.1642411Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 
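[editor's note] The transformer.py UserWarning repeated above is about the encoder layer being built without batch_first, which disables the nested-tensor fast path. A small sketch of the constructor change it suggests; the dimensions are placeholders, not taken from the test model.

import torch.nn as nn

# batch_first=True lets TransformerEncoder keep use_nested_tensor enabled,
# which is what the UserWarning above is pointing at.
layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2, enable_nested_tensor=True)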
2025-12-04T09:59:13.1642605Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:13.1644140Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.1644288Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.1645167Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:13.1645331Z {} 2025-12-04T09:59:13.1645661Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T09:59:13.1645865Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:13.1647380Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.1647534Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.1648655Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.1648773Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.1649674Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:13.1649829Z {} 2025-12-04T09:59:13.1650120Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T09:59:13.1650309Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:13.1651827Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
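[editor's note] The _init_utils warning above spells out its own fix: either make the per-rank device current before constructing FSDP, or hand FSDP an indexed device instead of bare "cuda". A minimal sketch of both options, assuming `rank` is this process's local rank and the process group is already initialized; wrap_model is a hypothetical helper.

import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_model(model: torch.nn.Module, rank: int) -> FSDP:
    # Option 1: set the current device per rank before FSDP init, as the warning recommends.
    torch.cuda.set_device(rank)
    # Option 2 (equivalent here): pass an explicitly indexed device as device_id.
    return FSDP(model, device_id=torch.device("cuda", rank))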
2025-12-04T09:59:13.1652035Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.1652445Z [rank0]:E1204 09:30:19.585000 43618 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.1652933Z [rank0]:E1204 09:30:19.585000 43618 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.1653822Z [rank0]:E1204 09:30:19.585000 43618 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1654274Z [rank0]:E1204 09:30:19.585000 43618 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.1655165Z [rank0]:E1204 09:30:19.585000 43618 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1655519Z [rank0]:E1204 09:30:19.585000 43618 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.1656468Z [rank0]:E1204 09:30:19.585000 43618 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1657107Z [rank0]:E1204 09:30:19.585000 43618 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1658085Z [rank0]:E1204 09:30:19.585000 43618 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1658612Z [rank0]:E1204 09:30:19.585000 43618 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1659572Z [rank0]:E1204 09:30:19.585000 43618 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1660035Z [rank0]:E1204 09:30:19.585000 43618 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.1661003Z [rank0]:E1204 09:30:19.585000 43618 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1661509Z [rank0]:E1204 09:30:19.585000 43618 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.1663269Z [rank0]:E1204 09:30:19.585000 43618 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 0. CUDA driver allocated memory was 718209024 and is now 732889088. 
2025-12-04T09:59:13.1663655Z [rank0]:E1204 09:30:19.585000 43618 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1664315Z [rank0]:E1204 09:30:19.585000 43618 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1665520Z [rank0]:E1204 09:30:19.585000 43618 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T09:59:13.1665917Z [rank0]:E1204 09:30:19.585000 43618 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1666635Z [rank0]:E1204 09:30:19.585000 43618 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1667223Z [rank0]:E1204 09:30:19.585000 43618 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.1667671Z [rank1]:E1204 09:30:19.585000 43619 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.1668211Z [rank1]:E1204 09:30:19.585000 43619 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.1669291Z [rank1]:E1204 09:30:19.585000 43619 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1669748Z [rank1]:E1204 09:30:19.585000 43619 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.1670629Z [rank1]:E1204 09:30:19.585000 43619 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1670982Z [rank1]:E1204 09:30:19.585000 43619 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.1671843Z [rank1]:E1204 09:30:19.585000 43619 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1672279Z [rank1]:E1204 09:30:19.585000 43619 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1673167Z [rank1]:E1204 09:30:19.585000 43619 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1673599Z [rank1]:E1204 09:30:19.585000 43619 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1674452Z [rank1]:E1204 09:30:19.585000 43619 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1674861Z [rank1]:E1204 09:30:19.585000 43619 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.1675715Z [rank1]:E1204 09:30:19.585000 43619 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1676194Z [rank1]:E1204 09:30:19.585000 43619 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.1677719Z [rank1]:E1204 09:30:19.585000 43619 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 1. CUDA driver allocated memory was 611254272 and is now 623837184. 2025-12-04T09:59:13.1678049Z [rank1]:E1204 09:30:19.585000 43619 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1678633Z [rank1]:E1204 09:30:19.585000 43619 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1679726Z [rank1]:E1204 09:30:19.585000 43619 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T09:59:13.1680081Z [rank1]:E1204 09:30:19.585000 43619 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1680719Z [rank1]:E1204 09:30:19.585000 43619 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1681212Z [rank1]:E1204 09:30:19.585000 43619 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.1681611Z [rank2]:E1204 09:30:19.585000 43620 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.1682096Z [rank2]:E1204 09:30:19.585000 43620 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.1682990Z [rank2]:E1204 09:30:19.585000 43620 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1683444Z [rank2]:E1204 09:30:19.585000 43620 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.1684331Z [rank2]:E1204 09:30:19.585000 43620 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1684682Z [rank2]:E1204 09:30:19.585000 43620 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.1685552Z [rank2]:E1204 09:30:19.585000 43620 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1686005Z [rank2]:E1204 09:30:19.585000 43620 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1687068Z [rank2]:E1204 09:30:19.585000 43620 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1687525Z [rank2]:E1204 09:30:19.585000 43620 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1688423Z [rank2]:E1204 09:30:19.585000 43620 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1688858Z [rank2]:E1204 09:30:19.585000 43620 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.1689797Z [rank2]:E1204 09:30:19.585000 43620 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1690271Z [rank2]:E1204 09:30:19.585000 43620 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.1691887Z [rank2]:E1204 09:30:19.585000 43620 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 2. CUDA driver allocated memory was 607059968 and is now 623837184. 2025-12-04T09:59:13.1692268Z [rank2]:E1204 09:30:19.585000 43620 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1692893Z [rank2]:E1204 09:30:19.585000 43620 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1694059Z [rank2]:E1204 09:30:19.585000 43620 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T09:59:13.1694400Z [rank2]:E1204 09:30:19.585000 43620 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1695071Z [rank2]:E1204 09:30:19.585000 43620 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1695590Z [rank2]:E1204 09:30:19.585000 43620 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.1696017Z [rank3]:E1204 09:30:19.588000 43621 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.1696608Z [rank3]:E1204 09:30:19.588000 43621 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.1697781Z [rank3]:E1204 09:30:19.588000 43621 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1698290Z [rank3]:E1204 09:30:19.588000 43621 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.1699294Z [rank3]:E1204 09:30:19.588000 43621 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1699696Z [rank3]:E1204 09:30:19.588000 43621 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.1700704Z [rank3]:E1204 09:30:19.588000 43621 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1701193Z [rank3]:E1204 09:30:19.588000 43621 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1702161Z [rank3]:E1204 09:30:19.588000 43621 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1702648Z [rank3]:E1204 09:30:19.588000 43621 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1703638Z [rank3]:E1204 09:30:19.588000 43621 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1704097Z [rank3]:E1204 09:30:19.588000 43621 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.1705058Z [rank3]:E1204 09:30:19.588000 43621 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1705560Z [rank3]:E1204 09:30:19.588000 43621 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.1707280Z [rank3]:E1204 09:30:19.588000 43621 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 3. CUDA driver allocated memory was 604962816 and is now 623837184. 
2025-12-04T09:59:13.1707686Z [rank3]:E1204 09:30:19.588000 43621 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1708370Z [rank3]:E1204 09:30:19.588000 43621 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1709714Z [rank3]:E1204 09:30:19.588000 43621 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T09:59:13.1710040Z [rank3]:E1204 09:30:19.588000 43621 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1710676Z [rank3]:E1204 09:30:19.588000 43621 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1711170Z [rank3]:E1204 09:30:19.588000 43621 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.1711265Z dist init r=1, world=4 2025-12-04T09:59:13.1711362Z dist init r=2, world=4 2025-12-04T09:59:13.1711448Z dist init r=3, world=4 2025-12-04T09:59:13.1711535Z dist init r=0, world=4 2025-12-04T09:59:13.1712573Z [rank0]:[W1204 09:30:19.609914518 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.1712664Z FAILED [8.5769s] [ 4%] 2025-12-04T09:59:13.1712669Z 2025-12-04T09:59:13.1712809Z =================================== FAILURES =================================== 2025-12-04T09:59:13.1713134Z _ TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda _ 2025-12-04T09:59:13.1713245Z Traceback (most recent call last): 2025-12-04T09:59:13.1713774Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.1713877Z self._join_processes(fn) 2025-12-04T09:59:13.1714400Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.1714533Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.1715075Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.1715191Z raise RuntimeError(error) 2025-12-04T09:59:13.1715400Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.1715511Z Traceback (most recent call last): 2025-12-04T09:59:13.1715999Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1716123Z getattr(self, test_name)() 2025-12-04T09:59:13.1716600Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1716691Z fn() 2025-12-04T09:59:13.1717142Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1717246Z method(*args, **kwargs) 2025-12-04T09:59:13.1717887Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1717991Z method(*args, **kwargs) 2025-12-04T09:59:13.1718471Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1718597Z with policy(): 2025-12-04T09:59:13.1719078Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1719187Z raise RuntimeError(msg) 2025-12-04T09:59:13.1720377Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 3. CUDA driver allocated memory was 604962816 and is now 623837184. 2025-12-04T09:59:13.1720412Z 2025-12-04T09:59:13.1720624Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1721697Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T09:59:13.1721705Z 2025-12-04T09:59:13.1721985Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1721996Z 2025-12-04T09:59:13.1722000Z 2025-12-04T09:59:13.1722295Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.1722565Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.1723365Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8511307d41418b77.xml - 2025-12-04T09:59:13.1723542Z =========================== short test summary info ============================ 2025-12-04T09:59:13.1724446Z FAILED [8.5769s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_True_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.1724573Z Traceback (most recent call last): 2025-12-04T09:59:13.1725120Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1725237Z getattr(self, test_name)() 2025-12-04T09:59:13.1725772Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1725860Z fn() 2025-12-04T09:59:13.1726436Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1726543Z method(*args, **kwargs) 2025-12-04T09:59:13.1727045Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1727155Z method(*args, **kwargs) 2025-12-04T09:59:13.1727659Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1727759Z with policy(): 2025-12-04T09:59:13.1728269Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1728377Z raise RuntimeError(msg) 2025-12-04T09:59:13.1729693Z RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 3. CUDA driver allocated memory was 604962816 and is now 623837184. 2025-12-04T09:59:13.1729703Z 2025-12-04T09:59:13.1729915Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1730664Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T09:59:13.1730669Z 2025-12-04T09:59:13.1730933Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1731106Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.1731324Z ======================= 1 failed, 2 deselected in 8.80s ======================== 2025-12-04T09:59:13.1731420Z Got exit code 1 2025-12-04T09:59:13.1731530Z Retrying single test... 2025-12-04T09:59:13.1732149Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3768a5b2a44119fc.xml 2025-12-04T09:59:13.1732346Z ============================= test session starts ============================== 2025-12-04T09:59:13.1732703Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.1732810Z cachedir: .pytest_cache 2025-12-04T09:59:13.1733331Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.1733452Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.1733554Z configfile: pytest.ini 2025-12-04T09:59:13.1734197Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.1734409Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.1735209Z stepcurrent: skipping 2 already run items. 
Running only test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T09:59:13.1735329Z Running 1 items in this shard 2025-12-04T09:59:13.1735334Z 2025-12-04T09:59:13.1736475Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_True_cuda I1204 09:30:26.204000 43887 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 43939 2025-12-04T09:59:13.1737152Z I1204 09:30:26.204000 43887 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 43940 2025-12-04T09:59:13.1737647Z I1204 09:30:26.205000 43887 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 43941 2025-12-04T09:59:13.1738149Z I1204 09:30:26.206000 43887 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 43942 2025-12-04T09:59:13.1739433Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.1739561Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.1740569Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:13.1740740Z {} 2025-12-04T09:59:13.1741067Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T09:59:13.1741285Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:13.1743037Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.1743216Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.1744442Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.1744580Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.1746130Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:13.1746304Z {} 2025-12-04T09:59:13.1746634Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T09:59:13.1746877Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:13.1748814Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.1749084Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.1750201Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.1750314Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.1751195Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:13.1751353Z {} 2025-12-04T09:59:13.1751633Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T09:59:13.1751827Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:13.1753347Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.1753524Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.1754630Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.1754738Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.1755636Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:13.1755787Z {} 2025-12-04T09:59:13.1756076Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T09:59:13.1756287Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:13.1757813Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
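[editor's note] The _wrap_utils warnings repeated above are emitted when one FSDP call combines a MixedPrecision config with an auto_wrap_policy. A hedged sketch of such a call; the module type and wrap policy are illustrative choices, not taken from the test, and it assumes torch.distributed is already initialized with a CUDA device current.

import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, MixedPrecision
from torch.distributed.fsdp.wrap import ModuleWrapPolicy

def build_fsdp(model: nn.Module) -> FSDP:
    # Combining mixed_precision with an auto_wrap_policy is the code path
    # the warning above refers to.
    return FSDP(
        model,
        auto_wrap_policy=ModuleWrapPolicy({nn.TransformerEncoderLayer}),  # wrap each layer as its own FSDP unit
        mixed_precision=MixedPrecision(param_dtype=torch.float16),
    )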
2025-12-04T09:59:13.1757969Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.1758375Z [rank0]:E1204 09:30:32.998000 43939 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.1758852Z [rank0]:E1204 09:30:32.998000 43939 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.1759770Z [rank0]:E1204 09:30:32.998000 43939 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1760249Z [rank0]:E1204 09:30:32.998000 43939 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.1761145Z [rank0]:E1204 09:30:32.998000 43939 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1761500Z [rank0]:E1204 09:30:32.998000 43939 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.1762357Z [rank0]:E1204 09:30:32.998000 43939 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1762795Z [rank0]:E1204 09:30:32.998000 43939 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1763656Z [rank0]:E1204 09:30:32.998000 43939 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1764095Z [rank0]:E1204 09:30:32.998000 43939 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1764946Z [rank0]:E1204 09:30:32.998000 43939 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1765347Z [rank0]:E1204 09:30:32.998000 43939 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.1766231Z [rank0]:E1204 09:30:32.998000 43939 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1766678Z [rank0]:E1204 09:30:32.998000 43939 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.1768206Z [rank0]:E1204 09:30:32.998000 43939 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 0. CUDA driver allocated memory was 720306176 and is now 732889088. 
2025-12-04T09:59:13.1768537Z [rank0]:E1204 09:30:32.998000 43939 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1769153Z [rank0]:E1204 09:30:32.998000 43939 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1770218Z [rank0]:E1204 09:30:32.998000 43939 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T09:59:13.1770542Z [rank0]:E1204 09:30:32.998000 43939 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1771177Z [rank0]:E1204 09:30:32.998000 43939 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1771667Z [rank0]:E1204 09:30:32.998000 43939 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.1772091Z [rank1]:E1204 09:30:32.998000 43940 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.1772577Z [rank1]:E1204 09:30:32.998000 43940 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.1773485Z [rank1]:E1204 09:30:32.998000 43940 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1773933Z [rank1]:E1204 09:30:32.998000 43940 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.1774817Z [rank1]:E1204 09:30:32.998000 43940 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1775172Z [rank1]:E1204 09:30:32.998000 43940 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.1776035Z [rank1]:E1204 09:30:32.998000 43940 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1776540Z [rank1]:E1204 09:30:32.998000 43940 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1777671Z [rank1]:E1204 09:30:32.998000 43940 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1778156Z [rank1]:E1204 09:30:32.998000 43940 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1779114Z [rank1]:E1204 09:30:32.998000 43940 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1779608Z [rank1]:E1204 09:30:32.998000 43940 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.1780576Z [rank1]:E1204 09:30:32.998000 43940 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1781080Z [rank1]:E1204 09:30:32.998000 43940 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.1782796Z [rank1]:E1204 09:30:32.998000 43940 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 1. CUDA driver allocated memory was 609157120 and is now 623837184. 2025-12-04T09:59:13.1783200Z [rank1]:E1204 09:30:32.998000 43940 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1783867Z [rank1]:E1204 09:30:32.998000 43940 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1785063Z [rank1]:E1204 09:30:32.998000 43940 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T09:59:13.1785425Z [rank1]:E1204 09:30:32.998000 43940 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1786135Z [rank1]:E1204 09:30:32.998000 43940 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1786712Z [rank1]:E1204 09:30:32.998000 43940 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.1787162Z [rank2]:E1204 09:30:33.000000 43941 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.1787724Z [rank2]:E1204 09:30:33.000000 43941 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.1788833Z [rank2]:E1204 09:30:33.000000 43941 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1789407Z [rank2]:E1204 09:30:33.000000 43941 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.1790296Z [rank2]:E1204 09:30:33.000000 43941 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1790649Z [rank2]:E1204 09:30:33.000000 43941 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.1791506Z [rank2]:E1204 09:30:33.000000 43941 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1791936Z [rank2]:E1204 09:30:33.000000 43941 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1792791Z [rank2]:E1204 09:30:33.000000 43941 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1793220Z [rank2]:E1204 09:30:33.000000 43941 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1794108Z [rank2]:E1204 09:30:33.000000 43941 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1794512Z [rank2]:E1204 09:30:33.000000 43941 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.1795361Z [rank2]:E1204 09:30:33.000000 43941 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1795801Z [rank2]:E1204 09:30:33.000000 43941 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.1797364Z [rank2]:E1204 09:30:33.000000 43941 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 2. CUDA driver allocated memory was 607059968 and is now 623837184. 2025-12-04T09:59:13.1797696Z [rank2]:E1204 09:30:33.000000 43941 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1798280Z [rank2]:E1204 09:30:33.000000 43941 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1799340Z [rank2]:E1204 09:30:33.000000 43941 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T09:59:13.1799687Z [rank2]:E1204 09:30:33.000000 43941 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1800325Z [rank2]:E1204 09:30:33.000000 43941 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1800838Z [rank2]:E1204 09:30:33.000000 43941 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.1801237Z [rank3]:E1204 09:30:33.002000 43942 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.1801713Z [rank3]:E1204 09:30:33.002000 43942 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.1802594Z [rank3]:E1204 09:30:33.002000 43942 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1803044Z [rank3]:E1204 09:30:33.002000 43942 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.1803929Z [rank3]:E1204 09:30:33.002000 43942 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1804281Z [rank3]:E1204 09:30:33.002000 43942 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.1805138Z [rank3]:E1204 09:30:33.002000 43942 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1805567Z [rank3]:E1204 09:30:33.002000 43942 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1806432Z [rank3]:E1204 09:30:33.002000 43942 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1806883Z [rank3]:E1204 09:30:33.002000 43942 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1807732Z [rank3]:E1204 09:30:33.002000 43942 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1808131Z [rank3]:E1204 09:30:33.002000 43942 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.1808983Z [rank3]:E1204 09:30:33.002000 43942 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1809426Z [rank3]:E1204 09:30:33.002000 43942 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.1810971Z [rank3]:E1204 09:30:33.002000 43942 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 3. CUDA driver allocated memory was 604962816 and is now 623837184. 
2025-12-04T09:59:13.1811307Z [rank3]:E1204 09:30:33.002000 43942 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1811890Z [rank3]:E1204 09:30:33.002000 43942 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1812952Z [rank3]:E1204 09:30:33.002000 43942 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T09:59:13.1813309Z [rank3]:E1204 09:30:33.002000 43942 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1813966Z [rank3]:E1204 09:30:33.002000 43942 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1814454Z [rank3]:E1204 09:30:33.002000 43942 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.1814541Z dist init r=0, world=4 2025-12-04T09:59:13.1814626Z dist init r=3, world=4 2025-12-04T09:59:13.1814719Z dist init r=2, world=4 2025-12-04T09:59:13.1814805Z dist init r=1, world=4 2025-12-04T09:59:13.1815841Z [rank0]:[W1204 09:30:33.010469243 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.1815930Z FAILED [8.5029s] [100%] 2025-12-04T09:59:13.1815938Z 2025-12-04T09:59:13.1816067Z =================================== FAILURES =================================== 2025-12-04T09:59:13.1816463Z _ TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda _ 2025-12-04T09:59:13.1816571Z Traceback (most recent call last): 2025-12-04T09:59:13.1817284Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.1817426Z self._join_processes(fn) 2025-12-04T09:59:13.1818008Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.1818160Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.1818767Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.1818890Z raise RuntimeError(error) 2025-12-04T09:59:13.1819162Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.1819282Z Traceback (most recent call last): 2025-12-04T09:59:13.1819823Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1819942Z getattr(self, test_name)() 2025-12-04T09:59:13.1820476Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1820563Z fn() 2025-12-04T09:59:13.1821321Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1821439Z method(*args, **kwargs) 2025-12-04T09:59:13.1821950Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1822122Z method(*args, **kwargs) 2025-12-04T09:59:13.1822626Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1822733Z with policy(): 2025-12-04T09:59:13.1823237Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1823344Z raise RuntimeError(msg) 2025-12-04T09:59:13.1824625Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 0. CUDA driver allocated memory was 720306176 and is now 732889088. 2025-12-04T09:59:13.1824673Z 2025-12-04T09:59:13.1824888Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1825647Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T09:59:13.1825655Z 2025-12-04T09:59:13.1825956Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1825963Z 2025-12-04T09:59:13.1825967Z 2025-12-04T09:59:13.1826193Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.1826458Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.1827253Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3768a5b2a44119fc.xml - 2025-12-04T09:59:13.1827428Z =========================== short test summary info ============================ 2025-12-04T09:59:13.1828332Z FAILED [8.5029s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_True_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.1828459Z Traceback (most recent call last): 2025-12-04T09:59:13.1829009Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1829124Z getattr(self, test_name)() 2025-12-04T09:59:13.1829668Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1829754Z fn() 2025-12-04T09:59:13.1830270Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1830373Z method(*args, **kwargs) 2025-12-04T09:59:13.1830881Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1830993Z method(*args, **kwargs) 2025-12-04T09:59:13.1831499Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1831592Z with policy(): 2025-12-04T09:59:13.1832151Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1832268Z raise RuntimeError(msg) 2025-12-04T09:59:13.1833619Z RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 0. CUDA driver allocated memory was 720306176 and is now 732889088. 2025-12-04T09:59:13.1833625Z 2025-12-04T09:59:13.1833826Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1834522Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T09:59:13.1834537Z 2025-12-04T09:59:13.1834820Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1834991Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.1835161Z ======================= 1 failed, 26 deselected in 8.72s ======================= 2025-12-04T09:59:13.1835252Z Got exit code 1 2025-12-04T09:59:13.1835349Z Retrying single test... 2025-12-04T09:59:13.1835943Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-31ee953fde08a139.xml 2025-12-04T09:59:13.1836092Z ============================= test session starts ============================== 2025-12-04T09:59:13.1836593Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.1836727Z cachedir: .pytest_cache 2025-12-04T09:59:13.1837220Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.1837345Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.1837446Z configfile: pytest.ini 2025-12-04T09:59:13.1837960Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.1838255Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.1839048Z stepcurrent: skipping 2 already run items. 
Running only test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T09:59:13.1839163Z Running 1 items in this shard 2025-12-04T09:59:13.1839168Z 2025-12-04T09:59:13.1840228Z distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_True_cuda I1204 09:30:39.564000 44208 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 44260 2025-12-04T09:59:13.1840718Z I1204 09:30:39.565000 44208 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 44261 2025-12-04T09:59:13.1841197Z I1204 09:30:39.565000 44208 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 44262 2025-12-04T09:59:13.1841772Z I1204 09:30:39.566000 44208 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 44263 2025-12-04T09:59:13.1842959Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.1843075Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.1844243Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.1844403Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.1845339Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:13.1845510Z {} 2025-12-04T09:59:13.1845809Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T09:59:13.1846016Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:13.1846949Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:13.1847112Z {} 2025-12-04T09:59:13.1847440Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T09:59:13.1847641Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:13.1849260Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.1849415Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.1851018Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.1851199Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.1852396Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.1852522Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.1853453Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:13.1853622Z {} 2025-12-04T09:59:13.1853922Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T09:59:13.1854120Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:13.1855742Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.1855897Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.1857339Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.1857473Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.1858514Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:13.1858687Z {} 2025-12-04T09:59:13.1859004Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T09:59:13.1859227Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:13.1860931Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
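The enable_nested_tensor warning repeated throughout this output comes from building a TransformerEncoder whose layers keep the default batch_first=False; constructing the layer with batch_first=True is what the message asks for. A small, standalone illustration (not the model used by the test):

    import torch.nn as nn

    # batch_first=True lets the nested-tensor fast path apply, silencing the warning.
    encoder_layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
    encoder = nn.TransformerEncoder(encoder_layer, num_layers=2, enable_nested_tensor=True)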
2025-12-04T09:59:13.1861108Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.1861595Z [rank0]:E1204 09:30:46.268000 44260 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.1862139Z [rank0]:E1204 09:30:46.268000 44260 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.1863149Z [rank0]:E1204 09:30:46.268000 44260 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1863661Z [rank0]:E1204 09:30:46.268000 44260 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.1864652Z [rank0]:E1204 09:30:46.268000 44260 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1865078Z [rank0]:E1204 09:30:46.268000 44260 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.1866051Z [rank0]:E1204 09:30:46.268000 44260 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1866566Z [rank0]:E1204 09:30:46.268000 44260 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1867528Z [rank0]:E1204 09:30:46.268000 44260 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1868020Z [rank0]:E1204 09:30:46.268000 44260 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1869183Z [rank0]:E1204 09:30:46.268000 44260 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1869588Z [rank0]:E1204 09:30:46.268000 44260 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.1870441Z [rank0]:E1204 09:30:46.268000 44260 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1870886Z [rank0]:E1204 09:30:46.268000 44260 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.1872418Z [rank0]:E1204 09:30:46.268000 44260 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 0. CUDA driver allocated memory was 716111872 and is now 732889088. 
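The "Both mixed precision and an auto_wrap_policy were specified to FSDP" warning in the setup output refers to configurations that combine a mixed-precision policy with automatic wrapping. A condensed, hypothetical example of such a configuration (layer classes and dtypes chosen arbitrarily for illustration; this is not the test's own code):

    import functools
    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, MixedPrecision
    from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy

    mp_policy = MixedPrecision(
        param_dtype=torch.float16, reduce_dtype=torch.float16, buffer_dtype=torch.float16
    )
    wrap_policy = functools.partial(
        transformer_auto_wrap_policy,
        transformer_layer_cls={nn.TransformerEncoderLayer, nn.TransformerDecoderLayer},
    )
    # wrapped = FSDP(model, auto_wrap_policy=wrap_policy, mixed_precision=mp_policy)

With this combination, FSDP wraps certain submodule types separately and disables mixed precision for them, which is what the warning (with its empty set of affected types here) is reporting.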
2025-12-04T09:59:13.1872769Z [rank0]:E1204 09:30:46.268000 44260 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1873366Z [rank0]:E1204 09:30:46.268000 44260 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1874434Z [rank0]:E1204 09:30:46.268000 44260 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T09:59:13.1874760Z [rank0]:E1204 09:30:46.268000 44260 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1875428Z [rank0]:E1204 09:30:46.268000 44260 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1875922Z [rank0]:E1204 09:30:46.268000 44260 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.1876322Z [rank2]:E1204 09:30:46.271000 44262 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.1876796Z [rank2]:E1204 09:30:46.271000 44262 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.1877692Z [rank2]:E1204 09:30:46.271000 44262 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1878167Z [rank2]:E1204 09:30:46.271000 44262 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.1879056Z [rank2]:E1204 09:30:46.271000 44262 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1879431Z [rank2]:E1204 09:30:46.271000 44262 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.1880292Z [rank2]:E1204 09:30:46.271000 44262 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1880717Z [rank2]:E1204 09:30:46.271000 44262 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1881568Z [rank2]:E1204 09:30:46.271000 44262 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1882009Z [rank2]:E1204 09:30:46.271000 44262 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1882858Z [rank2]:E1204 09:30:46.271000 44262 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1883265Z [rank2]:E1204 09:30:46.271000 44262 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.1884121Z [rank2]:E1204 09:30:46.271000 44262 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1884561Z [rank2]:E1204 09:30:46.271000 44262 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.1886107Z [rank2]:E1204 09:30:46.271000 44262 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 2. CUDA driver allocated memory was 609157120 and is now 623837184. 2025-12-04T09:59:13.1886435Z [rank2]:E1204 09:30:46.271000 44262 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1887024Z [rank2]:E1204 09:30:46.271000 44262 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1888082Z [rank2]:E1204 09:30:46.271000 44262 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T09:59:13.1888438Z [rank2]:E1204 09:30:46.271000 44262 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1889074Z [rank2]:E1204 09:30:46.271000 44262 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1889567Z [rank2]:E1204 09:30:46.271000 44262 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.1889961Z [rank1]:E1204 09:30:46.271000 44261 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.1890429Z [rank1]:E1204 09:30:46.271000 44261 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.1891323Z [rank1]:E1204 09:30:46.271000 44261 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1891796Z [rank1]:E1204 09:30:46.271000 44261 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.1892713Z [rank1]:E1204 09:30:46.271000 44261 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1893059Z [rank1]:E1204 09:30:46.271000 44261 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.1893916Z [rank1]:E1204 09:30:46.271000 44261 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1894346Z [rank1]:E1204 09:30:46.271000 44261 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1895198Z [rank1]:E1204 09:30:46.271000 44261 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1895642Z [rank1]:E1204 09:30:46.271000 44261 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1896566Z [rank1]:E1204 09:30:46.271000 44261 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1897179Z [rank1]:E1204 09:30:46.271000 44261 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.1898144Z [rank1]:E1204 09:30:46.271000 44261 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1898682Z [rank1]:E1204 09:30:46.271000 44261 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.1900399Z [rank1]:E1204 09:30:46.271000 44261 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 1. CUDA driver allocated memory was 609157120 and is now 623837184. 2025-12-04T09:59:13.1900764Z [rank1]:E1204 09:30:46.271000 44261 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1901432Z [rank1]:E1204 09:30:46.271000 44261 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1902661Z [rank1]:E1204 09:30:46.271000 44261 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T09:59:13.1903031Z [rank1]:E1204 09:30:46.271000 44261 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1903747Z [rank1]:E1204 09:30:46.271000 44261 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1904297Z [rank1]:E1204 09:30:46.271000 44261 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.1904747Z [rank3]:E1204 09:30:46.273000 44263 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.1905305Z [rank3]:E1204 09:30:46.273000 44263 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.1906315Z [rank3]:E1204 09:30:46.273000 44263 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1906846Z [rank3]:E1204 09:30:46.273000 44263 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.1907838Z [rank3]:E1204 09:30:46.273000 44263 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1908234Z [rank3]:E1204 09:30:46.273000 44263 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.1909305Z [rank3]:E1204 09:30:46.273000 44263 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1909766Z [rank3]:E1204 09:30:46.273000 44263 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1910663Z [rank3]:E1204 09:30:46.273000 44263 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1911129Z [rank3]:E1204 09:30:46.273000 44263 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.1912025Z [rank3]:E1204 09:30:46.273000 44263 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1912455Z [rank3]:E1204 09:30:46.273000 44263 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.1913384Z [rank3]:E1204 09:30:46.273000 44263 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1913856Z [rank3]:E1204 09:30:46.273000 44263 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.1915468Z [rank3]:E1204 09:30:46.273000 44263 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 3. CUDA driver allocated memory was 604962816 and is now 623837184. 
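The NCCL warning earlier in this log ("destroy_process_group() was not called before program exit") points at missing teardown of the process group. In user code the usual pattern is a paired init/destroy; the snippet below is only a generic sketch of that pattern, not the test harness's teardown logic:

    import torch.distributed as dist

    def main(rank: int, world_size: int):
        dist.init_process_group("nccl", rank=rank, world_size=world_size)
        try:
            pass  # training / test body goes here
        finally:
            dist.destroy_process_group()  # explicit teardown avoids the NCCL shutdown warning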
2025-12-04T09:59:13.1915808Z [rank3]:E1204 09:30:46.273000 44263 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1916462Z [rank3]:E1204 09:30:46.273000 44263 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1917772Z [rank3]:E1204 09:30:46.273000 44263 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T09:59:13.1918127Z [rank3]:E1204 09:30:46.273000 44263 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.1918817Z [rank3]:E1204 09:30:46.273000 44263 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1919382Z [rank3]:E1204 09:30:46.273000 44263 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.1919479Z dist init r=1, world=4 2025-12-04T09:59:13.1919575Z dist init r=2, world=4 2025-12-04T09:59:13.1919674Z dist init r=0, world=4 2025-12-04T09:59:13.1919850Z dist init r=3, world=4 2025-12-04T09:59:13.1921493Z [rank0]:[W1204 09:30:46.287123994 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.1921614Z FAILED [8.6814s] [100%] 2025-12-04T09:59:13.1921620Z 2025-12-04T09:59:13.1921767Z =================================== FAILURES =================================== 2025-12-04T09:59:13.1922129Z _ TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda _ 2025-12-04T09:59:13.1922247Z Traceback (most recent call last): 2025-12-04T09:59:13.1922800Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.1922925Z self._join_processes(fn) 2025-12-04T09:59:13.1923518Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.1923667Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.1924272Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.1924383Z raise RuntimeError(error) 2025-12-04T09:59:13.1924623Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.1924743Z Traceback (most recent call last): 2025-12-04T09:59:13.1925281Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1925401Z getattr(self, test_name)() 2025-12-04T09:59:13.1925933Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1926030Z fn() 2025-12-04T09:59:13.1926612Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1926723Z method(*args, **kwargs) 2025-12-04T09:59:13.1927241Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1927346Z method(*args, **kwargs) 2025-12-04T09:59:13.1927852Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1927955Z with policy(): 2025-12-04T09:59:13.1928469Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1928591Z raise RuntimeError(msg) 2025-12-04T09:59:13.1929901Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 0. CUDA driver allocated memory was 716111872 and is now 732889088. 2025-12-04T09:59:13.1929911Z 2025-12-04T09:59:13.1930125Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1930870Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T09:59:13.1930876Z 2025-12-04T09:59:13.1931145Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1931150Z 2025-12-04T09:59:13.1931155Z 2025-12-04T09:59:13.1931385Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.1931688Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.1932499Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-31ee953fde08a139.xml - 2025-12-04T09:59:13.1932672Z =========================== short test summary info ============================ 2025-12-04T09:59:13.1933740Z FAILED [8.6814s] distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_True_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.1933868Z Traceback (most recent call last): 2025-12-04T09:59:13.1934518Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.1934636Z getattr(self, test_name)() 2025-12-04T09:59:13.1935139Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.1935226Z fn() 2025-12-04T09:59:13.1935713Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1935814Z method(*args, **kwargs) 2025-12-04T09:59:13.1936352Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.1936479Z method(*args, **kwargs) 2025-12-04T09:59:13.1937141Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.1937246Z with policy(): 2025-12-04T09:59:13.1937749Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.1937856Z raise RuntimeError(msg) 2025-12-04T09:59:13.1939134Z RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 13312 on device 0. CUDA driver allocated memory was 716111872 and is now 732889088. 2025-12-04T09:59:13.1939144Z 2025-12-04T09:59:13.1939394Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.1940152Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestHooksCUDA.test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T09:59:13.1940158Z 2025-12-04T09:59:13.1940420Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.1940596Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.1940779Z ======================= 1 failed, 26 deselected in 8.90s ======================= 2025-12-04T09:59:13.1940874Z Got exit code 1 2025-12-04T09:59:13.1941544Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_True_cuda 2025-12-04T09:59:13.1941980Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.1942602Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cf0a0887fe85c292.xml 2025-12-04T09:59:13.1942769Z ============================= test session starts ============================== 2025-12-04T09:59:13.1943119Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.1943234Z cachedir: .pytest_cache 2025-12-04T09:59:13.1943745Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.1943863Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.1943973Z configfile: pytest.ini 2025-12-04T09:59:13.1944505Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.1944761Z collecting ... collected 60 items / 3 deselected / 57 selected 2025-12-04T09:59:13.1944909Z stepcurrent: skipping 3 already run items. 2025-12-04T09:59:13.1945020Z Running 24 items in this shard 2025-12-04T09:59:13.1945054Z 2025-12-04T09:59:13.1946102Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_none_cuda I1204 09:30:53.004000 44529 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 44581 2025-12-04T09:59:13.1946596Z I1204 09:30:53.005000 44529 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 44582 2025-12-04T09:59:13.1947086Z I1204 09:30:53.005000 44529 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 44583 2025-12-04T09:59:13.1947586Z I1204 09:30:53.006000 44529 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 44584 2025-12-04T09:59:13.1949758Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.1949860Z _warn_cpu_init() 2025-12-04T09:59:13.1951652Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.1951748Z _warn_cpu_init() 2025-12-04T09:59:13.1953553Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.1953650Z _warn_cpu_init() 2025-12-04T09:59:13.1955435Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.1955555Z _warn_cpu_init() 2025-12-04T09:59:13.1956439Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
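The two warnings directly above (sharding initialization running on CPU, and barrier() having to guess the device) both suggest making the device explicit at initialization time. A sketch that follows the messages' own recommendations, with illustrative names and an assumed per-rank entry point:

    import torch
    import torch.distributed as dist
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def init_and_wrap(model, rank: int, world_size: int):
        device = torch.device("cuda", rank)
        # Passing device_id here also silences the barrier() "device under current context" warning.
        dist.init_process_group("nccl", rank=rank, world_size=world_size, device_id=device)
        torch.cuda.set_device(device)
        # Giving FSDP an explicit device moves sharding init to the GPU (avoiding the CPU-init
        # warning) and satisfies sync_module_states=True, which needs GPU communication.
        return FSDP(model, device_id=device, sync_module_states=True)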
2025-12-04T09:59:13.1956539Z return func(*args, **kwargs)
2025-12-04T09:59:13.1956953Z [rank0]:E1204 09:31:21.247000 44581 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 
2025-12-04T09:59:13.1957423Z [rank0]:E1204 09:31:21.247000 44581 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T09:59:13.1958317Z [rank0]:E1204 09:31:21.247000 44581 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:59:13.1958793Z [rank0]:E1204 09:31:21.247000 44581 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T09:59:13.1959666Z [rank0]:E1204 09:31:21.247000 44581 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:59:13.1960056Z [rank0]:E1204 09:31:21.247000 44581 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T09:59:13.1960909Z [rank0]:E1204 09:31:21.247000 44581 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.1961342Z [rank0]:E1204 09:31:21.247000 44581 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T09:59:13.1962193Z [rank0]:E1204 09:31:21.247000 44581 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.1962629Z [rank0]:E1204 09:31:21.247000 44581 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T09:59:13.1963477Z [rank0]:E1204 09:31:21.247000 44581 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:59:13.1963877Z [rank0]:E1204 09:31:21.247000 44581 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T09:59:13.1964738Z [rank0]:E1204 09:31:21.247000 44581 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:59:13.1965179Z [rank0]:E1204 09:31:21.247000 44581 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T09:59:13.1966685Z [rank0]:E1204 09:31:21.247000 44581 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 718209024 and is now 758054912.
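[Editor's note] The UserWarning repeated above points at a concrete fix on the test side. A minimal sketch of what it recommends, assuming an already-initialized process group and using a hypothetical nn.Linear stand-in for the test's real model (illustrative only, not the test suite's code):

import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Hypothetical CPU-resident module standing in for the test model.
model = nn.Linear(16, 16)

# Passing `device_id` makes FSDP move the module to that GPU before running
# sharding initialization, which also satisfies the requirement that the
# module live on a GPU device when `sync_module_states=True`.
fsdp_model = FSDP(
    model,
    device_id=torch.cuda.current_device(),
    sync_module_states=True,
)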
2025-12-04T09:59:13.1967014Z [rank0]:E1204 09:31:21.247000 44581 site-packages/torch/testing/_internal/common_distributed.py:935] 
2025-12-04T09:59:13.1967603Z [rank0]:E1204 09:31:21.247000 44581 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T09:59:13.1968624Z [rank0]:E1204 09:31:21.247000 44581 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda
2025-12-04T09:59:13.1968947Z [rank0]:E1204 09:31:21.247000 44581 site-packages/torch/testing/_internal/common_distributed.py:935] 
2025-12-04T09:59:13.1969586Z [rank0]:E1204 09:31:21.247000 44581 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:59:13.1970069Z [rank0]:E1204 09:31:21.247000 44581 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10
2025-12-04T09:59:13.1970478Z [rank1]:E1204 09:31:21.247000 44582 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 
2025-12-04T09:59:13.1970944Z [rank1]:E1204 09:31:21.247000 44582 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T09:59:13.1971863Z [rank1]:E1204 09:31:21.247000 44582 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:59:13.1972316Z [rank1]:E1204 09:31:21.247000 44582 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T09:59:13.1973213Z [rank1]:E1204 09:31:21.247000 44582 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:59:13.1973572Z [rank1]:E1204 09:31:21.247000 44582 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T09:59:13.1974421Z [rank1]:E1204 09:31:21.247000 44582 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.1974865Z [rank1]:E1204 09:31:21.247000 44582 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T09:59:13.1975712Z [rank1]:E1204 09:31:21.247000 44582 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.1976155Z [rank1]:E1204 09:31:21.247000 44582 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T09:59:13.1977307Z [rank1]:E1204 09:31:21.247000 44582 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:59:13.1977751Z [rank1]:E1204 09:31:21.247000 44582 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T09:59:13.1978730Z [rank1]:E1204 09:31:21.247000 44582 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:59:13.1979254Z [rank1]:E1204 09:31:21.247000 44582 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T09:59:13.1981007Z [rank1]:E1204 09:31:21.247000 44582 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 1. CUDA driver allocated memory was 611254272 and is now 649003008.
2025-12-04T09:59:13.1981374Z [rank1]:E1204 09:31:21.247000 44582 site-packages/torch/testing/_internal/common_distributed.py:935] 
2025-12-04T09:59:13.1982045Z [rank1]:E1204 09:31:21.247000 44582 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T09:59:13.1983202Z [rank1]:E1204 09:31:21.247000 44582 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda
2025-12-04T09:59:13.1983568Z [rank1]:E1204 09:31:21.247000 44582 site-packages/torch/testing/_internal/common_distributed.py:935] 
2025-12-04T09:59:13.1984296Z [rank1]:E1204 09:31:21.247000 44582 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:59:13.1984841Z [rank1]:E1204 09:31:21.247000 44582 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10
2025-12-04T09:59:13.1985297Z [rank3]:E1204 09:31:21.248000 44584 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 
2025-12-04T09:59:13.1985858Z [rank3]:E1204 09:31:21.248000 44584 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T09:59:13.1986869Z [rank3]:E1204 09:31:21.248000 44584 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:59:13.1987404Z [rank3]:E1204 09:31:21.248000 44584 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T09:59:13.1988393Z [rank3]:E1204 09:31:21.248000 44584 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:59:13.1988797Z [rank3]:E1204 09:31:21.248000 44584 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T09:59:13.1989782Z [rank3]:E1204 09:31:21.248000 44584 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.1990224Z [rank3]:E1204 09:31:21.248000 44584 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T09:59:13.1991071Z [rank3]:E1204 09:31:21.248000 44584 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.1991505Z [rank3]:E1204 09:31:21.248000 44584 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T09:59:13.1992353Z [rank3]:E1204 09:31:21.248000 44584 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:59:13.1992750Z [rank3]:E1204 09:31:21.248000 44584 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T09:59:13.1993636Z [rank3]:E1204 09:31:21.248000 44584 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:59:13.1994076Z [rank3]:E1204 09:31:21.248000 44584 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T09:59:13.1995546Z [rank3]:E1204 09:31:21.248000 44584 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 604962816 and is now 649003008.
2025-12-04T09:59:13.1995870Z [rank3]:E1204 09:31:21.248000 44584 site-packages/torch/testing/_internal/common_distributed.py:935] 
2025-12-04T09:59:13.1996497Z [rank3]:E1204 09:31:21.248000 44584 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T09:59:13.1997499Z [rank3]:E1204 09:31:21.248000 44584 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda
2025-12-04T09:59:13.1997823Z [rank3]:E1204 09:31:21.248000 44584 site-packages/torch/testing/_internal/common_distributed.py:935] 
2025-12-04T09:59:13.1998463Z [rank3]:E1204 09:31:21.248000 44584 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:59:13.1998968Z [rank3]:E1204 09:31:21.248000 44584 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10
2025-12-04T09:59:13.1999375Z [rank2]:E1204 09:31:21.248000 44583 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 
2025-12-04T09:59:13.1999842Z [rank2]:E1204 09:31:21.248000 44583 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T09:59:13.2000761Z [rank2]:E1204 09:31:21.248000 44583 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:59:13.2001207Z [rank2]:E1204 09:31:21.248000 44583 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T09:59:13.2002082Z [rank2]:E1204 09:31:21.248000 44583 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:59:13.2002450Z [rank2]:E1204 09:31:21.248000 44583 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T09:59:13.2003303Z [rank2]:E1204 09:31:21.248000 44583 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.2003743Z [rank2]:E1204 09:31:21.248000 44583 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T09:59:13.2004589Z [rank2]:E1204 09:31:21.248000 44583 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.2005031Z [rank2]:E1204 09:31:21.248000 44583 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T09:59:13.2005908Z [rank2]:E1204 09:31:21.248000 44583 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:59:13.2006308Z [rank2]:E1204 09:31:21.248000 44583 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T09:59:13.2007173Z [rank2]:E1204 09:31:21.248000 44583 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:59:13.2007606Z [rank2]:E1204 09:31:21.248000 44583 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T09:59:13.2009103Z [rank2]:E1204 09:31:21.248000 44583 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 2. CUDA driver allocated memory was 607059968 and is now 649003008.
2025-12-04T09:59:13.2009429Z [rank2]:E1204 09:31:21.248000 44583 site-packages/torch/testing/_internal/common_distributed.py:935] 
2025-12-04T09:59:13.2010020Z [rank2]:E1204 09:31:21.248000 44583 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T09:59:13.2011010Z [rank2]:E1204 09:31:21.248000 44583 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda
2025-12-04T09:59:13.2011329Z [rank2]:E1204 09:31:21.248000 44583 site-packages/torch/testing/_internal/common_distributed.py:935] 
2025-12-04T09:59:13.2011996Z [rank2]:E1204 09:31:21.248000 44583 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:59:13.2012484Z [rank2]:E1204 09:31:21.248000 44583 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10
2025-12-04T09:59:13.2012610Z dist init r=1, world=4
2025-12-04T09:59:13.2012696Z dist init r=0, world=4
2025-12-04T09:59:13.2012781Z dist init r=3, world=4
2025-12-04T09:59:13.2012872Z dist init r=2, world=4
2025-12-04T09:59:13.2013900Z [rank0]:[W1204 09:31:21.263361127 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
2025-12-04T09:59:13.2014000Z FAILED [29.8483s] [ 4%]
2025-12-04T09:59:13.2014005Z 
2025-12-04T09:59:13.2014134Z =================================== FAILURES ===================================
2025-12-04T09:59:13.2014410Z ____ TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda _____
2025-12-04T09:59:13.2014524Z Traceback (most recent call last):
2025-12-04T09:59:13.2015011Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper
2025-12-04T09:59:13.2015111Z self._join_processes(fn)
2025-12-04T09:59:13.2015636Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes
2025-12-04T09:59:13.2015759Z self._check_return_codes(fn, elapsed_time)
2025-12-04T09:59:13.2016361Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes
2025-12-04T09:59:13.2016473Z raise RuntimeError(error)
2025-12-04T09:59:13.2016683Z RuntimeError: Process 3 exited with error code 10 and exception:
2025-12-04T09:59:13.2016809Z Traceback (most recent call last):
2025-12-04T09:59:13.2017518Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:59:13.2017632Z getattr(self, test_name)()
2025-12-04T09:59:13.2018215Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:59:13.2018307Z fn()
2025-12-04T09:59:13.2018825Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.2018928Z method(*args, **kwargs)
2025-12-04T09:59:13.2019431Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.2019537Z method(*args, **kwargs)
2025-12-04T09:59:13.2020040Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:59:13.2020146Z with policy():
2025-12-04T09:59:13.2020651Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:59:13.2021010Z raise RuntimeError(msg)
2025-12-04T09:59:13.2022234Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 604962816 and is now 649003008.
2025-12-04T09:59:13.2022243Z 
2025-12-04T09:59:13.2022461Z To execute this test, run the following from the base repo dir:
2025-12-04T09:59:13.2023149Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda
2025-12-04T09:59:13.2023156Z 
2025-12-04T09:59:13.2023421Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:59:13.2023467Z 
2025-12-04T09:59:13.2023471Z 
2025-12-04T09:59:13.2023691Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:59:13.2023967Z Process 3 terminated with exit code 10, terminating remaining processes.
2025-12-04T09:59:13.2024770Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cf0a0887fe85c292.xml -
2025-12-04T09:59:13.2024983Z =========================== short test summary info ============================
2025-12-04T09:59:13.2025821Z FAILED [29.8483s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_none_cuda - RuntimeError: Process 3 exited with error code 10 and exception:
2025-12-04T09:59:13.2025942Z Traceback (most recent call last):
2025-12-04T09:59:13.2026499Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:59:13.2026614Z getattr(self, test_name)()
2025-12-04T09:59:13.2027160Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:59:13.2027250Z fn()
2025-12-04T09:59:13.2027759Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.2027875Z method(*args, **kwargs)
2025-12-04T09:59:13.2028383Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.2028486Z method(*args, **kwargs)
2025-12-04T09:59:13.2028996Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:59:13.2029095Z with policy():
2025-12-04T09:59:13.2029606Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:59:13.2029716Z raise RuntimeError(msg)
2025-12-04T09:59:13.2030958Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 604962816 and is now 649003008.
2025-12-04T09:59:13.2030972Z 
2025-12-04T09:59:13.2031185Z To execute this test, run the following from the base repo dir:
2025-12-04T09:59:13.2031863Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda
2025-12-04T09:59:13.2031869Z 
2025-12-04T09:59:13.2032136Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:59:13.2032316Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T09:59:13.2032602Z ======================= 1 failed, 3 deselected in 30.07s =======================
2025-12-04T09:59:13.2032711Z Got exit code 1
2025-12-04T09:59:13.2032924Z Retrying single test...
2025-12-04T09:59:13.2033553Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-07c27c95d6f3d3d6.xml
2025-12-04T09:59:13.2033706Z ============================= test session starts ==============================
2025-12-04T09:59:13.2034037Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T09:59:13.2034142Z cachedir: .pytest_cache
2025-12-04T09:59:13.2034619Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T09:59:13.2034731Z rootdir: /var/lib/jenkins/workspace
2025-12-04T09:59:13.2034834Z configfile: pytest.ini
2025-12-04T09:59:13.2035339Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T09:59:13.2035598Z collecting ... collected 60 items / 26 deselected / 34 selected
2025-12-04T09:59:13.2036311Z stepcurrent: skipping 3 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_none_cuda
2025-12-04T09:59:13.2036418Z Running 1 items in this shard
2025-12-04T09:59:13.2036451Z 
2025-12-04T09:59:13.2037418Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_none_cuda I1204 09:31:27.594000 44866 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 44918
2025-12-04T09:59:13.2037884Z I1204 09:31:27.595000 44866 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 44919
2025-12-04T09:59:13.2038353Z I1204 09:31:27.596000 44866 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 44920
2025-12-04T09:59:13.2038809Z I1204 09:31:27.596000 44866 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 44921
2025-12-04T09:59:13.2040767Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T09:59:13.2040857Z _warn_cpu_init()
2025-12-04T09:59:13.2042645Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T09:59:13.2042745Z _warn_cpu_init()
2025-12-04T09:59:13.2044564Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T09:59:13.2044661Z _warn_cpu_init()
2025-12-04T09:59:13.2046447Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T09:59:13.2046567Z _warn_cpu_init()
2025-12-04T09:59:13.2047456Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning.
2025-12-04T09:59:13.2047566Z return func(*args, **kwargs)
2025-12-04T09:59:13.2047972Z [rank0]:E1204 09:31:54.279000 44918 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 
2025-12-04T09:59:13.2048446Z [rank0]:E1204 09:31:54.279000 44918 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T09:59:13.2049345Z [rank0]:E1204 09:31:54.279000 44918 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:59:13.2049884Z [rank0]:E1204 09:31:54.279000 44918 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T09:59:13.2050774Z [rank0]:E1204 09:31:54.279000 44918 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:59:13.2051153Z [rank0]:E1204 09:31:54.279000 44918 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T09:59:13.2052014Z [rank0]:E1204 09:31:54.279000 44918 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.2052441Z [rank0]:E1204 09:31:54.279000 44918 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T09:59:13.2053296Z [rank0]:E1204 09:31:54.279000 44918 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.2053736Z [rank0]:E1204 09:31:54.279000 44918 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T09:59:13.2054582Z [rank0]:E1204 09:31:54.279000 44918 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:59:13.2054983Z [rank0]:E1204 09:31:54.279000 44918 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T09:59:13.2055837Z [rank0]:E1204 09:31:54.279000 44918 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:59:13.2056340Z [rank0]:E1204 09:31:54.279000 44918 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T09:59:13.2058163Z [rank0]:E1204 09:31:54.279000 44918 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 720306176 and is now 758054912.
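[Editor's note] Two of the warnings in this run suggest the same cleanup on the test side: the barrier() warning asks for a `device_id` in `init_process_group`, and the ProcessGroupNCCL warning earlier asks for an explicit destroy_process_group() before exit. A sketch of both together, assuming a recent PyTorch where init_process_group accepts a `device_id` torch.device and that the launcher sets the usual RANK environment variable:

import os
import torch
import torch.distributed as dist

def main() -> None:
    rank = int(os.environ["RANK"])
    device = torch.device("cuda", rank % torch.cuda.device_count())
    # Binding the process group to one device silences the barrier() warning.
    dist.init_process_group("nccl", device_id=device)
    try:
        dist.barrier()
        # ... test or training body ...
    finally:
        # Explicit shutdown, as the ProcessGroupNCCL warning advises.
        dist.destroy_process_group()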
2025-12-04T09:59:13.2058531Z [rank0]:E1204 09:31:54.279000 44918 site-packages/torch/testing/_internal/common_distributed.py:935] 
2025-12-04T09:59:13.2059201Z [rank0]:E1204 09:31:54.279000 44918 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T09:59:13.2060357Z [rank0]:E1204 09:31:54.279000 44918 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda
2025-12-04T09:59:13.2060730Z [rank0]:E1204 09:31:54.279000 44918 site-packages/torch/testing/_internal/common_distributed.py:935] 
2025-12-04T09:59:13.2061446Z [rank0]:E1204 09:31:54.279000 44918 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:59:13.2061993Z [rank0]:E1204 09:31:54.279000 44918 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10
2025-12-04T09:59:13.2062441Z [rank1]:E1204 09:31:54.282000 44919 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 
2025-12-04T09:59:13.2062969Z [rank1]:E1204 09:31:54.282000 44919 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T09:59:13.2064014Z [rank1]:E1204 09:31:54.282000 44919 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:59:13.2064549Z [rank1]:E1204 09:31:54.282000 44919 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T09:59:13.2065543Z [rank1]:E1204 09:31:54.282000 44919 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:59:13.2065934Z [rank1]:E1204 09:31:54.282000 44919 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T09:59:13.2066908Z [rank1]:E1204 09:31:54.282000 44919 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.2067396Z [rank1]:E1204 09:31:54.282000 44919 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T09:59:13.2068364Z [rank1]:E1204 09:31:54.282000 44919 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.2068964Z [rank1]:E1204 09:31:54.282000 44919 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T09:59:13.2069940Z [rank1]:E1204 09:31:54.282000 44919 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:59:13.2070344Z [rank1]:E1204 09:31:54.282000 44919 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T09:59:13.2071227Z [rank1]:E1204 09:31:54.282000 44919 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:59:13.2071673Z [rank1]:E1204 09:31:54.282000 44919 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T09:59:13.2073140Z [rank1]:E1204 09:31:54.282000 44919 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 1. CUDA driver allocated memory was 604962816 and is now 649003008.
2025-12-04T09:59:13.2073461Z [rank1]:E1204 09:31:54.282000 44919 site-packages/torch/testing/_internal/common_distributed.py:935] 
2025-12-04T09:59:13.2074055Z [rank1]:E1204 09:31:54.282000 44919 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T09:59:13.2075079Z [rank1]:E1204 09:31:54.282000 44919 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda
2025-12-04T09:59:13.2075412Z [rank1]:E1204 09:31:54.282000 44919 site-packages/torch/testing/_internal/common_distributed.py:935] 
2025-12-04T09:59:13.2076049Z [rank1]:E1204 09:31:54.282000 44919 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:59:13.2076534Z [rank1]:E1204 09:31:54.282000 44919 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10
2025-12-04T09:59:13.2076958Z [rank2]:E1204 09:31:54.282000 44920 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 
2025-12-04T09:59:13.2077431Z [rank2]:E1204 09:31:54.282000 44920 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T09:59:13.2078324Z [rank2]:E1204 09:31:54.282000 44920 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:59:13.2078793Z [rank2]:E1204 09:31:54.282000 44920 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T09:59:13.2079679Z [rank2]:E1204 09:31:54.282000 44920 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:59:13.2080026Z [rank2]:E1204 09:31:54.282000 44920 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T09:59:13.2080880Z [rank2]:E1204 09:31:54.282000 44920 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.2081319Z [rank2]:E1204 09:31:54.282000 44920 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T09:59:13.2082168Z [rank2]:E1204 09:31:54.282000 44920 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.2082604Z [rank2]:E1204 09:31:54.282000 44920 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T09:59:13.2083451Z [rank2]:E1204 09:31:54.282000 44920 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:59:13.2083858Z [rank2]:E1204 09:31:54.282000 44920 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T09:59:13.2084733Z [rank2]:E1204 09:31:54.282000 44920 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:59:13.2085173Z [rank2]:E1204 09:31:54.282000 44920 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T09:59:13.2086639Z [rank2]:E1204 09:31:54.282000 44920 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 2. CUDA driver allocated memory was 609157120 and is now 649003008.
2025-12-04T09:59:13.2086965Z [rank2]:E1204 09:31:54.282000 44920 site-packages/torch/testing/_internal/common_distributed.py:935] 
2025-12-04T09:59:13.2088028Z [rank2]:E1204 09:31:54.282000 44920 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T09:59:13.2089033Z [rank2]:E1204 09:31:54.282000 44920 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda
2025-12-04T09:59:13.2089367Z [rank2]:E1204 09:31:54.282000 44920 site-packages/torch/testing/_internal/common_distributed.py:935] 
2025-12-04T09:59:13.2090000Z [rank2]:E1204 09:31:54.282000 44920 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:59:13.2090518Z [rank2]:E1204 09:31:54.282000 44920 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10
2025-12-04T09:59:13.2090924Z [rank3]:E1204 09:31:54.283000 44921 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 
2025-12-04T09:59:13.2091443Z [rank3]:E1204 09:31:54.283000 44921 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T09:59:13.2092342Z [rank3]:E1204 09:31:54.283000 44921 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:59:13.2092792Z [rank3]:E1204 09:31:54.283000 44921 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T09:59:13.2093674Z [rank3]:E1204 09:31:54.283000 44921 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:59:13.2094026Z [rank3]:E1204 09:31:54.283000 44921 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T09:59:13.2094880Z [rank3]:E1204 09:31:54.283000 44921 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.2095318Z [rank3]:E1204 09:31:54.283000 44921 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T09:59:13.2096171Z [rank3]:E1204 09:31:54.283000 44921 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.2096852Z [rank3]:E1204 09:31:54.283000 44921 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T09:59:13.2097859Z [rank3]:E1204 09:31:54.283000 44921 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:59:13.2098311Z [rank3]:E1204 09:31:54.283000 44921 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T09:59:13.2099272Z [rank3]:E1204 09:31:54.283000 44921 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:59:13.2099761Z [rank3]:E1204 09:31:54.283000 44921 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T09:59:13.2101463Z [rank3]:E1204 09:31:54.283000 44921 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 607059968 and is now 649003008.
2025-12-04T09:59:13.2101834Z [rank3]:E1204 09:31:54.283000 44921 site-packages/torch/testing/_internal/common_distributed.py:935] 
2025-12-04T09:59:13.2102502Z [rank3]:E1204 09:31:54.283000 44921 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir:
2025-12-04T09:59:13.2103624Z [rank3]:E1204 09:31:54.283000 44921 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda
2025-12-04T09:59:13.2103995Z [rank3]:E1204 09:31:54.283000 44921 site-packages/torch/testing/_internal/common_distributed.py:935] 
2025-12-04T09:59:13.2104741Z [rank3]:E1204 09:31:54.283000 44921 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:59:13.2105285Z [rank3]:E1204 09:31:54.283000 44921 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10
2025-12-04T09:59:13.2105419Z dist init r=1, world=4
2025-12-04T09:59:13.2105519Z dist init r=0, world=4
2025-12-04T09:59:13.2105619Z dist init r=2, world=4
2025-12-04T09:59:13.2105715Z dist init r=3, world=4
2025-12-04T09:59:13.2106863Z [rank0]:[W1204 09:31:54.297324510 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
2025-12-04T09:59:13.2106971Z FAILED [28.4302s] [100%]
2025-12-04T09:59:13.2106977Z 
2025-12-04T09:59:13.2107127Z =================================== FAILURES ===================================
2025-12-04T09:59:13.2107442Z ____ TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda _____
2025-12-04T09:59:13.2107562Z Traceback (most recent call last):
2025-12-04T09:59:13.2108114Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper
2025-12-04T09:59:13.2108235Z self._join_processes(fn)
2025-12-04T09:59:13.2108816Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes
2025-12-04T09:59:13.2109061Z self._check_return_codes(fn, elapsed_time)
2025-12-04T09:59:13.2109611Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes
2025-12-04T09:59:13.2109714Z raise RuntimeError(error)
2025-12-04T09:59:13.2109929Z RuntimeError: Process 0 exited with error code 10 and exception:
2025-12-04T09:59:13.2110037Z Traceback (most recent call last):
2025-12-04T09:59:13.2110520Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:59:13.2110625Z getattr(self, test_name)()
2025-12-04T09:59:13.2111128Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:59:13.2111208Z fn()
2025-12-04T09:59:13.2111671Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.2111764Z method(*args, **kwargs)
2025-12-04T09:59:13.2112222Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.2112312Z method(*args, **kwargs)
2025-12-04T09:59:13.2112757Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:59:13.2112852Z with policy():
2025-12-04T09:59:13.2113323Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:59:13.2113430Z raise RuntimeError(msg)
2025-12-04T09:59:13.2114494Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 720306176 and is now 758054912.
2025-12-04T09:59:13.2114502Z 
2025-12-04T09:59:13.2114692Z To execute this test, run the following from the base repo dir:
2025-12-04T09:59:13.2115298Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda
2025-12-04T09:59:13.2115303Z 
2025-12-04T09:59:13.2115539Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:59:13.2115571Z 
2025-12-04T09:59:13.2115576Z 
2025-12-04T09:59:13.2115782Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:59:13.2116016Z Process 0 terminated with exit code 10, terminating remaining processes.
2025-12-04T09:59:13.2116729Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-07c27c95d6f3d3d6.xml -
2025-12-04T09:59:13.2116915Z =========================== short test summary info ============================
2025-12-04T09:59:13.2117653Z FAILED [28.4302s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception:
2025-12-04T09:59:13.2117772Z Traceback (most recent call last):
2025-12-04T09:59:13.2118260Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:59:13.2118361Z getattr(self, test_name)()
2025-12-04T09:59:13.2118848Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:59:13.2118928Z fn()
2025-12-04T09:59:13.2119393Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.2119490Z method(*args, **kwargs)
2025-12-04T09:59:13.2119936Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.2120041Z method(*args, **kwargs)
2025-12-04T09:59:13.2120484Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:59:13.2120569Z with policy():
2025-12-04T09:59:13.2121429Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:59:13.2121544Z raise RuntimeError(msg)
2025-12-04T09:59:13.2122829Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 720306176 and is now 758054912.
2025-12-04T09:59:13.2122838Z 
2025-12-04T09:59:13.2123057Z To execute this test, run the following from the base repo dir:
2025-12-04T09:59:13.2123733Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda
2025-12-04T09:59:13.2123747Z 
2025-12-04T09:59:13.2124015Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:59:13.2124193Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T09:59:13.2124377Z ====================== 1 failed, 26 deselected in 28.65s =======================
2025-12-04T09:59:13.2124476Z Got exit code 1
2025-12-04T09:59:13.2124583Z Retrying single test...
2025-12-04T09:59:13.2125251Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6ec3b2535e8e2ad7.xml
2025-12-04T09:59:13.2125415Z ============================= test session starts ==============================
2025-12-04T09:59:13.2125780Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T09:59:13.2125888Z cachedir: .pytest_cache
2025-12-04T09:59:13.2126407Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T09:59:13.2126535Z rootdir: /var/lib/jenkins/workspace
2025-12-04T09:59:13.2126642Z configfile: pytest.ini
2025-12-04T09:59:13.2127179Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T09:59:13.2127434Z collecting ... collected 60 items / 26 deselected / 34 selected
2025-12-04T09:59:13.2128188Z stepcurrent: skipping 3 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_none_cuda
2025-12-04T09:59:13.2128346Z Running 1 items in this shard
2025-12-04T09:59:13.2128352Z 
2025-12-04T09:59:13.2129382Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_none_cuda I1204 09:32:00.694000 45203 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 45255
2025-12-04T09:59:13.2129888Z I1204 09:32:00.695000 45203 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 45256
2025-12-04T09:59:13.2130381Z I1204 09:32:00.696000 45203 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 45257
2025-12-04T09:59:13.2130872Z I1204 09:32:00.697000 45203 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 45258
2025-12-04T09:59:13.2132917Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T09:59:13.2133017Z _warn_cpu_init()
2025-12-04T09:59:13.2134965Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T09:59:13.2135055Z _warn_cpu_init()
2025-12-04T09:59:13.2137159Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T09:59:13.2137261Z _warn_cpu_init()
2025-12-04T09:59:13.2139306Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T09:59:13.2139402Z _warn_cpu_init()
2025-12-04T09:59:13.2140402Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning.
2025-12-04T09:59:13.2140529Z return func(*args, **kwargs)
2025-12-04T09:59:13.2140991Z [rank0]:E1204 09:32:29.250000 45255 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 
2025-12-04T09:59:13.2141536Z [rank0]:E1204 09:32:29.250000 45255 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last):
2025-12-04T09:59:13.2142536Z [rank0]:E1204 09:32:29.250000 45255 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:59:13.2143089Z [rank0]:E1204 09:32:29.250000 45255 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)()
2025-12-04T09:59:13.2144090Z [rank0]:E1204 09:32:29.250000 45255 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:59:13.2144511Z [rank0]:E1204 09:32:29.250000 45255 site-packages/torch/testing/_internal/common_distributed.py:935] fn()
2025-12-04T09:59:13.2145479Z [rank0]:E1204 09:32:29.250000 45255 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.2145965Z [rank0]:E1204 09:32:29.250000 45255 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T09:59:13.2146941Z [rank0]:E1204 09:32:29.250000 45255 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:13.2147428Z [rank0]:E1204 09:32:29.250000 45255 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs)
2025-12-04T09:59:13.2148381Z [rank0]:E1204 09:32:29.250000 45255 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:59:13.2148920Z [rank0]:E1204 09:32:29.250000 45255 site-packages/torch/testing/_internal/common_distributed.py:935] with policy():
2025-12-04T09:59:13.2149773Z [rank0]:E1204 09:32:29.250000 45255 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:59:13.2150221Z [rank0]:E1204 09:32:29.250000 45255 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg)
2025-12-04T09:59:13.2151719Z [rank0]:E1204 09:32:29.250000 45255 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 720306176 and is now 758054912.
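[Editor's note] The exit-code-10 plumbing visible in these tracebacks works in two halves: each rank runs the test in a child process and exits with code 10 when the leak check raises, and the parent joins the children and converts any nonzero exit code into the RuntimeError shown in the FAILURES sections. A simplified stand-in for that parent-side pattern (the real harness is _join_processes/_check_return_codes in common_distributed.py; _worker here is hypothetical):

import torch.multiprocessing as mp

def _worker(rank: int) -> None:
    # The real harness runs the test body here; the mem-leak check makes the
    # child exit nonzero (code 10) when it raises.
    pass

def run_multiprocess_test(world_size: int = 4) -> None:
    ctx = mp.get_context("spawn")
    procs = [ctx.Process(target=_worker, args=(r,)) for r in range(world_size)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    for rank, p in enumerate(procs):
        if p.exitcode != 0:
            raise RuntimeError(f"Process {rank} exited with error code {p.exitcode}")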
2025-12-04T09:59:13.2152053Z [rank0]:E1204 09:32:29.250000 45255 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2152633Z [rank0]:E1204 09:32:29.250000 45255 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2153662Z [rank0]:E1204 09:32:29.250000 45255 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda 2025-12-04T09:59:13.2153987Z [rank0]:E1204 09:32:29.250000 45255 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2154624Z [rank0]:E1204 09:32:29.250000 45255 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2155117Z [rank0]:E1204 09:32:29.250000 45255 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.2155516Z [rank1]:E1204 09:32:29.251000 45256 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2155989Z [rank1]:E1204 09:32:29.251000 45256 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2156910Z [rank1]:E1204 09:32:29.251000 45256 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2157387Z [rank1]:E1204 09:32:29.251000 45256 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2158266Z [rank1]:E1204 09:32:29.251000 45256 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2158619Z [rank1]:E1204 09:32:29.251000 45256 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2159480Z [rank1]:E1204 09:32:29.251000 45256 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2159920Z [rank1]:E1204 09:32:29.251000 45256 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2160774Z [rank1]:E1204 09:32:29.251000 45256 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2161207Z [rank1]:E1204 09:32:29.251000 45256 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2162056Z [rank1]:E1204 09:32:29.251000 45256 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2162458Z [rank1]:E1204 09:32:29.251000 45256 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2163336Z [rank1]:E1204 09:32:29.251000 45256 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2163783Z [rank1]:E1204 09:32:29.251000 45256 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2165241Z [rank1]:E1204 09:32:29.251000 45256 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 1. CUDA driver allocated memory was 609157120 and is now 649003008. 2025-12-04T09:59:13.2165569Z [rank1]:E1204 09:32:29.251000 45256 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2166184Z [rank1]:E1204 09:32:29.251000 45256 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2167191Z [rank1]:E1204 09:32:29.251000 45256 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda 2025-12-04T09:59:13.2167511Z [rank1]:E1204 09:32:29.251000 45256 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2168149Z [rank1]:E1204 09:32:29.251000 45256 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2168635Z [rank1]:E1204 09:32:29.251000 45256 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.2169063Z [rank2]:E1204 09:32:29.251000 45257 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2169541Z [rank2]:E1204 09:32:29.251000 45257 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2170433Z [rank2]:E1204 09:32:29.251000 45257 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2170902Z [rank2]:E1204 09:32:29.251000 45257 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2171786Z [rank2]:E1204 09:32:29.251000 45257 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2172139Z [rank2]:E1204 09:32:29.251000 45257 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2173000Z [rank2]:E1204 09:32:29.251000 45257 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2173430Z [rank2]:E1204 09:32:29.251000 45257 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2174281Z [rank2]:E1204 09:32:29.251000 45257 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2174707Z [rank2]:E1204 09:32:29.251000 45257 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2175553Z [rank2]:E1204 09:32:29.251000 45257 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2175961Z [rank2]:E1204 09:32:29.251000 45257 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2177116Z [rank2]:E1204 09:32:29.251000 45257 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2177626Z [rank2]:E1204 09:32:29.251000 45257 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2179285Z [rank2]:E1204 09:32:29.251000 45257 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 2. CUDA driver allocated memory was 604962816 and is now 649003008. 2025-12-04T09:59:13.2179689Z [rank2]:E1204 09:32:29.251000 45257 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2180349Z [rank2]:E1204 09:32:29.251000 45257 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2181480Z [rank2]:E1204 09:32:29.251000 45257 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda 2025-12-04T09:59:13.2181843Z [rank2]:E1204 09:32:29.251000 45257 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2182558Z [rank2]:E1204 09:32:29.251000 45257 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2183146Z [rank2]:E1204 09:32:29.251000 45257 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.2183599Z [rank3]:E1204 09:32:29.252000 45258 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2184167Z [rank3]:E1204 09:32:29.252000 45258 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2185171Z [rank3]:E1204 09:32:29.252000 45258 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2185676Z [rank3]:E1204 09:32:29.252000 45258 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2186665Z [rank3]:E1204 09:32:29.252000 45258 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2187068Z [rank3]:E1204 09:32:29.252000 45258 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2188040Z [rank3]:E1204 09:32:29.252000 45258 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2188533Z [rank3]:E1204 09:32:29.252000 45258 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2189512Z [rank3]:E1204 09:32:29.252000 45258 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2189944Z [rank3]:E1204 09:32:29.252000 45258 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2190819Z [rank3]:E1204 09:32:29.252000 45258 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2191219Z [rank3]:E1204 09:32:29.252000 45258 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2192075Z [rank3]:E1204 09:32:29.252000 45258 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2192515Z [rank3]:E1204 09:32:29.252000 45258 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2194030Z [rank3]:E1204 09:32:29.252000 45258 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 607059968 and is now 649003008. 
2025-12-04T09:59:13.2194360Z [rank3]:E1204 09:32:29.252000 45258 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2194941Z [rank3]:E1204 09:32:29.252000 45258 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2195935Z [rank3]:E1204 09:32:29.252000 45258 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda 2025-12-04T09:59:13.2196262Z [rank3]:E1204 09:32:29.252000 45258 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2196919Z [rank3]:E1204 09:32:29.252000 45258 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2197406Z [rank3]:E1204 09:32:29.252000 45258 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.2197519Z dist init r=1, world=4 2025-12-04T09:59:13.2197606Z dist init r=2, world=4 2025-12-04T09:59:13.2197699Z dist init r=0, world=4 2025-12-04T09:59:13.2197782Z dist init r=3, world=4 2025-12-04T09:59:13.2198817Z [rank0]:[W1204 09:32:29.277226647 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.2198909Z FAILED [30.5436s] [100%] 2025-12-04T09:59:13.2198914Z 2025-12-04T09:59:13.2199044Z =================================== FAILURES =================================== 2025-12-04T09:59:13.2199321Z ____ TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda _____ 2025-12-04T09:59:13.2199428Z Traceback (most recent call last): 2025-12-04T09:59:13.2199923Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.2200024Z self._join_processes(fn) 2025-12-04T09:59:13.2200547Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.2200680Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.2201217Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.2201314Z raise RuntimeError(error) 2025-12-04T09:59:13.2201523Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.2201628Z Traceback (most recent call last): 2025-12-04T09:59:13.2202115Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2202211Z getattr(self, test_name)() 2025-12-04T09:59:13.2202708Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2202798Z fn() 2025-12-04T09:59:13.2203252Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2203346Z method(*args, **kwargs) 2025-12-04T09:59:13.2203797Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3329, in wrapper 2025-12-04T09:59:13.2203891Z method(*args, **kwargs) 2025-12-04T09:59:13.2204349Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2204439Z with policy(): 2025-12-04T09:59:13.2204922Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2205025Z raise RuntimeError(msg) 2025-12-04T09:59:13.2206092Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 720306176 and is now 758054912. 2025-12-04T09:59:13.2206097Z 2025-12-04T09:59:13.2206293Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2206891Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda 2025-12-04T09:59:13.2206896Z 2025-12-04T09:59:13.2207129Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2207168Z 2025-12-04T09:59:13.2207172Z 2025-12-04T09:59:13.2207364Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.2207598Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.2208340Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6ec3b2535e8e2ad7.xml - 2025-12-04T09:59:13.2208491Z =========================== short test summary info ============================ 2025-12-04T09:59:13.2209233Z FAILED [30.5436s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.2209340Z Traceback (most recent call last): 2025-12-04T09:59:13.2209825Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2209932Z getattr(self, test_name)() 2025-12-04T09:59:13.2210410Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2210493Z fn() 2025-12-04T09:59:13.2210952Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2211044Z method(*args, **kwargs) 2025-12-04T09:59:13.2211500Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2211594Z method(*args, **kwargs) 2025-12-04T09:59:13.2212041Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2212134Z with policy(): 2025-12-04T09:59:13.2212583Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2212681Z raise RuntimeError(msg) 2025-12-04T09:59:13.2213794Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. 
CUDA driver allocated memory was 720306176 and is now 758054912. 2025-12-04T09:59:13.2213802Z 2025-12-04T09:59:13.2213990Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2214595Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_none_cuda 2025-12-04T09:59:13.2214600Z 2025-12-04T09:59:13.2214831Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2215001Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.2215166Z ====================== 1 failed, 26 deselected in 30.76s ======================= 2025-12-04T09:59:13.2215250Z Got exit code 1 2025-12-04T09:59:13.2215814Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_none_cuda 2025-12-04T09:59:13.2216175Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.2216982Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2c7bc1bec56d6360.xml 2025-12-04T09:59:13.2217163Z ============================= test session starts ============================== 2025-12-04T09:59:13.2217508Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.2217625Z cachedir: .pytest_cache 2025-12-04T09:59:13.2218139Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.2218299Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.2218416Z configfile: pytest.ini 2025-12-04T09:59:13.2218952Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.2219174Z collecting ... collected 60 items / 4 deselected / 56 selected 2025-12-04T09:59:13.2219344Z stepcurrent: skipping 4 already run items. 2025-12-04T09:59:13.2219456Z Running 23 items in this shard 2025-12-04T09:59:13.2219461Z 2025-12-04T09:59:13.2220547Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_shard_grad_op_cuda I1204 09:32:35.974000 45540 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 45592 2025-12-04T09:59:13.2221269Z I1204 09:32:35.975000 45540 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 45593 2025-12-04T09:59:13.2221777Z I1204 09:32:35.975000 45540 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 45594 2025-12-04T09:59:13.2222273Z I1204 09:32:35.976000 45540 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 45595 2025-12-04T09:59:13.2224306Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.2224420Z _warn_cpu_init() 2025-12-04T09:59:13.2226428Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2226611Z _warn_cpu_init() 2025-12-04T09:59:13.2228626Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2228733Z _warn_cpu_init() 2025-12-04T09:59:13.2230776Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2230892Z _warn_cpu_init() 2025-12-04T09:59:13.2231895Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T09:59:13.2232006Z return func(*args, **kwargs) 2025-12-04T09:59:13.2232586Z [rank0]:E1204 09:33:03.240000 45592 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2233112Z [rank0]:E1204 09:33:03.240000 45592 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2234133Z [rank0]:E1204 09:33:03.240000 45592 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2234627Z [rank0]:E1204 09:33:03.240000 45592 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2235632Z [rank0]:E1204 09:33:03.240000 45592 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2236025Z [rank0]:E1204 09:33:03.240000 45592 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2236954Z [rank0]:E1204 09:33:03.240000 45592 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2237440Z [rank0]:E1204 09:33:03.240000 45592 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2238372Z [rank0]:E1204 09:33:03.240000 45592 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2238856Z [rank0]:E1204 09:33:03.240000 45592 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2239787Z [rank0]:E1204 09:33:03.240000 45592 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2240229Z [rank0]:E1204 09:33:03.240000 45592 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2241171Z [rank0]:E1204 09:33:03.240000 45592 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2241689Z [rank0]:E1204 09:33:03.240000 45592 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2243407Z [rank0]:E1204 09:33:03.240000 45592 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 716111872 and is now 758054912. 
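[editor note] The RuntimeError above is raised by the CUDA memory-leak checker that PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 enables: on each device the caching allocator grew from 512 to 49,664 bytes (about 48 KiB) across the test, and driver-level allocation grew by roughly 36-40 MB (758054912 - 720306176 = 37,748,736 bytes on device 0). The rough idea is a before/after comparison around the test body. The sketch below only illustrates that idea with public torch.cuda APIs; it is not the implementation in torch/testing/_internal/common_utils.py:

# Rough illustration of the before/after comparison behind the leak check;
# not the actual checker in torch/testing/_internal/common_utils.py.
import torch

def check_for_leak(fn, device=0):
    torch.cuda.synchronize(device)
    alloc_before = torch.cuda.memory_allocated(device)    # caching allocator
    free_before, _total = torch.cuda.mem_get_info(device)  # driver-level view

    fn()

    torch.cuda.synchronize(device)
    alloc_after = torch.cuda.memory_allocated(device)
    free_after, _ = torch.cuda.mem_get_info(device)

    if alloc_after > alloc_before:
        raise RuntimeError(
            f"possible leak: caching allocator went from {alloc_before} "
            f"to {alloc_after} bytes; driver usage grew by "
            f"{free_before - free_after} bytes"
        )

# Example: a tensor kept alive in a global survives the test body and is
# reported as a leak (this call raises RuntimeError by design).
_kept = []
check_for_leak(lambda: _kept.append(torch.ones(1024, device="cuda")))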
2025-12-04T09:59:13.2243753Z [rank0]:E1204 09:33:03.240000 45592 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2244393Z [rank0]:E1204 09:33:03.240000 45592 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2245521Z [rank0]:E1204 09:33:03.240000 45592 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.2245878Z [rank0]:E1204 09:33:03.240000 45592 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2246547Z [rank0]:E1204 09:33:03.240000 45592 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2247059Z [rank0]:E1204 09:33:03.240000 45592 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.2247493Z [rank1]:E1204 09:33:03.243000 45593 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2248022Z [rank1]:E1204 09:33:03.243000 45593 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2248975Z [rank1]:E1204 09:33:03.243000 45593 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2249477Z [rank1]:E1204 09:33:03.243000 45593 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2250416Z [rank1]:E1204 09:33:03.243000 45593 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2250794Z [rank1]:E1204 09:33:03.243000 45593 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2251701Z [rank1]:E1204 09:33:03.243000 45593 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2252173Z [rank1]:E1204 09:33:03.243000 45593 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2253081Z [rank1]:E1204 09:33:03.243000 45593 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2253729Z [rank1]:E1204 09:33:03.243000 45593 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2254658Z [rank1]:E1204 09:33:03.243000 45593 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2255105Z [rank1]:E1204 09:33:03.243000 45593 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2256066Z [rank1]:E1204 09:33:03.243000 45593 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2256625Z [rank1]:E1204 09:33:03.243000 45593 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2258493Z [rank1]:E1204 09:33:03.243000 45593 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 1. CUDA driver allocated memory was 604962816 and is now 649003008. 2025-12-04T09:59:13.2258866Z [rank1]:E1204 09:33:03.243000 45593 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2259646Z [rank1]:E1204 09:33:03.243000 45593 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2260808Z [rank1]:E1204 09:33:03.243000 45593 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.2261182Z [rank1]:E1204 09:33:03.243000 45593 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2261892Z [rank1]:E1204 09:33:03.243000 45593 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2262467Z [rank1]:E1204 09:33:03.243000 45593 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.2262933Z [rank2]:E1204 09:33:03.243000 45594 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2263493Z [rank2]:E1204 09:33:03.243000 45594 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2264510Z [rank2]:E1204 09:33:03.243000 45594 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2265015Z [rank2]:E1204 09:33:03.243000 45594 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2266018Z [rank2]:E1204 09:33:03.243000 45594 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2266423Z [rank2]:E1204 09:33:03.243000 45594 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2267387Z [rank2]:E1204 09:33:03.243000 45594 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2267884Z [rank2]:E1204 09:33:03.243000 45594 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2268842Z [rank2]:E1204 09:33:03.243000 45594 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2269409Z [rank2]:E1204 09:33:03.243000 45594 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2270298Z [rank2]:E1204 09:33:03.243000 45594 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2270705Z [rank2]:E1204 09:33:03.243000 45594 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2271561Z [rank2]:E1204 09:33:03.243000 45594 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2271995Z [rank2]:E1204 09:33:03.243000 45594 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2273525Z [rank2]:E1204 09:33:03.243000 45594 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 2. CUDA driver allocated memory was 609157120 and is now 649003008. 2025-12-04T09:59:13.2273854Z [rank2]:E1204 09:33:03.243000 45594 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2274452Z [rank2]:E1204 09:33:03.243000 45594 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2275484Z [rank2]:E1204 09:33:03.243000 45594 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.2275816Z [rank2]:E1204 09:33:03.243000 45594 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2276477Z [rank2]:E1204 09:33:03.243000 45594 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2276961Z [rank2]:E1204 09:33:03.243000 45594 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.2277393Z [rank3]:E1204 09:33:03.243000 45595 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2277862Z [rank3]:E1204 09:33:03.243000 45595 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2278760Z [rank3]:E1204 09:33:03.243000 45595 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2279211Z [rank3]:E1204 09:33:03.243000 45595 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2280101Z [rank3]:E1204 09:33:03.243000 45595 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2280456Z [rank3]:E1204 09:33:03.243000 45595 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2281304Z [rank3]:E1204 09:33:03.243000 45595 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2281750Z [rank3]:E1204 09:33:03.243000 45595 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2282613Z [rank3]:E1204 09:33:03.243000 45595 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2283053Z [rank3]:E1204 09:33:03.243000 45595 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2283923Z [rank3]:E1204 09:33:03.243000 45595 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2284332Z [rank3]:E1204 09:33:03.243000 45595 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2285181Z [rank3]:E1204 09:33:03.243000 45595 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2285612Z [rank3]:E1204 09:33:03.243000 45595 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2287146Z [rank3]:E1204 09:33:03.243000 45595 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 611254272 and is now 649003008. 
2025-12-04T09:59:13.2287472Z [rank3]:E1204 09:33:03.243000 45595 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2288064Z [rank3]:E1204 09:33:03.243000 45595 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2289098Z [rank3]:E1204 09:33:03.243000 45595 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.2289457Z [rank3]:E1204 09:33:03.243000 45595 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2290089Z [rank3]:E1204 09:33:03.243000 45595 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2290610Z [rank3]:E1204 09:33:03.243000 45595 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.2290711Z dist init r=1, world=4 2025-12-04T09:59:13.2290798Z dist init r=2, world=4 2025-12-04T09:59:13.2290887Z dist init r=3, world=4 2025-12-04T09:59:13.2290981Z dist init r=0, world=4 2025-12-04T09:59:13.2292002Z [rank0]:[W1204 09:33:03.268184081 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.2292103Z FAILED [28.9438s] [ 4%] 2025-12-04T09:59:13.2292108Z 2025-12-04T09:59:13.2292239Z =================================== FAILURES =================================== 2025-12-04T09:59:13.2292534Z _ TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda _ 2025-12-04T09:59:13.2292648Z Traceback (most recent call last): 2025-12-04T09:59:13.2293143Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.2293249Z self._join_processes(fn) 2025-12-04T09:59:13.2293766Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.2293891Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.2294436Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.2294541Z raise RuntimeError(error) 2025-12-04T09:59:13.2294762Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.2294868Z Traceback (most recent call last): 2025-12-04T09:59:13.2295369Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2295478Z getattr(self, test_name)() 2025-12-04T09:59:13.2295959Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2296037Z fn() 2025-12-04T09:59:13.2296575Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2296833Z method(*args, **kwargs) 2025-12-04T09:59:13.2297358Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2297461Z method(*args, **kwargs) 2025-12-04T09:59:13.2297997Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2298106Z with policy(): 2025-12-04T09:59:13.2298617Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2298728Z raise RuntimeError(msg) 2025-12-04T09:59:13.2299975Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 2. CUDA driver allocated memory was 609157120 and is now 649003008. 2025-12-04T09:59:13.2299981Z 2025-12-04T09:59:13.2300193Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2300946Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.2300951Z 2025-12-04T09:59:13.2301216Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2301221Z 2025-12-04T09:59:13.2301396Z Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.2301549Z Traceback (most recent call last): 2025-12-04T09:59:13.2302100Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2302220Z getattr(self, test_name)() 2025-12-04T09:59:13.2302754Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2302841Z fn() 2025-12-04T09:59:13.2303356Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2303460Z method(*args, **kwargs) 2025-12-04T09:59:13.2303972Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2304078Z method(*args, **kwargs) 2025-12-04T09:59:13.2304580Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2304687Z with policy(): 2025-12-04T09:59:13.2305195Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2305303Z raise RuntimeError(msg) 2025-12-04T09:59:13.2306547Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 611254272 and is now 649003008. 
2025-12-04T09:59:13.2306555Z 2025-12-04T09:59:13.2306773Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2307494Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.2307527Z 2025-12-04T09:59:13.2307795Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2307801Z 2025-12-04T09:59:13.2307805Z 2025-12-04T09:59:13.2308032Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.2308290Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.2309288Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2c7bc1bec56d6360.xml - 2025-12-04T09:59:13.2309447Z =========================== short test summary info ============================ 2025-12-04T09:59:13.2310255Z FAILED [28.9438s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_shard_grad_op_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.2310374Z Traceback (most recent call last): 2025-12-04T09:59:13.2310864Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2310966Z getattr(self, test_name)() 2025-12-04T09:59:13.2311457Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2311537Z fn() 2025-12-04T09:59:13.2311995Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2312086Z method(*args, **kwargs) 2025-12-04T09:59:13.2312537Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2312666Z method(*args, **kwargs) 2025-12-04T09:59:13.2313113Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2313200Z with policy(): 2025-12-04T09:59:13.2313663Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2313790Z raise RuntimeError(msg) 2025-12-04T09:59:13.2314899Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 2. CUDA driver allocated memory was 609157120 and is now 649003008. 
2025-12-04T09:59:13.2314904Z 2025-12-04T09:59:13.2315097Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2315727Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.2315745Z 2025-12-04T09:59:13.2315982Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2315986Z 2025-12-04T09:59:13.2316136Z Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.2316255Z Traceback (most recent call last): 2025-12-04T09:59:13.2316743Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2316842Z getattr(self, test_name)() 2025-12-04T09:59:13.2317326Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2317405Z fn() 2025-12-04T09:59:13.2317867Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2317962Z method(*args, **kwargs) 2025-12-04T09:59:13.2318411Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2318516Z method(*args, **kwargs) 2025-12-04T09:59:13.2318993Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2319084Z with policy(): 2025-12-04T09:59:13.2319553Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2319647Z raise RuntimeError(msg) 2025-12-04T09:59:13.2320889Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 611254272 and is now 649003008. 2025-12-04T09:59:13.2320899Z 2025-12-04T09:59:13.2321098Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2322014Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.2322022Z 2025-12-04T09:59:13.2322288Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2322468Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.2322657Z ======================= 1 failed, 4 deselected in 29.17s ======================= 2025-12-04T09:59:13.2322756Z Got exit code 1 2025-12-04T09:59:13.2322858Z Retrying single test... 
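[editor note] Before the runner retries the single failing test in the next session, note the ProcessGroupNCCL warning earlier in this run: destroy_process_group() was not called before the worker processes exited, which can leak resources. A minimal teardown pattern that avoids that warning is sketched below; it is illustrative only and is not how the multiprocess test harness in common_distributed.py manages its workers:

# Illustrative teardown pattern for the ProcessGroupNCCL warning seen above.
import torch.distributed as dist

def main():
    dist.init_process_group("nccl")
    try:
        ...  # training / test body goes here
    finally:
        # Tear the group down explicitly so NCCL resources are released
        # before the interpreter exits.
        dist.destroy_process_group()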
2025-12-04T09:59:13.2323490Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1003ee713f2c1e3e.xml 2025-12-04T09:59:13.2323652Z ============================= test session starts ============================== 2025-12-04T09:59:13.2324052Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.2324160Z cachedir: .pytest_cache 2025-12-04T09:59:13.2324677Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.2324844Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.2324953Z configfile: pytest.ini 2025-12-04T09:59:13.2325492Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.2325716Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.2326510Z stepcurrent: skipping 4 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.2326633Z Running 1 items in this shard 2025-12-04T09:59:13.2326638Z 2025-12-04T09:59:13.2327708Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_shard_grad_op_cuda I1204 09:33:09.414000 45877 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 45929 2025-12-04T09:59:13.2328215Z I1204 09:33:09.415000 45877 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 45930 2025-12-04T09:59:13.2328711Z I1204 09:33:09.415000 45877 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 45931 2025-12-04T09:59:13.2329200Z I1204 09:33:09.416000 45877 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 45932 2025-12-04T09:59:13.2331236Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2331337Z _warn_cpu_init() 2025-12-04T09:59:13.2333391Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2333492Z _warn_cpu_init() 2025-12-04T09:59:13.2335462Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2335556Z _warn_cpu_init() 2025-12-04T09:59:13.2337672Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2337772Z _warn_cpu_init() 2025-12-04T09:59:13.2338774Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.2338949Z return func(*args, **kwargs) 2025-12-04T09:59:13.2339414Z [rank0]:E1204 09:33:35.550000 45929 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2339991Z [rank0]:E1204 09:33:35.550000 45929 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2340993Z [rank0]:E1204 09:33:35.550000 45929 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2341509Z [rank0]:E1204 09:33:35.550000 45929 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2342497Z [rank0]:E1204 09:33:35.550000 45929 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2342901Z [rank0]:E1204 09:33:35.550000 45929 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2343870Z [rank0]:E1204 09:33:35.550000 45929 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2344355Z [rank0]:E1204 09:33:35.550000 45929 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2345322Z [rank0]:E1204 09:33:35.550000 45929 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2345808Z [rank0]:E1204 09:33:35.550000 45929 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2346813Z [rank0]:E1204 09:33:35.550000 45929 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2347263Z [rank0]:E1204 09:33:35.550000 45929 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2348224Z [rank0]:E1204 09:33:35.550000 45929 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2348724Z [rank0]:E1204 09:33:35.550000 45929 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2350469Z [rank0]:E1204 09:33:35.550000 45929 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 718209024 and is now 758054912. 2025-12-04T09:59:13.2350831Z [rank0]:E1204 09:33:35.550000 45929 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2351451Z [rank0]:E1204 09:33:35.550000 45929 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2352546Z [rank0]:E1204 09:33:35.550000 45929 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.2352888Z [rank0]:E1204 09:33:35.550000 45929 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2353589Z [rank0]:E1204 09:33:35.550000 45929 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2354110Z [rank0]:E1204 09:33:35.550000 45929 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.2354566Z [rank1]:E1204 09:33:35.551000 45930 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2355070Z [rank1]:E1204 09:33:35.551000 45930 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2356098Z [rank1]:E1204 09:33:35.551000 45930 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2356562Z [rank1]:E1204 09:33:35.551000 45930 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2357443Z [rank1]:E1204 09:33:35.551000 45930 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2357801Z [rank1]:E1204 09:33:35.551000 45930 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2358657Z [rank1]:E1204 09:33:35.551000 45930 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2359090Z [rank1]:E1204 09:33:35.551000 45930 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2359950Z [rank1]:E1204 09:33:35.551000 45930 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2360384Z [rank1]:E1204 09:33:35.551000 45930 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2361272Z [rank1]:E1204 09:33:35.551000 45930 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2361674Z [rank1]:E1204 09:33:35.551000 45930 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2362531Z [rank1]:E1204 09:33:35.551000 45930 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2362976Z [rank1]:E1204 09:33:35.551000 45930 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2364502Z [rank1]:E1204 09:33:35.551000 45930 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 1. CUDA driver allocated memory was 604962816 and is now 649003008. 2025-12-04T09:59:13.2364836Z [rank1]:E1204 09:33:35.551000 45930 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2365419Z [rank1]:E1204 09:33:35.551000 45930 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2366457Z [rank1]:E1204 09:33:35.551000 45930 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.2366804Z [rank1]:E1204 09:33:35.551000 45930 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2367437Z [rank1]:E1204 09:33:35.551000 45930 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2367956Z [rank1]:E1204 09:33:35.551000 45930 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.2368357Z [rank2]:E1204 09:33:35.551000 45931 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2368835Z [rank2]:E1204 09:33:35.551000 45931 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2369723Z [rank2]:E1204 09:33:35.551000 45931 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2370189Z [rank2]:E1204 09:33:35.551000 45931 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2371064Z [rank2]:E1204 09:33:35.551000 45931 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2371414Z [rank2]:E1204 09:33:35.551000 45931 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2372270Z [rank2]:E1204 09:33:35.551000 45931 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2372701Z [rank2]:E1204 09:33:35.551000 45931 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2373591Z [rank2]:E1204 09:33:35.551000 45931 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2374024Z [rank2]:E1204 09:33:35.551000 45931 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2374872Z [rank2]:E1204 09:33:35.551000 45931 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2375277Z [rank2]:E1204 09:33:35.551000 45931 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2376129Z [rank2]:E1204 09:33:35.551000 45931 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2376680Z [rank2]:E1204 09:33:35.551000 45931 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2378526Z [rank2]:E1204 09:33:35.551000 45931 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 2. CUDA driver allocated memory was 607059968 and is now 649003008. 
2025-12-04T09:59:13.2378909Z [rank2]:E1204 09:33:35.551000 45931 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2379569Z [rank2]:E1204 09:33:35.551000 45931 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2380770Z [rank2]:E1204 09:33:35.551000 45931 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.2381169Z [rank2]:E1204 09:33:35.551000 45931 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2381882Z [rank2]:E1204 09:33:35.551000 45931 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2382436Z [rank2]:E1204 09:33:35.551000 45931 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.2382886Z [rank3]:E1204 09:33:35.552000 45932 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2383422Z [rank3]:E1204 09:33:35.552000 45932 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2384431Z [rank3]:E1204 09:33:35.552000 45932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2384944Z [rank3]:E1204 09:33:35.552000 45932 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2385939Z [rank3]:E1204 09:33:35.552000 45932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2386333Z [rank3]:E1204 09:33:35.552000 45932 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2387302Z [rank3]:E1204 09:33:35.552000 45932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2387817Z [rank3]:E1204 09:33:35.552000 45932 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2388901Z [rank3]:E1204 09:33:35.552000 45932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2389473Z [rank3]:E1204 09:33:35.552000 45932 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2390321Z [rank3]:E1204 09:33:35.552000 45932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2390727Z [rank3]:E1204 09:33:35.552000 45932 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2391624Z [rank3]:E1204 09:33:35.552000 45932 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2392068Z [rank3]:E1204 09:33:35.552000 45932 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2393560Z [rank3]:E1204 09:33:35.552000 45932 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 611254272 and is now 649003008. 2025-12-04T09:59:13.2393919Z [rank3]:E1204 09:33:35.552000 45932 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2394507Z [rank3]:E1204 09:33:35.552000 45932 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2395540Z [rank3]:E1204 09:33:35.552000 45932 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.2395893Z [rank3]:E1204 09:33:35.552000 45932 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2396525Z [rank3]:E1204 09:33:35.552000 45932 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2397011Z [rank3]:E1204 09:33:35.552000 45932 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.2397102Z dist init r=0, world=4 2025-12-04T09:59:13.2397199Z dist init r=3, world=4 2025-12-04T09:59:13.2397285Z dist init r=2, world=4 2025-12-04T09:59:13.2397370Z dist init r=1, world=4 2025-12-04T09:59:13.2398405Z [rank0]:[W1204 09:33:35.567190403 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.2398495Z FAILED [27.9448s] [100%] 2025-12-04T09:59:13.2398500Z 2025-12-04T09:59:13.2398631Z =================================== FAILURES =================================== 2025-12-04T09:59:13.2398932Z _ TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda _ 2025-12-04T09:59:13.2399038Z Traceback (most recent call last): 2025-12-04T09:59:13.2399536Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.2399637Z self._join_processes(fn) 2025-12-04T09:59:13.2400159Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.2400324Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.2400866Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.2400979Z raise RuntimeError(error) 2025-12-04T09:59:13.2401185Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.2401294Z Traceback (most recent call last): 2025-12-04T09:59:13.2401779Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2401879Z getattr(self, test_name)() 2025-12-04T09:59:13.2402355Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2402445Z fn() 2025-12-04T09:59:13.2402919Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2403023Z method(*args, **kwargs) 2025-12-04T09:59:13.2403473Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2403569Z method(*args, **kwargs) 2025-12-04T09:59:13.2404023Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2404109Z with policy(): 2025-12-04T09:59:13.2404561Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2404664Z raise RuntimeError(msg) 2025-12-04T09:59:13.2405789Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 611254272 and is now 649003008. 
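The ProcessGroupNCCL warning above ("destroy_process_group() was not called before program exit, which can leak resources") is about explicit teardown of the default process group. A minimal sketch of the init/teardown pairing it asks for, assuming the usual RANK/WORLD_SIZE/MASTER_ADDR environment-variable rendezvous rather than the test harness's own setup:

    import os
    import torch
    import torch.distributed as dist

    def main() -> None:
        rank = int(os.environ["RANK"])
        world_size = int(os.environ["WORLD_SIZE"])
        torch.cuda.set_device(rank % torch.cuda.device_count())
        dist.init_process_group("nccl", rank=rank, world_size=world_size)
        try:
            dist.barrier()
            # ... test or training body ...
        finally:
            # Explicit teardown avoids the "can leak resources" warning at exit.
            dist.destroy_process_group()

    if __name__ == "__main__":
        main()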
2025-12-04T09:59:13.2405796Z 2025-12-04T09:59:13.2406031Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2406661Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.2406666Z 2025-12-04T09:59:13.2406906Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2406911Z 2025-12-04T09:59:13.2406915Z 2025-12-04T09:59:13.2407112Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.2407346Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.2408077Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1003ee713f2c1e3e.xml - 2025-12-04T09:59:13.2408232Z =========================== short test summary info ============================ 2025-12-04T09:59:13.2409016Z FAILED [27.9448s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_shard_grad_op_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.2409124Z Traceback (most recent call last): 2025-12-04T09:59:13.2409614Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2409722Z getattr(self, test_name)() 2025-12-04T09:59:13.2410198Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2410289Z fn() 2025-12-04T09:59:13.2410737Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2410828Z method(*args, **kwargs) 2025-12-04T09:59:13.2411312Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2411408Z method(*args, **kwargs) 2025-12-04T09:59:13.2411855Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2411956Z with policy(): 2025-12-04T09:59:13.2412406Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2412512Z raise RuntimeError(msg) 2025-12-04T09:59:13.2413607Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 611254272 and is now 649003008. 2025-12-04T09:59:13.2413615Z 2025-12-04T09:59:13.2413830Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2414480Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.2414486Z 2025-12-04T09:59:13.2414721Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2414888Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
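The repro line printed with each failure sets PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 and runs the test file directly; PYTORCH_PRINT_REPRO_ON_FAILURE=0 silences the message itself. One way to drive that exact command from Python with the environment set explicitly (run from the base repo dir, as the log says; this is just a convenience wrapper, not part of the harness):

    import os
    import subprocess

    env = dict(os.environ)
    env["PYTORCH_TEST_CUDA_MEM_LEAK_CHECK"] = "1"   # enable the leak checker
    # env["PYTORCH_PRINT_REPRO_ON_FAILURE"] = "0"   # uncomment to suppress the repro message

    subprocess.run(
        [
            "python",
            "test/distributed/fsdp/test_fsdp_core.py",
            "TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda",
        ],
        env=env,
        check=True,
    )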
2025-12-04T09:59:13.2415047Z ====================== 1 failed, 26 deselected in 28.17s ======================= 2025-12-04T09:59:13.2415136Z Got exit code 1 2025-12-04T09:59:13.2415237Z Retrying single test... 2025-12-04T09:59:13.2415786Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-86ef8482fc5a0e9d.xml 2025-12-04T09:59:13.2415967Z ============================= test session starts ============================== 2025-12-04T09:59:13.2416336Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.2416451Z cachedir: .pytest_cache 2025-12-04T09:59:13.2417155Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.2417279Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.2417386Z configfile: pytest.ini 2025-12-04T09:59:13.2417928Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.2418142Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.2418939Z stepcurrent: skipping 4 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.2419051Z Running 1 items in this shard 2025-12-04T09:59:13.2419056Z 2025-12-04T09:59:13.2420117Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_shard_grad_op_cuda I1204 09:33:42.024000 46214 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 46266 2025-12-04T09:59:13.2420632Z I1204 09:33:42.024000 46214 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 46267 2025-12-04T09:59:13.2421348Z I1204 09:33:42.025000 46214 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 46268 2025-12-04T09:59:13.2421848Z I1204 09:33:42.026000 46214 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 46269 2025-12-04T09:59:13.2424306Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2424422Z _warn_cpu_init() 2025-12-04T09:59:13.2426429Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
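The repeated UserWarning from fsdp/_init_utils.py recommends passing device_id so FSDP moves the module to GPU before sharding initialization, which is also required for sync_module_states=True. A minimal sketch of that, assuming the process group is already initialized and using a toy nn.Linear in place of the test's real models:

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_module(local_rank: int) -> FSDP:
        torch.cuda.set_device(local_rank)
        module = nn.Linear(1024, 1024)  # constructed on CPU, as in the warning
        # device_id moves the module to this rank's GPU for sharding init and
        # satisfies the GPU requirement of sync_module_states=True.
        return FSDP(
            module,
            device_id=torch.device("cuda", local_rank),
            sync_module_states=True,
        )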
2025-12-04T09:59:13.2426536Z _warn_cpu_init() 2025-12-04T09:59:13.2428581Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2428686Z _warn_cpu_init() 2025-12-04T09:59:13.2430702Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2430840Z _warn_cpu_init() 2025-12-04T09:59:13.2431846Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.2431957Z return func(*args, **kwargs) 2025-12-04T09:59:13.2432588Z [rank0]:E1204 09:34:09.429000 46266 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2433194Z [rank0]:E1204 09:34:09.429000 46266 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2434078Z [rank0]:E1204 09:34:09.429000 46266 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2434535Z [rank0]:E1204 09:34:09.429000 46266 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2435417Z [rank0]:E1204 09:34:09.429000 46266 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2435782Z [rank0]:E1204 09:34:09.429000 46266 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2436634Z [rank0]:E1204 09:34:09.429000 46266 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2437071Z [rank0]:E1204 09:34:09.429000 46266 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2437922Z [rank0]:E1204 09:34:09.429000 46266 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2438357Z [rank0]:E1204 09:34:09.429000 46266 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2439254Z [rank0]:E1204 09:34:09.429000 46266 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2439654Z [rank0]:E1204 09:34:09.429000 46266 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2440512Z [rank0]:E1204 09:34:09.429000 46266 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2440945Z [rank0]:E1204 09:34:09.429000 46266 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2442481Z [rank0]:E1204 09:34:09.429000 46266 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 718209024 and is now 758054912. 2025-12-04T09:59:13.2442810Z [rank0]:E1204 09:34:09.429000 46266 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2443393Z [rank0]:E1204 09:34:09.429000 46266 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2444432Z [rank0]:E1204 09:34:09.429000 46266 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.2444779Z [rank0]:E1204 09:34:09.429000 46266 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2445425Z [rank0]:E1204 09:34:09.429000 46266 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2445936Z [rank0]:E1204 09:34:09.429000 46266 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.2446341Z [rank1]:E1204 09:34:09.431000 46267 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2446812Z [rank1]:E1204 09:34:09.431000 46267 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2447696Z [rank1]:E1204 09:34:09.431000 46267 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2448154Z [rank1]:E1204 09:34:09.431000 46267 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2449029Z [rank1]:E1204 09:34:09.431000 46267 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2449392Z [rank1]:E1204 09:34:09.431000 46267 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2450241Z [rank1]:E1204 09:34:09.431000 46267 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2450685Z [rank1]:E1204 09:34:09.431000 46267 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2451543Z [rank1]:E1204 09:34:09.431000 46267 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2452007Z [rank1]:E1204 09:34:09.431000 46267 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2452869Z [rank1]:E1204 09:34:09.431000 46267 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2453263Z [rank1]:E1204 09:34:09.431000 46267 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2454129Z [rank1]:E1204 09:34:09.431000 46267 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2454591Z [rank1]:E1204 09:34:09.431000 46267 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2456096Z [rank1]:E1204 09:34:09.431000 46267 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 1. CUDA driver allocated memory was 611254272 and is now 649003008. 
2025-12-04T09:59:13.2456495Z [rank1]:E1204 09:34:09.431000 46267 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2457303Z [rank1]:E1204 09:34:09.431000 46267 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2458524Z [rank1]:E1204 09:34:09.431000 46267 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.2458892Z [rank1]:E1204 09:34:09.431000 46267 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2459647Z [rank1]:E1204 09:34:09.431000 46267 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2460187Z [rank1]:E1204 09:34:09.431000 46267 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.2460651Z [rank3]:E1204 09:34:09.432000 46269 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2461183Z [rank3]:E1204 09:34:09.432000 46269 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2462188Z [rank3]:E1204 09:34:09.432000 46269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2462703Z [rank3]:E1204 09:34:09.432000 46269 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2463692Z [rank3]:E1204 09:34:09.432000 46269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2464097Z [rank3]:E1204 09:34:09.432000 46269 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2465060Z [rank3]:E1204 09:34:09.432000 46269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2465560Z [rank3]:E1204 09:34:09.432000 46269 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2466614Z [rank3]:E1204 09:34:09.432000 46269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2467101Z [rank3]:E1204 09:34:09.432000 46269 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2468071Z [rank3]:E1204 09:34:09.432000 46269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2468516Z [rank3]:E1204 09:34:09.432000 46269 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2469671Z [rank3]:E1204 09:34:09.432000 46269 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2470110Z [rank3]:E1204 09:34:09.432000 46269 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2471620Z [rank3]:E1204 09:34:09.432000 46269 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 607059968 and is now 649003008. 2025-12-04T09:59:13.2471941Z [rank3]:E1204 09:34:09.432000 46269 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2472550Z [rank3]:E1204 09:34:09.432000 46269 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2473602Z [rank3]:E1204 09:34:09.432000 46269 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.2473950Z [rank3]:E1204 09:34:09.432000 46269 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2474595Z [rank3]:E1204 09:34:09.432000 46269 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2475078Z [rank3]:E1204 09:34:09.432000 46269 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.2475487Z [rank2]:E1204 09:34:09.432000 46268 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2475961Z [rank2]:E1204 09:34:09.432000 46268 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2476849Z [rank2]:E1204 09:34:09.432000 46268 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2477311Z [rank2]:E1204 09:34:09.432000 46268 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2478188Z [rank2]:E1204 09:34:09.432000 46268 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2478545Z [rank2]:E1204 09:34:09.432000 46268 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2479423Z [rank2]:E1204 09:34:09.432000 46268 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2479871Z [rank2]:E1204 09:34:09.432000 46268 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2480722Z [rank2]:E1204 09:34:09.432000 46268 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2481157Z [rank2]:E1204 09:34:09.432000 46268 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2482011Z [rank2]:E1204 09:34:09.432000 46268 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2482431Z [rank2]:E1204 09:34:09.432000 46268 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2483294Z [rank2]:E1204 09:34:09.432000 46268 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2483731Z [rank2]:E1204 09:34:09.432000 46268 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2485246Z [rank2]:E1204 09:34:09.432000 46268 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 2. CUDA driver allocated memory was 604962816 and is now 649003008. 2025-12-04T09:59:13.2485599Z [rank2]:E1204 09:34:09.432000 46268 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2486193Z [rank2]:E1204 09:34:09.432000 46268 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2487268Z [rank2]:E1204 09:34:09.432000 46268 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.2487590Z [rank2]:E1204 09:34:09.432000 46268 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2488234Z [rank2]:E1204 09:34:09.432000 46268 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2488716Z [rank2]:E1204 09:34:09.432000 46268 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.2488815Z dist init r=0, world=4 2025-12-04T09:59:13.2488903Z dist init r=2, world=4 2025-12-04T09:59:13.2488992Z dist init r=1, world=4 2025-12-04T09:59:13.2489088Z dist init r=3, world=4 2025-12-04T09:59:13.2490116Z [rank0]:[W1204 09:34:09.443226628 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.2490205Z FAILED [28.9137s] [100%] 2025-12-04T09:59:13.2490210Z 2025-12-04T09:59:13.2490347Z =================================== FAILURES =================================== 2025-12-04T09:59:13.2490636Z _ TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda _ 2025-12-04T09:59:13.2490752Z Traceback (most recent call last): 2025-12-04T09:59:13.2491239Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.2491336Z self._join_processes(fn) 2025-12-04T09:59:13.2491889Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.2492020Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.2492558Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.2492669Z raise RuntimeError(error) 2025-12-04T09:59:13.2492877Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.2492990Z Traceback (most recent call last): 2025-12-04T09:59:13.2493472Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2493571Z getattr(self, test_name)() 2025-12-04T09:59:13.2494079Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2494161Z fn() 2025-12-04T09:59:13.2494612Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2494714Z method(*args, **kwargs) 2025-12-04T09:59:13.2495158Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2495255Z method(*args, **kwargs) 2025-12-04T09:59:13.2495700Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2495790Z with policy(): 2025-12-04T09:59:13.2496248Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2496446Z raise RuntimeError(msg) 2025-12-04T09:59:13.2497841Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 607059968 and is now 649003008. 
2025-12-04T09:59:13.2497885Z 2025-12-04T09:59:13.2498098Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2498806Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.2498818Z 2025-12-04T09:59:13.2514014Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2514032Z 2025-12-04T09:59:13.2514036Z 2025-12-04T09:59:13.2514335Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.2514592Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.2515530Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-86ef8482fc5a0e9d.xml - 2025-12-04T09:59:13.2515710Z =========================== short test summary info ============================ 2025-12-04T09:59:13.2516534Z FAILED [28.9137s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_shard_grad_op_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.2516705Z Traceback (most recent call last): 2025-12-04T09:59:13.2517239Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2517352Z getattr(self, test_name)() 2025-12-04T09:59:13.2517879Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2517968Z fn() 2025-12-04T09:59:13.2518462Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2518686Z method(*args, **kwargs) 2025-12-04T09:59:13.2519190Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2519303Z method(*args, **kwargs) 2025-12-04T09:59:13.2519818Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2519913Z with policy(): 2025-12-04T09:59:13.2520399Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2520504Z raise RuntimeError(msg) 2025-12-04T09:59:13.2522663Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 607059968 and is now 649003008. 2025-12-04T09:59:13.2522674Z 2025-12-04T09:59:13.2522900Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2523630Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.2523636Z 2025-12-04T09:59:13.2523905Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2524086Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
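The session headers report hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]. Those values come straight from the log; a minimal sketch of how such a profile is registered and activated with the hypothesis API (where PyTorch actually registers it is not shown in this log):

    from hypothesis import HealthCheck, settings

    settings.register_profile(
        "pytorch_ci",
        database=None,                                 # no example database on CI
        max_examples=50,                               # cap generated examples per test
        derandomize=True,                              # deterministic example generation
        suppress_health_check=[HealthCheck.too_slow],
    )
    settings.load_profile("pytorch_ci")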
2025-12-04T09:59:13.2524274Z ====================== 1 failed, 26 deselected in 29.13s ======================= 2025-12-04T09:59:13.2524376Z Got exit code 1 2025-12-04T09:59:13.2525064Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.2525481Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.2526102Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6e9238188d8477a2.xml 2025-12-04T09:59:13.2526322Z ============================= test session starts ============================== 2025-12-04T09:59:13.2526677Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.2526785Z cachedir: .pytest_cache 2025-12-04T09:59:13.2527308Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.2527435Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.2527549Z configfile: pytest.ini 2025-12-04T09:59:13.2528087Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.2528302Z collecting ... collected 60 items / 5 deselected / 55 selected 2025-12-04T09:59:13.2528452Z stepcurrent: skipping 5 already run items. 2025-12-04T09:59:13.2528569Z Running 22 items in this shard 2025-12-04T09:59:13.2528575Z 2025-12-04T09:59:13.2529618Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_none_cuda I1204 09:34:15.614000 46551 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 46603 2025-12-04T09:59:13.2530117Z I1204 09:34:15.615000 46551 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 46604 2025-12-04T09:59:13.2530612Z I1204 09:34:15.615000 46551 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 46605 2025-12-04T09:59:13.2531116Z I1204 09:34:15.616000 46551 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 46606 2025-12-04T09:59:13.2533237Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2533355Z _warn_cpu_init() 2025-12-04T09:59:13.2535621Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.2535794Z _warn_cpu_init() 2025-12-04T09:59:13.2542293Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2542410Z _warn_cpu_init() 2025-12-04T09:59:13.2544432Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2544565Z _warn_cpu_init() 2025-12-04T09:59:13.2545583Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.2545732Z return func(*args, **kwargs) 2025-12-04T09:59:13.2546195Z [rank0]:E1204 09:34:46.167000 46603 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2546741Z [rank0]:E1204 09:34:46.167000 46603 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2547742Z [rank0]:E1204 09:34:46.167000 46603 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2548368Z [rank0]:E1204 09:34:46.167000 46603 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2549339Z [rank0]:E1204 09:34:46.167000 46603 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2549730Z [rank0]:E1204 09:34:46.167000 46603 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2550667Z [rank0]:E1204 09:34:46.167000 46603 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2551135Z [rank0]:E1204 09:34:46.167000 46603 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2552104Z [rank0]:E1204 09:34:46.167000 46603 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2552579Z [rank0]:E1204 09:34:46.167000 46603 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2553516Z [rank0]:E1204 09:34:46.167000 46603 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2553947Z [rank0]:E1204 09:34:46.167000 46603 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2554889Z [rank0]:E1204 09:34:46.167000 46603 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2555396Z [rank0]:E1204 09:34:46.167000 46603 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2556996Z [rank0]:E1204 09:34:46.167000 46603 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 720306176 and is now 734986240. 2025-12-04T09:59:13.2557361Z [rank0]:E1204 09:34:46.167000 46603 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2557999Z [rank0]:E1204 09:34:46.167000 46603 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2559120Z [rank0]:E1204 09:34:46.167000 46603 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda 2025-12-04T09:59:13.2559473Z [rank0]:E1204 09:34:46.167000 46603 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2560253Z [rank0]:E1204 09:34:46.167000 46603 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2560792Z [rank0]:E1204 09:34:46.167000 46603 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.2561359Z [rank1]:E1204 09:34:46.168000 46604 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2561906Z [rank1]:E1204 09:34:46.168000 46604 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2562857Z [rank1]:E1204 09:34:46.168000 46604 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2563341Z [rank1]:E1204 09:34:46.168000 46604 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2564266Z [rank1]:E1204 09:34:46.168000 46604 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2564647Z [rank1]:E1204 09:34:46.168000 46604 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2565559Z [rank1]:E1204 09:34:46.168000 46604 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, 
in wrapper 2025-12-04T09:59:13.2566017Z [rank1]:E1204 09:34:46.168000 46604 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2566971Z [rank1]:E1204 09:34:46.168000 46604 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2567430Z [rank1]:E1204 09:34:46.168000 46604 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2568331Z [rank1]:E1204 09:34:46.168000 46604 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2568749Z [rank1]:E1204 09:34:46.168000 46604 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2569698Z [rank1]:E1204 09:34:46.168000 46604 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2570163Z [rank1]:E1204 09:34:46.168000 46604 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2571704Z [rank1]:E1204 09:34:46.168000 46604 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 609157120 and is now 625934336. 
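Every rank is failing the same way here: the leak checker that wraps the test body sees the caching allocator grow from 512 to 12800 bytes and the driver-level allocation grow as well, then raises. As a rough illustration of the before/after comparison involved (a minimal sketch under assumed semantics, not PyTorch's actual leak-check implementation; the helper name and the raise condition are invented):

import torch

def check_cuda_leak(test_fn, device=0):
    # Invented helper sketching the comparison; the real check in
    # torch/testing/_internal/common_utils.py is more careful (it frees
    # cached blocks, rechecks, and covers every visible device).
    torch.cuda.synchronize(device)
    alloc_before = torch.cuda.memory_allocated(device)    # caching-allocator view
    free_before, total = torch.cuda.mem_get_info(device)  # driver view (cudaMemGetInfo)
    test_fn()
    torch.cuda.synchronize(device)
    alloc_after = torch.cuda.memory_allocated(device)
    free_after, _ = torch.cuda.mem_get_info(device)
    if alloc_after > alloc_before and (total - free_after) > (total - free_before):
        raise RuntimeError(
            f"possible CUDA leak: allocator {alloc_before} -> {alloc_after} bytes, "
            f"driver {total - free_before} -> {total - free_after} bytes"
        )

The "driver API confirmed" wording in the error appears to reflect that second comparison: allocator growth alone can be a caching artifact, so the failure is only reported as confirmed once the driver-level numbers agree.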
2025-12-04T09:59:13.2572054Z [rank1]:E1204 09:34:46.168000 46604 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2572721Z [rank1]:E1204 09:34:46.168000 46604 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2573779Z [rank1]:E1204 09:34:46.168000 46604 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda 2025-12-04T09:59:13.2574151Z [rank1]:E1204 09:34:46.168000 46604 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2574823Z [rank1]:E1204 09:34:46.168000 46604 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2575345Z [rank1]:E1204 09:34:46.168000 46604 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.2575773Z [rank3]:E1204 09:34:46.170000 46606 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2576346Z [rank3]:E1204 09:34:46.170000 46606 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2577509Z [rank3]:E1204 09:34:46.170000 46606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2578029Z [rank3]:E1204 09:34:46.170000 46606 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2579017Z [rank3]:E1204 09:34:46.170000 46606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2579419Z [rank3]:E1204 09:34:46.170000 46606 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2580432Z [rank3]:E1204 09:34:46.170000 46606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2580924Z [rank3]:E1204 09:34:46.170000 46606 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2581894Z [rank3]:E1204 09:34:46.170000 46606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2582379Z [rank3]:E1204 09:34:46.170000 46606 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2583343Z [rank3]:E1204 09:34:46.170000 46606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2583818Z [rank3]:E1204 09:34:46.170000 46606 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2584786Z [rank3]:E1204 09:34:46.170000 46606 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2585287Z [rank3]:E1204 09:34:46.170000 46606 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2586940Z [rank3]:E1204 09:34:46.170000 46606 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 604962816 and is now 625934336. 2025-12-04T09:59:13.2587351Z [rank3]:E1204 09:34:46.170000 46606 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2588018Z [rank3]:E1204 09:34:46.170000 46606 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2589263Z [rank3]:E1204 09:34:46.170000 46606 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda 2025-12-04T09:59:13.2589590Z [rank3]:E1204 09:34:46.170000 46606 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2590226Z [rank3]:E1204 09:34:46.170000 46606 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2590718Z [rank3]:E1204 09:34:46.170000 46606 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.2591118Z [rank2]:E1204 09:34:46.170000 46605 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2591605Z [rank2]:E1204 09:34:46.170000 46605 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2592493Z [rank2]:E1204 09:34:46.170000 46605 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2592959Z [rank2]:E1204 09:34:46.170000 46605 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2593841Z [rank2]:E1204 09:34:46.170000 46605 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2594198Z [rank2]:E1204 09:34:46.170000 46605 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2595090Z [rank2]:E1204 09:34:46.170000 46605 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2595524Z [rank2]:E1204 09:34:46.170000 46605 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2596391Z [rank2]:E1204 09:34:46.170000 46605 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2596828Z [rank2]:E1204 09:34:46.170000 46605 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2597720Z [rank2]:E1204 09:34:46.170000 46605 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2598116Z [rank2]:E1204 09:34:46.170000 46605 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2598972Z [rank2]:E1204 09:34:46.170000 46605 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2599423Z [rank2]:E1204 09:34:46.170000 46605 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2600890Z [rank2]:E1204 09:34:46.170000 46605 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 604962816 and is now 625934336. 2025-12-04T09:59:13.2601280Z [rank2]:E1204 09:34:46.170000 46605 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2601863Z [rank2]:E1204 09:34:46.170000 46605 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2602868Z [rank2]:E1204 09:34:46.170000 46605 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda 2025-12-04T09:59:13.2603190Z [rank2]:E1204 09:34:46.170000 46605 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2603832Z [rank2]:E1204 09:34:46.170000 46605 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2604329Z [rank2]:E1204 09:34:46.170000 46605 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.2604424Z dist init r=0, world=4 2025-12-04T09:59:13.2604525Z dist init r=1, world=4 2025-12-04T09:59:13.2604619Z dist init r=3, world=4 2025-12-04T09:59:13.2604706Z dist init r=2, world=4 2025-12-04T09:59:13.2605746Z [rank0]:[W1204 09:34:46.187451248 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.2605838Z FAILED [32.7218s] [ 4%] 2025-12-04T09:59:13.2605844Z 2025-12-04T09:59:13.2605987Z =================================== FAILURES =================================== 2025-12-04T09:59:13.2606259Z _____ TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda _____ 2025-12-04T09:59:13.2606370Z Traceback (most recent call last): 2025-12-04T09:59:13.2606900Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.2607004Z self._join_processes(fn) 2025-12-04T09:59:13.2607524Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.2607659Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.2608192Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.2608301Z raise RuntimeError(error) 2025-12-04T09:59:13.2608508Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.2608617Z Traceback (most recent call last): 2025-12-04T09:59:13.2609131Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2609232Z getattr(self, test_name)() 2025-12-04T09:59:13.2609705Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2609796Z fn() 2025-12-04T09:59:13.2610244Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2610346Z method(*args, **kwargs) 2025-12-04T09:59:13.2610800Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2610893Z method(*args, **kwargs) 2025-12-04T09:59:13.2611350Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2611462Z with policy(): 2025-12-04T09:59:13.2611913Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2612021Z raise RuntimeError(msg) 2025-12-04T09:59:13.2613086Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 720306176 and is now 734986240. 
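The ProcessGroupNCCL warning above is actionable independently of the leak: each worker exits without calling destroy_process_group(). A minimal sketch of the recommended shutdown sequence (run_worker and the env-var rendezvous are placeholders; the launcher is assumed to set RANK, MASTER_ADDR, and MASTER_PORT):

import os
import torch
import torch.distributed as dist

def run_worker():
    rank = int(os.environ["RANK"])  # provided by the launcher
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(rank % torch.cuda.device_count())
    try:
        ...  # test or training body goes here
    finally:
        # Explicit teardown releases NCCL resources and silences the warning.
        if dist.is_initialized():
            dist.destroy_process_group()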
2025-12-04T09:59:13.2613121Z 2025-12-04T09:59:13.2613328Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2613921Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda 2025-12-04T09:59:13.2613926Z 2025-12-04T09:59:13.2614173Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2614181Z 2025-12-04T09:59:13.2614324Z Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.2614430Z Traceback (most recent call last): 2025-12-04T09:59:13.2614930Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2615031Z getattr(self, test_name)() 2025-12-04T09:59:13.2615516Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2615593Z fn() 2025-12-04T09:59:13.2616040Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2616144Z method(*args, **kwargs) 2025-12-04T09:59:13.2616836Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2616952Z method(*args, **kwargs) 2025-12-04T09:59:13.2617464Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2617561Z with policy(): 2025-12-04T09:59:13.2618117Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2618232Z raise RuntimeError(msg) 2025-12-04T09:59:13.2619424Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 604962816 and is now 625934336. 2025-12-04T09:59:13.2619430Z 2025-12-04T09:59:13.2619654Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2620319Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda 2025-12-04T09:59:13.2620327Z 2025-12-04T09:59:13.2620603Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2620609Z 2025-12-04T09:59:13.2620655Z 2025-12-04T09:59:13.2621125Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.2621399Z Process 0 terminated with exit code 10, terminating remaining processes. 
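The _warn_cpu_init() UserWarnings near the top of each session are also self-describing: FSDP is asking for a device_id so that sharding initialization runs on the GPU rather than on CPU. A sketch of the construction the warning suggests, assuming a process group is already initialized (the Linear module is a stand-in for whatever model the test builds on CPU):

import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

module = nn.Linear(8, 8)  # stand-in for the CPU-constructed model
wrapped = FSDP(
    module,
    device_id=torch.cuda.current_device(),  # run sharding init on the GPU
    sync_module_states=True,                # needs the module on GPU, hence device_id
)

Whether the CPU-side init is related to the leak is not established by this log; the warning and the leak error are reported independently.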
2025-12-04T09:59:13.2622215Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6e9238188d8477a2.xml - 2025-12-04T09:59:13.2622389Z =========================== short test summary info ============================ 2025-12-04T09:59:13.2623232Z FAILED [32.7218s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.2623356Z Traceback (most recent call last): 2025-12-04T09:59:13.2623978Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2624103Z getattr(self, test_name)() 2025-12-04T09:59:13.2624646Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2624789Z fn() 2025-12-04T09:59:13.2625300Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2625404Z method(*args, **kwargs) 2025-12-04T09:59:13.2625912Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2626014Z method(*args, **kwargs) 2025-12-04T09:59:13.2626520Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2626630Z with policy(): 2025-12-04T09:59:13.2627139Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2627260Z raise RuntimeError(msg) 2025-12-04T09:59:13.2628459Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 720306176 and is now 734986240. 
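One more setup warning recurs in each session: c10d's barrier() picking the device from the current context. On recent PyTorch the process group can be bound to a device at init time, which addresses it (the rank handling here is an assumption about the launcher):

import os
import torch
import torch.distributed as dist

rank = int(os.environ["RANK"])  # assumed to be set by the launcher
dist.init_process_group(
    backend="nccl",
    device_id=torch.device("cuda", rank % torch.cuda.device_count()),
)
dist.barrier()  # no longer has to guess a device from the current context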
2025-12-04T09:59:13.2628467Z 2025-12-04T09:59:13.2628694Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2629360Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda 2025-12-04T09:59:13.2629365Z 2025-12-04T09:59:13.2629630Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2629638Z 2025-12-04T09:59:13.2629819Z Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.2629940Z Traceback (most recent call last): 2025-12-04T09:59:13.2630497Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2630647Z getattr(self, test_name)() 2025-12-04T09:59:13.2631188Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2631294Z fn() 2025-12-04T09:59:13.2631800Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2631905Z method(*args, **kwargs) 2025-12-04T09:59:13.2632428Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2632534Z method(*args, **kwargs) 2025-12-04T09:59:13.2633145Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2633244Z with policy(): 2025-12-04T09:59:13.2633923Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2634053Z raise RuntimeError(msg) 2025-12-04T09:59:13.2635205Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 604962816 and is now 625934336. 2025-12-04T09:59:13.2635210Z 2025-12-04T09:59:13.2635435Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2636083Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda 2025-12-04T09:59:13.2636118Z 2025-12-04T09:59:13.2636377Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2636567Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.2636745Z ======================= 1 failed, 5 deselected in 32.94s ======================= 2025-12-04T09:59:13.2636880Z Got exit code 1 2025-12-04T09:59:13.2636985Z Retrying single test... 
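The "Retrying single test..." step reruns only the failing test in a fresh interpreter; note the new report hash in the next session and the stepcurrent line narrowing the run to one item. The overall shape is a retry loop over subprocesses. A generic sketch of that pattern, not the actual CI harness logic (the helper name, command line, and retry count are all invented):

import subprocess
import sys

def rerun_in_fresh_process(test_id: str, retries: int = 3) -> bool:
    # Invented helper: rerun one test id, each attempt in an isolated process.
    for _ in range(retries):
        result = subprocess.run(
            [sys.executable, "-m", "pytest", "-v", test_id],
            check=False,
        )
        if result.returncode == 0:
            return True  # flaky: passed on a retry
    return False  # failed in every fresh process

A test that fails in every fresh process is what earns the FAILED CONSISTENTLY label seen at the top of this excerpt.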
2025-12-04T09:59:13.2637587Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9476e56094f0b738.xml 2025-12-04T09:59:13.2637758Z ============================= test session starts ============================== 2025-12-04T09:59:13.2638097Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.2638201Z cachedir: .pytest_cache 2025-12-04T09:59:13.2638715Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.2638833Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.2638953Z configfile: pytest.ini 2025-12-04T09:59:13.2639473Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.2639680Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.2640420Z stepcurrent: skipping 5 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_none_cuda 2025-12-04T09:59:13.2640528Z Running 1 items in this shard 2025-12-04T09:59:13.2640533Z 2025-12-04T09:59:13.2641530Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_none_cuda I1204 09:34:52.924000 46888 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 46940 2025-12-04T09:59:13.2642013Z I1204 09:34:52.925000 46888 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 46941 2025-12-04T09:59:13.2642496Z I1204 09:34:52.925000 46888 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 46942 2025-12-04T09:59:13.2643003Z I1204 09:34:52.926000 46888 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 46943 2025-12-04T09:59:13.2644978Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2645086Z _warn_cpu_init() 2025-12-04T09:59:13.2647143Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2647253Z _warn_cpu_init() 2025-12-04T09:59:13.2649144Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2649249Z _warn_cpu_init() 2025-12-04T09:59:13.2651171Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2651299Z _warn_cpu_init() 2025-12-04T09:59:13.2652234Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.2652339Z return func(*args, **kwargs) 2025-12-04T09:59:13.2652953Z [rank1]:E1204 09:35:23.687000 46941 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2653479Z [rank1]:E1204 09:35:23.687000 46941 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2654459Z [rank1]:E1204 09:35:23.687000 46941 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2654946Z [rank1]:E1204 09:35:23.687000 46941 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2655906Z [rank1]:E1204 09:35:23.687000 46941 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2656398Z [rank1]:E1204 09:35:23.687000 46941 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2657566Z [rank1]:E1204 09:35:23.687000 46941 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2658109Z [rank1]:E1204 09:35:23.687000 46941 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2659070Z [rank1]:E1204 09:35:23.687000 46941 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2659563Z [rank1]:E1204 09:35:23.687000 46941 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2660528Z [rank1]:E1204 09:35:23.687000 46941 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2660974Z [rank1]:E1204 09:35:23.687000 46941 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2661978Z [rank1]:E1204 09:35:23.687000 46941 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2662470Z [rank1]:E1204 09:35:23.687000 46941 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2664130Z [rank1]:E1204 09:35:23.687000 46941 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 609157120 and is now 625934336. 2025-12-04T09:59:13.2664524Z [rank1]:E1204 09:35:23.687000 46941 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2665197Z [rank1]:E1204 09:35:23.687000 46941 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2666318Z [rank1]:E1204 09:35:23.687000 46941 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda 2025-12-04T09:59:13.2666716Z [rank1]:E1204 09:35:23.687000 46941 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2667440Z [rank1]:E1204 09:35:23.687000 46941 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2667983Z [rank1]:E1204 09:35:23.687000 46941 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.2668655Z [rank0]:E1204 09:35:23.687000 46940 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2669274Z [rank0]:E1204 09:35:23.687000 46940 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2670174Z [rank0]:E1204 09:35:23.687000 46940 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2670628Z [rank0]:E1204 09:35:23.687000 46940 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2671503Z [rank0]:E1204 09:35:23.687000 46940 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2671866Z [rank0]:E1204 09:35:23.687000 46940 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2672768Z [rank0]:E1204 09:35:23.687000 46940 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2673211Z [rank0]:E1204 09:35:23.687000 46940 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2674062Z [rank0]:E1204 09:35:23.687000 46940 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3329, in wrapper 2025-12-04T09:59:13.2674501Z [rank0]:E1204 09:35:23.687000 46940 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2675349Z [rank0]:E1204 09:35:23.687000 46940 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2675773Z [rank0]:E1204 09:35:23.687000 46940 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2676645Z [rank0]:E1204 09:35:23.687000 46940 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2677079Z [rank0]:E1204 09:35:23.687000 46940 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2678545Z [rank0]:E1204 09:35:23.687000 46940 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 720306176 and is now 734986240. 2025-12-04T09:59:13.2678900Z [rank0]:E1204 09:35:23.687000 46940 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2679489Z [rank0]:E1204 09:35:23.687000 46940 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2680505Z [rank0]:E1204 09:35:23.687000 46940 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda 2025-12-04T09:59:13.2680827Z [rank0]:E1204 09:35:23.687000 46940 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2681468Z [rank0]:E1204 09:35:23.687000 46940 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2681960Z [rank0]:E1204 09:35:23.687000 46940 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.2682372Z [rank3]:E1204 09:35:23.689000 46943 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2682844Z [rank3]:E1204 09:35:23.689000 46943 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2683737Z [rank3]:E1204 09:35:23.689000 46943 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2684186Z [rank3]:E1204 09:35:23.689000 46943 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2685064Z [rank3]:E1204 09:35:23.689000 46943 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2685447Z [rank3]:E1204 09:35:23.689000 46943 
site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2686300Z [rank3]:E1204 09:35:23.689000 46943 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2686737Z [rank3]:E1204 09:35:23.689000 46943 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2687595Z [rank3]:E1204 09:35:23.689000 46943 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2688034Z [rank3]:E1204 09:35:23.689000 46943 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2688904Z [rank3]:E1204 09:35:23.689000 46943 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2689302Z [rank3]:E1204 09:35:23.689000 46943 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2690162Z [rank3]:E1204 09:35:23.689000 46943 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2690596Z [rank3]:E1204 09:35:23.689000 46943 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2692125Z [rank3]:E1204 09:35:23.689000 46943 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 607059968 and is now 625934336. 
2025-12-04T09:59:13.2692479Z [rank3]:E1204 09:35:23.689000 46943 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2693068Z [rank3]:E1204 09:35:23.689000 46943 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2694053Z [rank3]:E1204 09:35:23.689000 46943 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda 2025-12-04T09:59:13.2694378Z [rank3]:E1204 09:35:23.689000 46943 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2695022Z [rank3]:E1204 09:35:23.689000 46943 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2695502Z [rank3]:E1204 09:35:23.689000 46943 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.2695915Z [rank2]:E1204 09:35:23.689000 46942 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2696462Z [rank2]:E1204 09:35:23.689000 46942 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2697610Z [rank2]:E1204 09:35:23.689000 46942 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2698132Z [rank2]:E1204 09:35:23.689000 46942 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2699156Z [rank2]:E1204 09:35:23.689000 46942 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2699566Z [rank2]:E1204 09:35:23.689000 46942 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2700525Z [rank2]:E1204 09:35:23.689000 46942 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2701023Z [rank2]:E1204 09:35:23.689000 46942 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2702009Z [rank2]:E1204 09:35:23.689000 46942 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2702498Z [rank2]:E1204 09:35:23.689000 46942 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2703472Z [rank2]:E1204 09:35:23.689000 46942 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2703914Z [rank2]:E1204 09:35:23.689000 46942 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2704885Z [rank2]:E1204 09:35:23.689000 46942 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2705403Z [rank2]:E1204 09:35:23.689000 46942 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2707061Z [rank2]:E1204 09:35:23.689000 46942 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 604962816 and is now 625934336. 2025-12-04T09:59:13.2707453Z [rank2]:E1204 09:35:23.689000 46942 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2708126Z [rank2]:E1204 09:35:23.689000 46942 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2709424Z [rank2]:E1204 09:35:23.689000 46942 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda 2025-12-04T09:59:13.2709755Z [rank2]:E1204 09:35:23.689000 46942 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2710394Z [rank2]:E1204 09:35:23.689000 46942 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2710879Z [rank2]:E1204 09:35:23.689000 46942 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.2710980Z dist init r=1, world=4 2025-12-04T09:59:13.2711070Z dist init r=0, world=4 2025-12-04T09:59:13.2711156Z dist init r=2, world=4 2025-12-04T09:59:13.2711250Z dist init r=3, world=4 2025-12-04T09:59:13.2712275Z [rank0]:[W1204 09:35:24.707597608 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.2712374Z FAILED [32.3778s] [100%] 2025-12-04T09:59:13.2712381Z 2025-12-04T09:59:13.2712542Z =================================== FAILURES =================================== 2025-12-04T09:59:13.2712817Z _____ TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda _____ 2025-12-04T09:59:13.2712928Z Traceback (most recent call last): 2025-12-04T09:59:13.2713412Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.2713509Z self._join_processes(fn) 2025-12-04T09:59:13.2714037Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.2714158Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.2714707Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.2714805Z raise RuntimeError(error) 2025-12-04T09:59:13.2715035Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.2715156Z Traceback (most recent call last): 2025-12-04T09:59:13.2715635Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2715734Z getattr(self, test_name)() 2025-12-04T09:59:13.2716211Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2716290Z fn() 2025-12-04T09:59:13.2716743Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2716835Z method(*args, **kwargs) 2025-12-04T09:59:13.2717312Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2717410Z method(*args, **kwargs) 2025-12-04T09:59:13.2717860Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2717984Z with policy(): 2025-12-04T09:59:13.2718441Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2718539Z raise RuntimeError(msg) 2025-12-04T09:59:13.2719602Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 607059968 and is now 625934336. 2025-12-04T09:59:13.2719608Z 2025-12-04T09:59:13.2719801Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2720404Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda 2025-12-04T09:59:13.2720409Z 2025-12-04T09:59:13.2720643Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2720650Z 2025-12-04T09:59:13.2720656Z 2025-12-04T09:59:13.2720998Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.2721430Z Process 3 terminated with exit code 10, terminating remaining processes. 
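To chase this locally, the repro block the log keeps printing is self-contained; driving it programmatically only requires forwarding the two environment variables it mentions. A sketch, assuming a PyTorch checkout as the working directory:

import os
import subprocess
import sys

env = dict(
    os.environ,
    PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1",  # force the leak checker on
    PYTORCH_PRINT_REPRO_ON_FAILURE="1",    # keep the repro hint ("0" suppresses it)
)
subprocess.run(
    [
        sys.executable,
        "test/distributed/fsdp/test_fsdp_core.py",
        "TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda",
    ],
    env=env,
    check=False,
)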
2025-12-04T09:59:13.2722229Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9476e56094f0b738.xml - 2025-12-04T09:59:13.2722408Z =========================== short test summary info ============================ 2025-12-04T09:59:13.2723236Z FAILED [32.3778s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_none_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.2723362Z Traceback (most recent call last): 2025-12-04T09:59:13.2723928Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2724107Z getattr(self, test_name)() 2025-12-04T09:59:13.2724653Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2724743Z fn() 2025-12-04T09:59:13.2725253Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2725368Z method(*args, **kwargs) 2025-12-04T09:59:13.2725880Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2725985Z method(*args, **kwargs) 2025-12-04T09:59:13.2726508Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2726604Z with policy(): 2025-12-04T09:59:13.2727168Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2727279Z raise RuntimeError(msg) 2025-12-04T09:59:13.2728477Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 607059968 and is now 625934336. 2025-12-04T09:59:13.2728483Z 2025-12-04T09:59:13.2728706Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2729366Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda 2025-12-04T09:59:13.2729406Z 2025-12-04T09:59:13.2729680Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2729858Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.2730039Z ====================== 1 failed, 26 deselected in 32.60s ======================= 2025-12-04T09:59:13.2730179Z Got exit code 1 2025-12-04T09:59:13.2730283Z Retrying single test... 
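Each fresh session reloads the same hypothesis profile ('pytorch_ci' with database=None, max_examples=50, derandomize=True). That header line corresponds to a standard hypothesis profile registration; a sketch of how such a profile is defined and loaded (the actual registration site inside PyTorch is not shown in this log):

from hypothesis import HealthCheck, settings

settings.register_profile(
    "pytorch_ci",
    database=None,      # no example database, per the session header
    max_examples=50,
    derandomize=True,   # deterministic example generation for CI
    suppress_health_check=[HealthCheck.too_slow],
)
settings.load_profile("pytorch_ci")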
2025-12-04T09:59:13.2730909Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-207ff9590d724b3a.xml 2025-12-04T09:59:13.2731071Z ============================= test session starts ============================== 2025-12-04T09:59:13.2731417Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.2731535Z cachedir: .pytest_cache 2025-12-04T09:59:13.2732048Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.2732172Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.2732280Z configfile: pytest.ini 2025-12-04T09:59:13.2732826Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.2733040Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.2733880Z stepcurrent: skipping 5 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_none_cuda 2025-12-04T09:59:13.2733982Z Running 1 items in this shard 2025-12-04T09:59:13.2733987Z 2025-12-04T09:59:13.2734891Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_none_cuda I1204 09:35:30.103000 47225 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 47277 2025-12-04T09:59:13.2735338Z I1204 09:35:30.104000 47225 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 47278 2025-12-04T09:59:13.2735780Z I1204 09:35:30.105000 47225 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 47279 2025-12-04T09:59:13.2736252Z I1204 09:35:30.106000 47225 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 47280 2025-12-04T09:59:13.2738491Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2738599Z _warn_cpu_init() 2025-12-04T09:59:13.2740637Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2740748Z _warn_cpu_init() 2025-12-04T09:59:13.2742757Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2742894Z _warn_cpu_init() 2025-12-04T09:59:13.2744913Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2745040Z _warn_cpu_init() 2025-12-04T09:59:13.2746043Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.2746158Z return func(*args, **kwargs) 2025-12-04T09:59:13.2746626Z [rank0]:E1204 09:36:07.922000 47277 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2747168Z [rank0]:E1204 09:36:07.922000 47277 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2748175Z [rank0]:E1204 09:36:07.922000 47277 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2748808Z [rank0]:E1204 09:36:07.922000 47277 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2749824Z [rank0]:E1204 09:36:07.922000 47277 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2750187Z [rank0]:E1204 09:36:07.922000 47277 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2751043Z [rank0]:E1204 09:36:07.922000 47277 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2751510Z [rank0]:E1204 09:36:07.922000 47277 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2752358Z [rank0]:E1204 09:36:07.922000 47277 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2752787Z [rank0]:E1204 09:36:07.922000 47277 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2753641Z [rank0]:E1204 09:36:07.922000 47277 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2754040Z [rank0]:E1204 09:36:07.922000 47277 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2754933Z [rank0]:E1204 09:36:07.922000 47277 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2755371Z [rank0]:E1204 09:36:07.922000 47277 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2756847Z [rank0]:E1204 09:36:07.922000 47277 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 711917568 and is now 734986240. 2025-12-04T09:59:13.2757199Z [rank0]:E1204 09:36:07.922000 47277 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2757790Z [rank0]:E1204 09:36:07.922000 47277 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2758787Z [rank0]:E1204 09:36:07.922000 47277 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda 2025-12-04T09:59:13.2759138Z [rank0]:E1204 09:36:07.922000 47277 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2759778Z [rank0]:E1204 09:36:07.922000 47277 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2760265Z [rank0]:E1204 09:36:07.922000 47277 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.2760670Z [rank1]:E1204 09:36:07.922000 47278 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2761147Z [rank1]:E1204 09:36:07.922000 47278 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2762043Z [rank1]:E1204 09:36:07.922000 47278 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2762496Z [rank1]:E1204 09:36:07.922000 47278 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2763373Z [rank1]:E1204 09:36:07.922000 47278 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2763734Z [rank1]:E1204 09:36:07.922000 47278 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2764609Z [rank1]:E1204 09:36:07.922000 47278 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2765049Z [rank1]:E1204 09:36:07.922000 47278 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2765898Z [rank1]:E1204 09:36:07.922000 47278 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3329, in wrapper 2025-12-04T09:59:13.2766325Z [rank1]:E1204 09:36:07.922000 47278 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2767210Z [rank1]:E1204 09:36:07.922000 47278 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2767609Z [rank1]:E1204 09:36:07.922000 47278 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2768476Z [rank1]:E1204 09:36:07.922000 47278 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2768916Z [rank1]:E1204 09:36:07.922000 47278 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2770379Z [rank1]:E1204 09:36:07.922000 47278 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 1. CUDA driver allocated memory was 607059968 and is now 625934336. 2025-12-04T09:59:13.2770743Z [rank1]:E1204 09:36:07.922000 47278 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2771329Z [rank1]:E1204 09:36:07.922000 47278 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2772346Z [rank1]:E1204 09:36:07.922000 47278 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda 2025-12-04T09:59:13.2772669Z [rank1]:E1204 09:36:07.922000 47278 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2773310Z [rank1]:E1204 09:36:07.922000 47278 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2773796Z [rank1]:E1204 09:36:07.922000 47278 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.2774203Z [rank2]:E1204 09:36:07.922000 47279 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2774673Z [rank2]:E1204 09:36:07.922000 47279 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2775559Z [rank2]:E1204 09:36:07.922000 47279 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2776013Z [rank2]:E1204 09:36:07.922000 47279 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2777168Z [rank2]:E1204 09:36:07.922000 47279 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2777607Z [rank2]:E1204 09:36:07.922000 47279 
site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2778575Z [rank2]:E1204 09:36:07.922000 47279 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2779069Z [rank2]:E1204 09:36:07.922000 47279 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2780025Z [rank2]:E1204 09:36:07.922000 47279 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2780514Z [rank2]:E1204 09:36:07.922000 47279 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2781508Z [rank2]:E1204 09:36:07.922000 47279 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2781958Z [rank2]:E1204 09:36:07.922000 47279 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2782931Z [rank2]:E1204 09:36:07.922000 47279 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2783421Z [rank2]:E1204 09:36:07.922000 47279 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2785110Z [rank2]:E1204 09:36:07.922000 47279 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 2. CUDA driver allocated memory was 604962816 and is now 625934336. 
2025-12-04T09:59:13.2785963Z [rank2]:E1204 09:36:07.922000 47279 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2786624Z [rank2]:E1204 09:36:07.922000 47279 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2787753Z [rank2]:E1204 09:36:07.922000 47279 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda 2025-12-04T09:59:13.2788121Z [rank2]:E1204 09:36:07.922000 47279 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2788840Z [rank2]:E1204 09:36:07.922000 47279 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2789438Z [rank2]:E1204 09:36:07.922000 47279 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.2789847Z [rank3]:E1204 09:36:07.922000 47280 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2790315Z [rank3]:E1204 09:36:07.922000 47280 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2791204Z [rank3]:E1204 09:36:07.922000 47280 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2791658Z [rank3]:E1204 09:36:07.922000 47280 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2792565Z [rank3]:E1204 09:36:07.922000 47280 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2792927Z [rank3]:E1204 09:36:07.922000 47280 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2793781Z [rank3]:E1204 09:36:07.922000 47280 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2794221Z [rank3]:E1204 09:36:07.922000 47280 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2795096Z [rank3]:E1204 09:36:07.922000 47280 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2795533Z [rank3]:E1204 09:36:07.922000 47280 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2796394Z [rank3]:E1204 09:36:07.922000 47280 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2796788Z [rank3]:E1204 09:36:07.922000 47280 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2797648Z [rank3]:E1204 09:36:07.922000 47280 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2798107Z [rank3]:E1204 09:36:07.922000 47280 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2799580Z [rank3]:E1204 09:36:07.922000 47280 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 3. CUDA driver allocated memory was 583991296 and is now 625934336. 2025-12-04T09:59:13.2799930Z [rank3]:E1204 09:36:07.922000 47280 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2800514Z [rank3]:E1204 09:36:07.922000 47280 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2801507Z [rank3]:E1204 09:36:07.922000 47280 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda 2025-12-04T09:59:13.2801830Z [rank3]:E1204 09:36:07.922000 47280 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2802475Z [rank3]:E1204 09:36:07.922000 47280 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2802959Z [rank3]:E1204 09:36:07.922000 47280 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.2803057Z dist init r=2, world=4 2025-12-04T09:59:13.2803142Z dist init r=0, world=4 2025-12-04T09:59:13.2803228Z dist init r=3, world=4 2025-12-04T09:59:13.2803321Z dist init r=1, world=4 2025-12-04T09:59:13.2804348Z [rank0]:[W1204 09:36:08.943788160 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.2804440Z FAILED [39.4702s] [100%] 2025-12-04T09:59:13.2804445Z 2025-12-04T09:59:13.2804607Z =================================== FAILURES =================================== 2025-12-04T09:59:13.2804883Z _____ TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda _____ 2025-12-04T09:59:13.2804998Z Traceback (most recent call last): 2025-12-04T09:59:13.2805480Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.2805578Z self._join_processes(fn) 2025-12-04T09:59:13.2806108Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.2806235Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.2806781Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.2806883Z raise RuntimeError(error) 2025-12-04T09:59:13.2807112Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.2807236Z Traceback (most recent call last): 2025-12-04T09:59:13.2807715Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2807814Z getattr(self, test_name)() 2025-12-04T09:59:13.2808298Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2808377Z fn() 2025-12-04T09:59:13.2808833Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2808923Z method(*args, **kwargs) 2025-12-04T09:59:13.2809398Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2809500Z method(*args, **kwargs) 2025-12-04T09:59:13.2809947Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2810057Z with policy(): 2025-12-04T09:59:13.2810519Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2810617Z raise RuntimeError(msg) 2025-12-04T09:59:13.2811686Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 711917568 and is now 734986240. 2025-12-04T09:59:13.2811691Z 2025-12-04T09:59:13.2811879Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2812473Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda 2025-12-04T09:59:13.2812488Z 2025-12-04T09:59:13.2812722Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2812730Z 2025-12-04T09:59:13.2812735Z 2025-12-04T09:59:13.2812927Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.2813169Z Process 0 terminated with exit code 10, terminating remaining processes. 
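The ProcessGroupNCCL warning above ("destroy_process_group() was not called before program exit") refers to the explicit teardown sketched below. This is a hedged illustration, not the test harness's code: run_worker is a placeholder, and the default env:// rendezvous assumes MASTER_ADDR and MASTER_PORT are set in the environment.

import torch
import torch.distributed as dist

def run_worker(rank: int, world_size: int) -> None:
    # env:// rendezvous; MASTER_ADDR / MASTER_PORT are assumed to be exported.
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)
    try:
        pass  # per-rank collective work goes here
    finally:
        # Explicit teardown avoids the resource-leak warning seen at process exit.
        dist.destroy_process_group()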
2025-12-04T09:59:13.2813874Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-207ff9590d724b3a.xml - 2025-12-04T09:59:13.2814035Z =========================== short test summary info ============================ 2025-12-04T09:59:13.2814763Z FAILED [39.4702s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.2814869Z Traceback (most recent call last): 2025-12-04T09:59:13.2815367Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2815488Z getattr(self, test_name)() 2025-12-04T09:59:13.2815963Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2816047Z fn() 2025-12-04T09:59:13.2816575Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2816853Z method(*args, **kwargs) 2025-12-04T09:59:13.2817517Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2817622Z method(*args, **kwargs) 2025-12-04T09:59:13.2818136Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2818230Z with policy(): 2025-12-04T09:59:13.2818790Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2818913Z raise RuntimeError(msg) 2025-12-04T09:59:13.2820104Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 12800 on device 0. CUDA driver allocated memory was 711917568 and is now 734986240. 2025-12-04T09:59:13.2820110Z 2025-12-04T09:59:13.2820335Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2821220Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_delayed_optim_step_offload_true_none_cuda 2025-12-04T09:59:13.2821300Z 2025-12-04T09:59:13.2821571Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2821752Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T09:59:13.2821931Z ====================== 1 failed, 26 deselected in 39.69s ======================= 2025-12-04T09:59:13.2822079Z Got exit code 1 2025-12-04T09:59:13.2822670Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_none_cuda 2025-12-04T09:59:13.2823083Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.2823698Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-f664e87214ff2805.xml 2025-12-04T09:59:13.2823859Z ============================= test session starts ============================== 2025-12-04T09:59:13.2824213Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.2824319Z cachedir: .pytest_cache 2025-12-04T09:59:13.2824835Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.2824966Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.2825071Z configfile: pytest.ini 2025-12-04T09:59:13.2825617Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.2825826Z collecting ... collected 60 items / 6 deselected / 54 selected 2025-12-04T09:59:13.2825965Z stepcurrent: skipping 6 already run items. 2025-12-04T09:59:13.2826086Z Running 21 items in this shard 2025-12-04T09:59:13.2826092Z 2025-12-04T09:59:13.2827127Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_none_cuda I1204 09:36:14.343000 47562 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 47614 2025-12-04T09:59:13.2827636Z I1204 09:36:14.344000 47562 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 47615 2025-12-04T09:59:13.2828165Z I1204 09:36:14.345000 47562 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 47616 2025-12-04T09:59:13.2828660Z I1204 09:36:14.346000 47562 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 47617 2025-12-04T09:59:13.2830692Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2830793Z _warn_cpu_init() 2025-12-04T09:59:13.2832980Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.2833080Z _warn_cpu_init() 2025-12-04T09:59:13.2835040Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2835161Z _warn_cpu_init() 2025-12-04T09:59:13.2836163Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:59:13.2836290Z _init_core_state( 2025-12-04T09:59:13.2837953Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.2838126Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.2839115Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:59:13.2839226Z _init_core_state( 2025-12-04T09:59:13.2840893Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.2841062Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.2842050Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:59:13.2842145Z _init_core_state( 2025-12-04T09:59:13.2843840Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.2844004Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.2845963Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.2846063Z _warn_cpu_init() 2025-12-04T09:59:13.2847079Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:59:13.2847177Z _init_core_state( 2025-12-04T09:59:13.2848837Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.2849002Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.2850652Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.2850847Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.2852530Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.2852695Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.2854348Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.2854518Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.2859243Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T09:59:13.2859653Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.2864169Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.2864576Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.2869348Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.2869744Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.2873729Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T09:59:13.2874081Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.2874771Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.2874873Z return func(*args, **kwargs) 2025-12-04T09:59:13.2875601Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.2875698Z return func(*args, **kwargs) 2025-12-04T09:59:13.2876370Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.2876480Z return func(*args, **kwargs) 2025-12-04T09:59:13.2877157Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.2877260Z return func(*args, **kwargs) 2025-12-04T09:59:13.2877930Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.2878025Z return func(*args, **kwargs) 2025-12-04T09:59:13.2878733Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.2878830Z return func(*args, **kwargs) 2025-12-04T09:59:13.2879514Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.2879609Z return func(*args, **kwargs) 2025-12-04T09:59:13.2880279Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.2880383Z return func(*args, **kwargs) 2025-12-04T09:59:13.2881287Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning.
2025-12-04T09:59:13.2881397Z return func(*args, **kwargs) 2025-12-04T09:59:13.2881804Z [rank0]:E1204 09:36:24.223000 47614 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2882303Z [rank0]:E1204 09:36:24.223000 47614 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2883193Z [rank0]:E1204 09:36:24.223000 47614 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2883643Z [rank0]:E1204 09:36:24.223000 47614 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2884537Z [rank0]:E1204 09:36:24.223000 47614 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2884889Z [rank0]:E1204 09:36:24.223000 47614 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2885752Z [rank0]:E1204 09:36:24.223000 47614 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2886184Z [rank0]:E1204 09:36:24.223000 47614 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2887032Z [rank0]:E1204 09:36:24.223000 47614 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2887474Z [rank0]:E1204 09:36:24.223000 47614 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2888342Z [rank0]:E1204 09:36:24.223000 47614 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2888749Z [rank0]:E1204 09:36:24.223000 47614 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2889605Z [rank0]:E1204 09:36:24.223000 47614 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2890048Z [rank0]:E1204 09:36:24.223000 47614 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2891546Z [rank0]:E1204 09:36:24.223000 47614 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 720306176 and is now 10532880384. 
2025-12-04T09:59:13.2891876Z [rank0]:E1204 09:36:24.223000 47614 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2892467Z [rank0]:E1204 09:36:24.223000 47614 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2893461Z [rank0]:E1204 09:36:24.223000 47614 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda 2025-12-04T09:59:13.2893815Z [rank0]:E1204 09:36:24.223000 47614 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2894446Z [rank0]:E1204 09:36:24.223000 47614 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2894962Z [rank0]:E1204 09:36:24.223000 47614 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.2895360Z [rank2]:E1204 09:36:24.226000 47616 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2895832Z [rank2]:E1204 09:36:24.226000 47616 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2896978Z [rank2]:E1204 09:36:24.226000 47616 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2897493Z [rank2]:E1204 09:36:24.226000 47616 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2898493Z [rank2]:E1204 09:36:24.226000 47616 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2898889Z [rank2]:E1204 09:36:24.226000 47616 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2899857Z [rank2]:E1204 09:36:24.226000 47616 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2900342Z [rank2]:E1204 09:36:24.226000 47616 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2901299Z [rank2]:E1204 09:36:24.226000 47616 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2901827Z [rank2]:E1204 09:36:24.226000 47616 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2902791Z [rank2]:E1204 09:36:24.226000 47616 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2903250Z [rank2]:E1204 09:36:24.226000 47616 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2904212Z [rank2]:E1204 09:36:24.226000 47616 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2904776Z [rank2]:E1204 09:36:24.226000 47616 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2906445Z [rank2]:E1204 09:36:24.226000 47616 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 604962816 and is now 10421731328. 2025-12-04T09:59:13.2906813Z [rank2]:E1204 09:36:24.226000 47616 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2907482Z [rank2]:E1204 09:36:24.226000 47616 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2908613Z [rank2]:E1204 09:36:24.226000 47616 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda 2025-12-04T09:59:13.2909132Z [rank2]:E1204 09:36:24.226000 47616 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2909794Z [rank2]:E1204 09:36:24.226000 47616 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2910286Z [rank2]:E1204 09:36:24.226000 47616 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.2910682Z [rank1]:E1204 09:36:24.226000 47615 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2911153Z [rank1]:E1204 09:36:24.226000 47615 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2912050Z [rank1]:E1204 09:36:24.226000 47615 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2912497Z [rank1]:E1204 09:36:24.226000 47615 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2913373Z [rank1]:E1204 09:36:24.226000 47615 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2913721Z [rank1]:E1204 09:36:24.226000 47615 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2914590Z [rank1]:E1204 09:36:24.226000 47615 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2915022Z [rank1]:E1204 09:36:24.226000 47615 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2915897Z [rank1]:E1204 09:36:24.226000 47615 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2916336Z [rank1]:E1204 09:36:24.226000 47615 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2917185Z [rank1]:E1204 09:36:24.226000 47615 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2917774Z [rank1]:E1204 09:36:24.226000 47615 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2918710Z [rank1]:E1204 09:36:24.226000 47615 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2919187Z [rank1]:E1204 09:36:24.226000 47615 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2920908Z [rank1]:E1204 09:36:24.226000 47615 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 604962816 and is now 10421731328. 2025-12-04T09:59:13.2921436Z [rank1]:E1204 09:36:24.226000 47615 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2922155Z [rank1]:E1204 09:36:24.226000 47615 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2923281Z [rank1]:E1204 09:36:24.226000 47615 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda 2025-12-04T09:59:13.2923717Z [rank1]:E1204 09:36:24.226000 47615 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2924434Z [rank1]:E1204 09:36:24.226000 47615 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2924992Z [rank1]:E1204 09:36:24.226000 47615 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.2925442Z [rank3]:E1204 09:36:24.226000 47617 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.2925978Z [rank3]:E1204 09:36:24.226000 47617 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.2926991Z [rank3]:E1204 09:36:24.226000 47617 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2927494Z [rank3]:E1204 09:36:24.226000 47617 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.2928490Z [rank3]:E1204 09:36:24.226000 47617 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2928884Z [rank3]:E1204 09:36:24.226000 47617 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.2929847Z [rank3]:E1204 09:36:24.226000 47617 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2930374Z [rank3]:E1204 09:36:24.226000 47617 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2931344Z [rank3]:E1204 09:36:24.226000 47617 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2931835Z [rank3]:E1204 09:36:24.226000 47617 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.2932799Z [rank3]:E1204 09:36:24.226000 47617 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2933288Z [rank3]:E1204 09:36:24.226000 47617 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.2934323Z [rank3]:E1204 09:36:24.226000 47617 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2934796Z [rank3]:E1204 09:36:24.226000 47617 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.2936430Z [rank3]:E1204 09:36:24.226000 47617 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 604962816 and is now 10421731328. 
2025-12-04T09:59:13.2937002Z [rank3]:E1204 09:36:24.226000 47617 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2937687Z [rank3]:E1204 09:36:24.226000 47617 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2938840Z [rank3]:E1204 09:36:24.226000 47617 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda 2025-12-04T09:59:13.2939213Z [rank3]:E1204 09:36:24.226000 47617 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.2939923Z [rank3]:E1204 09:36:24.226000 47617 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2940482Z [rank3]:E1204 09:36:24.226000 47617 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.2940581Z dist init r=3, world=4 2025-12-04T09:59:13.2940682Z dist init r=0, world=4 2025-12-04T09:59:13.2940796Z dist init r=2, world=4 2025-12-04T09:59:13.2940893Z dist init r=1, world=4 2025-12-04T09:59:13.2942047Z [rank0]:[W1204 09:36:24.244897733 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.2943205Z [rank3]:[W1204 09:36:24.245153369 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.2944345Z [rank2]:[W1204 09:36:24.248749742 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.2945516Z [rank1]:[W1204 09:36:24.253210955 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.2945622Z FAILED [27.1464s] [ 4%] 2025-12-04T09:59:13.2945629Z 2025-12-04T09:59:13.2945788Z =================================== FAILURES =================================== 2025-12-04T09:59:13.2946093Z ____ TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda _____ 2025-12-04T09:59:13.2946210Z Traceback (most recent call last): 2025-12-04T09:59:13.2946767Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.2946878Z self._join_processes(fn) 2025-12-04T09:59:13.2947503Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.2947645Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.2948256Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.2948376Z raise RuntimeError(error) 2025-12-04T09:59:13.2948722Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.2948843Z Traceback (most recent call last): 2025-12-04T09:59:13.2949479Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2949586Z getattr(self, test_name)() 2025-12-04T09:59:13.2950102Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2950213Z fn() 2025-12-04T09:59:13.2950694Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2950802Z method(*args, **kwargs) 2025-12-04T09:59:13.2951299Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2951398Z method(*args, **kwargs) 2025-12-04T09:59:13.2951876Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2951966Z with policy(): 2025-12-04T09:59:13.2952449Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2952550Z raise RuntimeError(msg) 2025-12-04T09:59:13.2953690Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 604962816 and is now 10421731328. 2025-12-04T09:59:13.2953698Z 2025-12-04T09:59:13.2953910Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2954542Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda 2025-12-04T09:59:13.2954547Z 2025-12-04T09:59:13.2954916Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2954921Z 2025-12-04T09:59:13.2954925Z 2025-12-04T09:59:13.2955117Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.2955355Z Process 2 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:59:13.2956061Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-f664e87214ff2805.xml - 2025-12-04T09:59:13.2956215Z =========================== short test summary info ============================ 2025-12-04T09:59:13.2956993Z FAILED [27.1464s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_none_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.2957112Z Traceback (most recent call last): 2025-12-04T09:59:13.2957609Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.2957706Z getattr(self, test_name)() 2025-12-04T09:59:13.2958183Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.2958275Z fn() 2025-12-04T09:59:13.2958726Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2958823Z method(*args, **kwargs) 2025-12-04T09:59:13.2959306Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.2959406Z method(*args, **kwargs) 2025-12-04T09:59:13.2959868Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.2959952Z with policy(): 2025-12-04T09:59:13.2960401Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.2960508Z raise RuntimeError(msg) 2025-12-04T09:59:13.2961582Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 604962816 and is now 10421731328. 2025-12-04T09:59:13.2961612Z 2025-12-04T09:59:13.2961819Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.2962421Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda 2025-12-04T09:59:13.2962461Z 2025-12-04T09:59:13.2962701Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.2962877Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.2963034Z ======================= 1 failed, 6 deselected in 27.36s ======================= 2025-12-04T09:59:13.2963133Z Got exit code 1 2025-12-04T09:59:13.2963226Z Retrying single test... 
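The repro command printed above relies on PyTorch's CUDA memory-leak check, which compares per-device memory before and after the test and fails when both the caching-allocator and driver-level numbers grow. A minimal sketch of that comparison, assuming the counters are read through the public torch.cuda APIs (illustrative only, not the actual CudaMemoryLeakCheck code used by the test harness):

import torch

def snapshot(device):
    # bytes currently held by the caching allocator on this device
    allocated = torch.cuda.memory_allocated(device)
    # driver-level usage: total minus free, as reported by cudaMemGetInfo
    free, total = torch.cuda.mem_get_info(device)
    return allocated, total - free

def check_for_leak(run_test):
    devices = range(torch.cuda.device_count())
    before = [snapshot(d) for d in devices]
    run_test()                       # the test body under suspicion
    torch.cuda.empty_cache()         # drop cached blocks before re-measuring
    after = [snapshot(d) for d in devices]
    for d, ((alloc0, drv0), (alloc1, drv1)) in enumerate(zip(before, after)):
        if alloc1 > alloc0 and drv1 > drv0:
            raise RuntimeError(
                f"possible CUDA leak on device {d}: allocator {alloc0} -> {alloc1}, "
                f"driver {drv0} -> {drv1}"
            )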
2025-12-04T09:59:13.2963775Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-def950b7d24ceea9.xml 2025-12-04T09:59:13.2963930Z ============================= test session starts ============================== 2025-12-04T09:59:13.2964244Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.2964343Z cachedir: .pytest_cache 2025-12-04T09:59:13.2964811Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.2964922Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.2965030Z configfile: pytest.ini 2025-12-04T09:59:13.2965503Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.2965694Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.2966366Z stepcurrent: skipping 6 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_none_cuda 2025-12-04T09:59:13.2966465Z Running 1 items in this shard 2025-12-04T09:59:13.2966471Z 2025-12-04T09:59:13.2967395Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_none_cuda I1204 09:36:46.303000 48667 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 48719 2025-12-04T09:59:13.2967861Z I1204 09:36:46.304000 48667 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 48720 2025-12-04T09:59:13.2968298Z I1204 09:36:46.305000 48667 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 48721 2025-12-04T09:59:13.2968743Z I1204 09:36:46.306000 48667 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 48722 2025-12-04T09:59:13.2970564Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2970671Z _warn_cpu_init() 2025-12-04T09:59:13.2972468Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2972571Z _warn_cpu_init() 2025-12-04T09:59:13.2974353Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2974511Z _warn_cpu_init() 2025-12-04T09:59:13.2975415Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:59:13.2975501Z _init_core_state( 2025-12-04T09:59:13.2976485Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:59:13.2976577Z _init_core_state( 2025-12-04T09:59:13.2978458Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.2978628Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.2980342Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.2980505Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.2981524Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:59:13.2981633Z _init_core_state( 2025-12-04T09:59:13.2983376Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.2983546Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.2985584Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.2985697Z _warn_cpu_init() 2025-12-04T09:59:13.2986713Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 
2025-12-04T09:59:13.2986811Z _init_core_state( 2025-12-04T09:59:13.2988533Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.2988725Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.2990385Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.2990559Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.2992072Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.2992218Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.2993729Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.2993874Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.2997935Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.2998284Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.3002301Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. 
This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.3002648Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.3006660Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.3007065Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.3011029Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.3011378Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.3012086Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T09:59:13.3012196Z return func(*args, **kwargs) 2025-12-04T09:59:13.3012886Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3012987Z return func(*args, **kwargs) 2025-12-04T09:59:13.3013658Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3013754Z return func(*args, **kwargs) 2025-12-04T09:59:13.3014437Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3014537Z return func(*args, **kwargs) 2025-12-04T09:59:13.3015235Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3015343Z return func(*args, **kwargs) 2025-12-04T09:59:13.3016011Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3016115Z return func(*args, **kwargs) 2025-12-04T09:59:13.3017046Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3017155Z return func(*args, **kwargs) 2025-12-04T09:59:13.3017957Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3018067Z return func(*args, **kwargs) 2025-12-04T09:59:13.3019080Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning.
2025-12-04T09:59:13.3019219Z return func(*args, **kwargs) 2025-12-04T09:59:13.3019680Z [rank1]:E1204 09:36:56.144000 48720 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3020228Z [rank1]:E1204 09:36:56.144000 48720 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3021461Z [rank1]:E1204 09:36:56.144000 48720 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3021992Z [rank1]:E1204 09:36:56.144000 48720 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3022986Z [rank1]:E1204 09:36:56.144000 48720 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3023393Z [rank1]:E1204 09:36:56.144000 48720 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3024357Z [rank1]:E1204 09:36:56.144000 48720 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3024842Z [rank1]:E1204 09:36:56.144000 48720 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3025813Z [rank1]:E1204 09:36:56.144000 48720 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3026363Z [rank1]:E1204 09:36:56.144000 48720 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3027334Z [rank1]:E1204 09:36:56.144000 48720 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3027778Z [rank1]:E1204 09:36:56.144000 48720 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3028758Z [rank1]:E1204 09:36:56.144000 48720 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3029300Z [rank1]:E1204 09:36:56.144000 48720 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3030966Z [rank1]:E1204 09:36:56.144000 48720 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 611254272 and is now 10421731328. 
2025-12-04T09:59:13.3031346Z [rank1]:E1204 09:36:56.144000 48720 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3032001Z [rank1]:E1204 09:36:56.144000 48720 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3033252Z [rank1]:E1204 09:36:56.144000 48720 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda 2025-12-04T09:59:13.3033583Z [rank1]:E1204 09:36:56.144000 48720 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3034267Z [rank1]:E1204 09:36:56.144000 48720 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3034750Z [rank1]:E1204 09:36:56.144000 48720 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.3035152Z [rank0]:E1204 09:36:56.145000 48719 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3035632Z [rank0]:E1204 09:36:56.145000 48719 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3036525Z [rank0]:E1204 09:36:56.145000 48719 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3036984Z [rank0]:E1204 09:36:56.145000 48719 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3037860Z [rank0]:E1204 09:36:56.145000 48719 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3038223Z [rank0]:E1204 09:36:56.145000 48719 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3039076Z [rank0]:E1204 09:36:56.145000 48719 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3039509Z [rank0]:E1204 09:36:56.145000 48719 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3040395Z [rank0]:E1204 09:36:56.145000 48719 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3040826Z [rank0]:E1204 09:36:56.145000 48719 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3041685Z [rank0]:E1204 09:36:56.145000 48719 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3042083Z [rank0]:E1204 09:36:56.145000 48719 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3042971Z [rank0]:E1204 09:36:56.145000 48719 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3043409Z [rank0]:E1204 09:36:56.145000 48719 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3044876Z [rank0]:E1204 09:36:56.145000 48719 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 718209024 and is now 10532880384. 2025-12-04T09:59:13.3045210Z [rank0]:E1204 09:36:56.145000 48719 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3045820Z [rank0]:E1204 09:36:56.145000 48719 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3046828Z [rank0]:E1204 09:36:56.145000 48719 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda 2025-12-04T09:59:13.3047179Z [rank0]:E1204 09:36:56.145000 48719 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3047819Z [rank0]:E1204 09:36:56.145000 48719 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3048302Z [rank0]:E1204 09:36:56.145000 48719 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.3048701Z [rank2]:E1204 09:36:56.145000 48721 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3049181Z [rank2]:E1204 09:36:56.145000 48721 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3050066Z [rank2]:E1204 09:36:56.145000 48721 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3050531Z [rank2]:E1204 09:36:56.145000 48721 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3051402Z [rank2]:E1204 09:36:56.145000 48721 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3051759Z [rank2]:E1204 09:36:56.145000 48721 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3052639Z [rank2]:E1204 09:36:56.145000 48721 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3053071Z [rank2]:E1204 09:36:56.145000 48721 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3053932Z [rank2]:E1204 09:36:56.145000 48721 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3054359Z [rank2]:E1204 09:36:56.145000 48721 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3055215Z [rank2]:E1204 09:36:56.145000 48721 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3055634Z [rank2]:E1204 09:36:56.145000 48721 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3056580Z [rank2]:E1204 09:36:56.145000 48721 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3057237Z [rank2]:E1204 09:36:56.145000 48721 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3058904Z [rank2]:E1204 09:36:56.145000 48721 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 607059968 and is now 10421731328. 2025-12-04T09:59:13.3059315Z [rank2]:E1204 09:36:56.145000 48721 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3059976Z [rank2]:E1204 09:36:56.145000 48721 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3061137Z [rank2]:E1204 09:36:56.145000 48721 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda 2025-12-04T09:59:13.3061503Z [rank2]:E1204 09:36:56.145000 48721 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3062225Z [rank2]:E1204 09:36:56.145000 48721 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3062772Z [rank2]:E1204 09:36:56.145000 48721 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.3063230Z [rank3]:E1204 09:36:56.145000 48722 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3063773Z [rank3]:E1204 09:36:56.145000 48722 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3064770Z [rank3]:E1204 09:36:56.145000 48722 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3065278Z [rank3]:E1204 09:36:56.145000 48722 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3066269Z [rank3]:E1204 09:36:56.145000 48722 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3066668Z [rank3]:E1204 09:36:56.145000 48722 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3067667Z [rank3]:E1204 09:36:56.145000 48722 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3068159Z [rank3]:E1204 09:36:56.145000 48722 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3069205Z [rank3]:E1204 09:36:56.145000 48722 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3069640Z [rank3]:E1204 09:36:56.145000 48722 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3070533Z [rank3]:E1204 09:36:56.145000 48722 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3070928Z [rank3]:E1204 09:36:56.145000 48722 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3071787Z [rank3]:E1204 09:36:56.145000 48722 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3072230Z [rank3]:E1204 09:36:56.145000 48722 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3073715Z [rank3]:E1204 09:36:56.145000 48722 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 604962816 and is now 10421731328. 
2025-12-04T09:59:13.3074075Z [rank3]:E1204 09:36:56.145000 48722 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3074686Z [rank3]:E1204 09:36:56.145000 48722 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3075686Z [rank3]:E1204 09:36:56.145000 48722 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda 2025-12-04T09:59:13.3076006Z [rank3]:E1204 09:36:56.145000 48722 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3076653Z [rank3]:E1204 09:36:56.145000 48722 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3077135Z [rank3]:E1204 09:36:56.145000 48722 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.3077229Z dist init r=1, world=4 2025-12-04T09:59:13.3077327Z dist init r=0, world=4 2025-12-04T09:59:13.3077414Z dist init r=3, world=4 2025-12-04T09:59:13.3077499Z dist init r=2, world=4 2025-12-04T09:59:13.3078536Z [rank1]:[W1204 09:36:56.161187001 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3079552Z [rank0]:[W1204 09:36:56.164696242 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3080615Z [rank3]:[W1204 09:36:56.164914126 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3081629Z [rank2]:[W1204 09:36:56.167377245 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3081736Z FAILED [26.9482s] [100%] 2025-12-04T09:59:13.3081742Z 2025-12-04T09:59:13.3081871Z =================================== FAILURES =================================== 2025-12-04T09:59:13.3082144Z ____ TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda _____ 2025-12-04T09:59:13.3082270Z Traceback (most recent call last): 2025-12-04T09:59:13.3082782Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.3082899Z self._join_processes(fn) 2025-12-04T09:59:13.3083419Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.3083545Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.3084095Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.3084195Z raise RuntimeError(error) 2025-12-04T09:59:13.3084404Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.3084527Z Traceback (most recent call last): 2025-12-04T09:59:13.3085007Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3085141Z getattr(self, test_name)() 2025-12-04T09:59:13.3085622Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3085730Z fn() 2025-12-04T09:59:13.3086193Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3086286Z method(*args, **kwargs) 2025-12-04T09:59:13.3086732Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3086833Z method(*args, **kwargs) 2025-12-04T09:59:13.3087282Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3087374Z with policy(): 2025-12-04T09:59:13.3087826Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3087922Z raise RuntimeError(msg) 2025-12-04T09:59:13.3089008Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 611254272 and is now 10421731328. 2025-12-04T09:59:13.3089016Z 2025-12-04T09:59:13.3089208Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3089820Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda 2025-12-04T09:59:13.3089826Z 2025-12-04T09:59:13.3090059Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3090064Z 2025-12-04T09:59:13.3090068Z 2025-12-04T09:59:13.3090275Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.3090506Z Process 1 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:59:13.3091247Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-def950b7d24ceea9.xml - 2025-12-04T09:59:13.3091413Z =========================== short test summary info ============================ 2025-12-04T09:59:13.3092153Z FAILED [26.9482s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_none_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.3092270Z Traceback (most recent call last): 2025-12-04T09:59:13.3092756Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3092853Z getattr(self, test_name)() 2025-12-04T09:59:13.3093339Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3093418Z fn() 2025-12-04T09:59:13.3093893Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3093996Z method(*args, **kwargs) 2025-12-04T09:59:13.3094445Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3094553Z method(*args, **kwargs) 2025-12-04T09:59:13.3094996Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3095081Z with policy(): 2025-12-04T09:59:13.3095536Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3095630Z raise RuntimeError(msg) 2025-12-04T09:59:13.3097000Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 611254272 and is now 10421731328. 2025-12-04T09:59:13.3097011Z 2025-12-04T09:59:13.3097265Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3097936Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda 2025-12-04T09:59:13.3097942Z 2025-12-04T09:59:13.3098213Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3098393Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.3098582Z ====================== 1 failed, 26 deselected in 27.17s ======================= 2025-12-04T09:59:13.3098677Z Got exit code 1 2025-12-04T09:59:13.3098782Z Retrying single test... 
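The ProcessGroupNCCL warnings in both runs point at the same cleanup gap: destroy_process_group() is never called before the worker processes exit. A minimal sketch of the shutdown order the warning asks for (the rank/world_size plumbing is assumed to come from the launcher, not from this test file):

import torch.distributed as dist

def worker(rank, world_size):
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    try:
        pass  # training / test body goes here
    finally:
        dist.barrier()                # let every rank finish outstanding collectives
        dist.destroy_process_group()  # release NCCL communicators before the process exits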
2025-12-04T09:59:13.3099410Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-89dfbd7b5cd71317.xml 2025-12-04T09:59:13.3099571Z ============================= test session starts ============================== 2025-12-04T09:59:13.3099920Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.3100037Z cachedir: .pytest_cache 2025-12-04T09:59:13.3100548Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.3100680Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.3100784Z configfile: pytest.ini 2025-12-04T09:59:13.3101316Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.3101543Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.3102296Z stepcurrent: skipping 6 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_none_cuda 2025-12-04T09:59:13.3102416Z Running 1 items in this shard 2025-12-04T09:59:13.3102422Z 2025-12-04T09:59:13.3103481Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_none_cuda I1204 09:37:18.274000 49772 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 49824 2025-12-04T09:59:13.3103982Z I1204 09:37:18.275000 49772 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 49825 2025-12-04T09:59:13.3104480Z I1204 09:37:18.275000 49772 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 49826 2025-12-04T09:59:13.3104968Z I1204 09:37:18.276000 49772 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 49827 2025-12-04T09:59:13.3107042Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.3107144Z _warn_cpu_init() 2025-12-04T09:59:13.3109272Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.3109404Z _warn_cpu_init() 2025-12-04T09:59:13.3111378Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.3111498Z _warn_cpu_init() 2025-12-04T09:59:13.3112497Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:59:13.3112598Z _init_core_state( 2025-12-04T09:59:13.3113576Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:59:13.3113681Z _init_core_state( 2025-12-04T09:59:13.3115333Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3115494Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3117177Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3117336Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3118593Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:59:13.3118687Z _init_core_state( 2025-12-04T09:59:13.3120227Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3120370Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3122703Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.3122812Z _warn_cpu_init() 2025-12-04T09:59:13.3123828Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 
2025-12-04T09:59:13.3123937Z _init_core_state( 2025-12-04T09:59:13.3125644Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3125859Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3127625Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3127794Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3129497Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3129672Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3131372Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3131534Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3136065Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.3136515Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.3141242Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. 
This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.3142018Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.3146523Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.3146950Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.3151391Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.3151766Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.3152462Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
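If the AccumulateGrad stream mismatch reported above is intentional, the warning names the switch that silences it; a minimal sketch, assuming a PyTorch build that exposes the toggle (as this one evidently does):

    import torch

    # Silence the stream-mismatch warning when the mismatch is intentional,
    # exactly as the warning text above suggests.
    torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False)

    # Otherwise, drop lingering references to the autograd graph (e.g. the loss
    # tensor) between iterations so stale AccumulateGrad nodes are not kept alive.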
2025-12-04T09:59:13.3152560Z return func(*args, **kwargs)
2025-12-04T09:59:13.3153244Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.3153344Z return func(*args, **kwargs)
2025-12-04T09:59:13.3154018Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.3154128Z return func(*args, **kwargs)
2025-12-04T09:59:13.3154829Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.3154935Z return func(*args, **kwargs)
2025-12-04T09:59:13.3155605Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.3155696Z return func(*args, **kwargs)
2025-12-04T09:59:13.3156371Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.3156464Z return func(*args, **kwargs)
2025-12-04T09:59:13.3157157Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.3157265Z return func(*args, **kwargs)
2025-12-04T09:59:13.3157934Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.3158062Z return func(*args, **kwargs)
2025-12-04T09:59:13.3158943Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning.
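The barrier() warning just above suggests pinning the process group to a device at initialization time. A minimal sketch, assuming the NCCL backend, a per-process `rank`, and MASTER_ADDR/MASTER_PORT set in the environment (as in this harness); `device_id` support in init_process_group depends on the PyTorch version:

    import torch
    import torch.distributed as dist

    rank = 0          # placeholder: this process's rank
    world_size = 4    # placeholder: total number of ranks

    # Passing an explicit device lets barrier() use the right GPU instead of
    # guessing from the current context, which mutes the warning above.
    dist.init_process_group(
        backend="nccl",
        rank=rank,
        world_size=world_size,
        device_id=torch.device(f"cuda:{rank}"),
    )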
2025-12-04T09:59:13.3159037Z return func(*args, **kwargs) 2025-12-04T09:59:13.3159457Z [rank1]:E1204 09:37:27.971000 49825 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3159929Z [rank1]:E1204 09:37:27.971000 49825 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3160827Z [rank1]:E1204 09:37:27.971000 49825 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3161280Z [rank1]:E1204 09:37:27.971000 49825 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3162166Z [rank1]:E1204 09:37:27.971000 49825 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3162515Z [rank1]:E1204 09:37:27.971000 49825 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3163365Z [rank1]:E1204 09:37:27.971000 49825 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3163808Z [rank1]:E1204 09:37:27.971000 49825 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3164680Z [rank1]:E1204 09:37:27.971000 49825 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3165122Z [rank1]:E1204 09:37:27.971000 49825 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3165972Z [rank1]:E1204 09:37:27.971000 49825 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3166373Z [rank1]:E1204 09:37:27.971000 49825 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3167258Z [rank1]:E1204 09:37:27.971000 49825 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3167700Z [rank1]:E1204 09:37:27.971000 49825 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3169189Z [rank1]:E1204 09:37:27.971000 49825 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 609157120 and is now 10421731328. 
2025-12-04T09:59:13.3169511Z [rank1]:E1204 09:37:27.971000 49825 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3170126Z [rank1]:E1204 09:37:27.971000 49825 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3171120Z [rank1]:E1204 09:37:27.971000 49825 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda 2025-12-04T09:59:13.3171472Z [rank1]:E1204 09:37:27.971000 49825 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3172106Z [rank1]:E1204 09:37:27.971000 49825 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3172591Z [rank1]:E1204 09:37:27.971000 49825 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.3172998Z [rank0]:E1204 09:37:27.971000 49824 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3173469Z [rank0]:E1204 09:37:27.971000 49824 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3174359Z [rank0]:E1204 09:37:27.971000 49824 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3174805Z [rank0]:E1204 09:37:27.971000 49824 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3175687Z [rank0]:E1204 09:37:27.971000 49824 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3176040Z [rank0]:E1204 09:37:27.971000 49824 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3177205Z [rank0]:E1204 09:37:27.971000 49824 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3177705Z [rank0]:E1204 09:37:27.971000 49824 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3178662Z [rank0]:E1204 09:37:27.971000 49824 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3179153Z [rank0]:E1204 09:37:27.971000 49824 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3180111Z [rank0]:E1204 09:37:27.971000 49824 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3180603Z [rank0]:E1204 09:37:27.971000 49824 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3181566Z [rank0]:E1204 09:37:27.971000 49824 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3182058Z [rank0]:E1204 09:37:27.971000 49824 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3183737Z [rank0]:E1204 09:37:27.971000 49824 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 720306176 and is now 10532880384. 2025-12-04T09:59:13.3184146Z [rank0]:E1204 09:37:27.971000 49824 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3184814Z [rank0]:E1204 09:37:27.971000 49824 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3185969Z [rank0]:E1204 09:37:27.971000 49824 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda 2025-12-04T09:59:13.3186340Z [rank0]:E1204 09:37:27.971000 49824 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3187051Z [rank0]:E1204 09:37:27.971000 49824 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3187597Z [rank0]:E1204 09:37:27.971000 49824 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.3188058Z [rank2]:E1204 09:37:27.972000 49826 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3188584Z [rank2]:E1204 09:37:27.972000 49826 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3189734Z [rank2]:E1204 09:37:27.972000 49826 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3190186Z [rank2]:E1204 09:37:27.972000 49826 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3191058Z [rank2]:E1204 09:37:27.972000 49826 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3191451Z [rank2]:E1204 09:37:27.972000 49826 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3192305Z [rank2]:E1204 09:37:27.972000 49826 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3192754Z [rank2]:E1204 09:37:27.972000 49826 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3193603Z [rank2]:E1204 09:37:27.972000 49826 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3194042Z [rank2]:E1204 09:37:27.972000 49826 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3194917Z [rank2]:E1204 09:37:27.972000 49826 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3195318Z [rank2]:E1204 09:37:27.972000 49826 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3196184Z [rank2]:E1204 09:37:27.972000 49826 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3196621Z [rank2]:E1204 09:37:27.972000 49826 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3198102Z [rank2]:E1204 09:37:27.972000 49826 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 604962816 and is now 10421731328. 2025-12-04T09:59:13.3198480Z [rank2]:E1204 09:37:27.972000 49826 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3199074Z [rank2]:E1204 09:37:27.972000 49826 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3200068Z [rank2]:E1204 09:37:27.972000 49826 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda 2025-12-04T09:59:13.3200398Z [rank2]:E1204 09:37:27.972000 49826 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3201035Z [rank2]:E1204 09:37:27.972000 49826 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3201519Z [rank2]:E1204 09:37:27.972000 49826 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.3201937Z [rank3]:E1204 09:37:27.972000 49827 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3202412Z [rank3]:E1204 09:37:27.972000 49827 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3203313Z [rank3]:E1204 09:37:27.972000 49827 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3203764Z [rank3]:E1204 09:37:27.972000 49827 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3204672Z [rank3]:E1204 09:37:27.972000 49827 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3205037Z [rank3]:E1204 09:37:27.972000 49827 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3205894Z [rank3]:E1204 09:37:27.972000 49827 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3206339Z [rank3]:E1204 09:37:27.972000 49827 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3207189Z [rank3]:E1204 09:37:27.972000 49827 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3207666Z [rank3]:E1204 09:37:27.972000 49827 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3208530Z [rank3]:E1204 09:37:27.972000 49827 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3208928Z [rank3]:E1204 09:37:27.972000 49827 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3209790Z [rank3]:E1204 09:37:27.972000 49827 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3210250Z [rank3]:E1204 09:37:27.972000 49827 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3211746Z [rank3]:E1204 09:37:27.972000 49827 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 607059968 and is now 10421731328. 
2025-12-04T09:59:13.3212097Z [rank3]:E1204 09:37:27.972000 49827 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3212693Z [rank3]:E1204 09:37:27.972000 49827 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3213881Z [rank3]:E1204 09:37:27.972000 49827 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda 2025-12-04T09:59:13.3214238Z [rank3]:E1204 09:37:27.972000 49827 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3214915Z [rank3]:E1204 09:37:27.972000 49827 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3215423Z [rank3]:E1204 09:37:27.972000 49827 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.3215527Z dist init r=2, world=4 2025-12-04T09:59:13.3215627Z dist init r=3, world=4 2025-12-04T09:59:13.3215719Z dist init r=1, world=4 2025-12-04T09:59:13.3215823Z dist init r=0, world=4 2025-12-04T09:59:13.3217171Z [rank2]:[W1204 09:37:28.992693420 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3218375Z [rank3]:[W1204 09:37:28.995444652 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3219522Z [rank1]:[W1204 09:37:28.998231921 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3220671Z [rank0]:[W1204 09:37:28.000930885 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3220987Z FAILED [27.1321s] [100%] 2025-12-04T09:59:13.3220998Z 2025-12-04T09:59:13.3221166Z =================================== FAILURES =================================== 2025-12-04T09:59:13.3221553Z ____ TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda _____ 2025-12-04T09:59:13.3221678Z Traceback (most recent call last): 2025-12-04T09:59:13.3222234Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.3222346Z self._join_processes(fn) 2025-12-04T09:59:13.3222932Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.3223078Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.3223685Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.3223842Z raise RuntimeError(error) 2025-12-04T09:59:13.3224078Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.3224195Z Traceback (most recent call last): 2025-12-04T09:59:13.3224749Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3224898Z getattr(self, test_name)() 2025-12-04T09:59:13.3225429Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3225530Z fn() 2025-12-04T09:59:13.3226039Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3226150Z method(*args, **kwargs) 2025-12-04T09:59:13.3226655Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3226760Z method(*args, **kwargs) 2025-12-04T09:59:13.3227269Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3227365Z with policy(): 2025-12-04T09:59:13.3227879Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3227997Z raise RuntimeError(msg) 2025-12-04T09:59:13.3229204Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 607059968 and is now 10421731328. 2025-12-04T09:59:13.3229210Z 2025-12-04T09:59:13.3229439Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3230113Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda 2025-12-04T09:59:13.3230121Z 2025-12-04T09:59:13.3230390Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3230396Z 2025-12-04T09:59:13.3230402Z 2025-12-04T09:59:13.3230661Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.3230929Z Process 3 terminated with exit code 10, terminating remaining processes. 
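The ProcessGroupNCCL warning above is about shutdown hygiene rather than the leak itself; a minimal sketch of the teardown it asks for, assuming torch.distributed was initialized earlier in the process:

    import torch.distributed as dist

    # Explicitly tear down the process group before the process exits so NCCL
    # resources are released, which avoids the destroy_process_group() warning above.
    if dist.is_initialized():
        dist.destroy_process_group()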
2025-12-04T09:59:13.3231746Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-89dfbd7b5cd71317.xml - 2025-12-04T09:59:13.3231913Z =========================== short test summary info ============================ 2025-12-04T09:59:13.3232867Z FAILED [27.1321s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_none_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.3233095Z Traceback (most recent call last): 2025-12-04T09:59:13.3233587Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3233739Z getattr(self, test_name)() 2025-12-04T09:59:13.3234222Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3234305Z fn() 2025-12-04T09:59:13.3234762Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3234855Z method(*args, **kwargs) 2025-12-04T09:59:13.3235307Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3235398Z method(*args, **kwargs) 2025-12-04T09:59:13.3235845Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3235969Z with policy(): 2025-12-04T09:59:13.3236603Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3236721Z raise RuntimeError(msg) 2025-12-04T09:59:13.3237859Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 607059968 and is now 10421731328. 2025-12-04T09:59:13.3237895Z 2025-12-04T09:59:13.3238099Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3238746Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_none_cuda 2025-12-04T09:59:13.3238751Z 2025-12-04T09:59:13.3239000Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3239184Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
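The leak checker compares caching-allocator and driver-level memory before and after the test. A rough local approximation of that bookkeeping, assuming a CUDA device is available; the CI harness uses its own checker (enabled via PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1), so these numbers are only illustrative:

    import torch

    device = 0  # placeholder device index

    # Snapshot allocator and driver-level usage before the suspect code runs.
    torch.cuda.synchronize(device)
    alloc_before = torch.cuda.memory_allocated(device)
    free_before, total = torch.cuda.mem_get_info(device)

    # ... run the workload under suspicion here ...

    torch.cuda.synchronize(device)
    alloc_after = torch.cuda.memory_allocated(device)
    free_after, _ = torch.cuda.mem_get_info(device)

    print("caching allocator delta:", alloc_after - alloc_before)
    print("driver-allocated delta:", free_before - free_after)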
2025-12-04T09:59:13.3239353Z ====================== 1 failed, 26 deselected in 27.35s ======================= 2025-12-04T09:59:13.3239448Z Got exit code 1 2025-12-04T09:59:13.3240012Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_none_cuda 2025-12-04T09:59:13.3240398Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.3240987Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bdae057bafb686b9.xml 2025-12-04T09:59:13.3241142Z ============================= test session starts ============================== 2025-12-04T09:59:13.3241468Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.3241576Z cachedir: .pytest_cache 2025-12-04T09:59:13.3242063Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.3242177Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.3242288Z configfile: pytest.ini 2025-12-04T09:59:13.3242821Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.3243033Z collecting ... collected 60 items / 7 deselected / 53 selected 2025-12-04T09:59:13.3243167Z stepcurrent: skipping 7 already run items. 2025-12-04T09:59:13.3243270Z Running 20 items in this shard 2025-12-04T09:59:13.3243275Z 2025-12-04T09:59:13.3244292Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_shard_grad_op_cuda I1204 09:37:50.204000 50877 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 50929 2025-12-04T09:59:13.3244760Z I1204 09:37:50.205000 50877 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 50930 2025-12-04T09:59:13.3245238Z I1204 09:37:50.205000 50877 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 50931 2025-12-04T09:59:13.3245730Z I1204 09:37:50.206000 50877 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 50932 2025-12-04T09:59:13.3247660Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.3247764Z _warn_cpu_init() 2025-12-04T09:59:13.3249720Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.3249876Z _warn_cpu_init() 2025-12-04T09:59:13.3251661Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.3251761Z _warn_cpu_init() 2025-12-04T09:59:13.3252683Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T09:59:13.3252779Z _init_core_state( 2025-12-04T09:59:13.3253695Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T09:59:13.3253786Z _init_core_state( 2025-12-04T09:59:13.3254710Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T09:59:13.3254799Z _init_core_state( 2025-12-04T09:59:13.3256411Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3256602Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3258482Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3258653Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3260395Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3260579Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3262592Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
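The NO_SHARD downgrade above occurs because each FSDP instance here ends up in a group of world size 1; the strategy can still be requested explicitly and takes effect only when the group has more than one rank. A minimal sketch, assuming an initialized process group and a module `model`:

    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy

    # Request SHARD_GRAD_OP explicitly; with a single-rank group FSDP still falls
    # back to NO_SHARD, exactly as the warning above reports.
    fsdp_model = FSDP(
        model,  # assumed to be defined by the caller
        sharding_strategy=ShardingStrategy.SHARD_GRAD_OP,
    )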
2025-12-04T09:59:13.3262701Z _warn_cpu_init() 2025-12-04T09:59:13.3263734Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T09:59:13.3263866Z _init_core_state( 2025-12-04T09:59:13.3265583Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3265781Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3267488Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3267653Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3269550Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3269700Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3271219Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3271366Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3275441Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T09:59:13.3275801Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.3279830Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.3280241Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.3284234Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.3284595Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.3288634Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T09:59:13.3288982Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
2025-12-04T09:59:13.3289681Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.3289782Z return func(*args, **kwargs)
2025-12-04T09:59:13.3290487Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.3290598Z return func(*args, **kwargs)
2025-12-04T09:59:13.3291281Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.3291397Z return func(*args, **kwargs)
2025-12-04T09:59:13.3292072Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.3292170Z return func(*args, **kwargs)
2025-12-04T09:59:13.3292849Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.3292975Z return func(*args, **kwargs)
2025-12-04T09:59:13.3293647Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.3293753Z return func(*args, **kwargs)
2025-12-04T09:59:13.3294452Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.3294559Z return func(*args, **kwargs)
2025-12-04T09:59:13.3295236Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.3295329Z return func(*args, **kwargs)
2025-12-04T09:59:13.3296223Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning.
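The ``NO_SHARD``/full_state_dict warnings above come from gathering a state dict; the intent can be made explicit with the state_dict_type context manager. A minimal sketch, assuming `fsdp_model` is an FSDP-wrapped module on an initialized process group:

    import torch
    import torch.distributed as dist
    from torch.distributed.fsdp import (
        FullyShardedDataParallel as FSDP,
        StateDictType,
        FullStateDictConfig,
    )

    # Ask for a full (unsharded) state dict explicitly; under NO_SHARD this is what
    # is returned anyway, which is what the warning above points out.
    cfg = FullStateDictConfig(offload_to_cpu=True, rank0_only=True)
    with FSDP.state_dict_type(fsdp_model, StateDictType.FULL_STATE_DICT, cfg):
        state = fsdp_model.state_dict()

    if dist.get_rank() == 0:
        torch.save(state, "model_full_state.pt")  # rank 0 holds the gathered weights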
2025-12-04T09:59:13.3296385Z return func(*args, **kwargs) 2025-12-04T09:59:13.3296997Z [rank1]:E1204 09:37:59.978000 50930 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3297536Z [rank1]:E1204 09:37:59.978000 50930 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3298538Z [rank1]:E1204 09:37:59.978000 50930 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3299060Z [rank1]:E1204 09:37:59.978000 50930 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3300043Z [rank1]:E1204 09:37:59.978000 50930 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3300454Z [rank1]:E1204 09:37:59.978000 50930 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3301456Z [rank1]:E1204 09:37:59.978000 50930 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3301956Z [rank1]:E1204 09:37:59.978000 50930 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3302919Z [rank1]:E1204 09:37:59.978000 50930 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3303403Z [rank1]:E1204 09:37:59.978000 50930 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3304408Z [rank1]:E1204 09:37:59.978000 50930 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3304856Z [rank1]:E1204 09:37:59.978000 50930 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3305831Z [rank1]:E1204 09:37:59.978000 50930 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3306325Z [rank1]:E1204 09:37:59.978000 50930 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3308040Z [rank1]:E1204 09:37:59.978000 50930 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 611254272 and is now 10421731328. 
2025-12-04T09:59:13.3308442Z [rank1]:E1204 09:37:59.978000 50930 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3309235Z [rank1]:E1204 09:37:59.978000 50930 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3310269Z [rank1]:E1204 09:37:59.978000 50930 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3310592Z [rank1]:E1204 09:37:59.978000 50930 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3311244Z [rank1]:E1204 09:37:59.978000 50930 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3311732Z [rank1]:E1204 09:37:59.978000 50930 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.3312145Z [rank0]:E1204 09:37:59.979000 50929 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3312617Z [rank0]:E1204 09:37:59.979000 50929 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3313507Z [rank0]:E1204 09:37:59.979000 50929 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3313966Z [rank0]:E1204 09:37:59.979000 50929 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3314873Z [rank0]:E1204 09:37:59.979000 50929 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3315239Z [rank0]:E1204 09:37:59.979000 50929 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3316100Z [rank0]:E1204 09:37:59.979000 50929 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3316548Z [rank0]:E1204 09:37:59.979000 50929 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3317402Z [rank0]:E1204 09:37:59.979000 50929 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3317867Z [rank0]:E1204 09:37:59.979000 50929 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3318729Z [rank0]:E1204 09:37:59.979000 50929 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3319124Z [rank0]:E1204 09:37:59.979000 50929 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3319992Z [rank0]:E1204 09:37:59.979000 50929 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3320455Z [rank0]:E1204 09:37:59.979000 50929 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3322411Z [rank0]:E1204 09:37:59.979000 50929 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 716111872 and is now 10532880384. 2025-12-04T09:59:13.3322847Z [rank0]:E1204 09:37:59.979000 50929 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3323520Z [rank0]:E1204 09:37:59.979000 50929 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3324680Z [rank0]:E1204 09:37:59.979000 50929 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3325046Z [rank0]:E1204 09:37:59.979000 50929 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3325776Z [rank0]:E1204 09:37:59.979000 50929 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3326329Z [rank0]:E1204 09:37:59.979000 50929 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.3326795Z [rank2]:E1204 09:37:59.980000 50931 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3327328Z [rank2]:E1204 09:37:59.980000 50931 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3328338Z [rank2]:E1204 09:37:59.980000 50931 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3328900Z [rank2]:E1204 09:37:59.980000 50931 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3329893Z [rank2]:E1204 09:37:59.980000 50931 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3330303Z [rank2]:E1204 09:37:59.980000 50931 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3331264Z [rank2]:E1204 09:37:59.980000 50931 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3331766Z [rank2]:E1204 09:37:59.980000 50931 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3332762Z [rank2]:E1204 09:37:59.980000 50931 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3333255Z [rank2]:E1204 09:37:59.980000 50931 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3334259Z [rank2]:E1204 09:37:59.980000 50931 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3334659Z [rank2]:E1204 09:37:59.980000 50931 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3335518Z [rank2]:E1204 09:37:59.980000 50931 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3335998Z [rank2]:E1204 09:37:59.980000 50931 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3337968Z [rank2]:E1204 09:37:59.980000 50931 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 609157120 and is now 10421731328. 2025-12-04T09:59:13.3338339Z [rank2]:E1204 09:37:59.980000 50931 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3339012Z [rank2]:E1204 09:37:59.980000 50931 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3340192Z [rank2]:E1204 09:37:59.980000 50931 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3340557Z [rank2]:E1204 09:37:59.980000 50931 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3341279Z [rank2]:E1204 09:37:59.980000 50931 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3341823Z [rank2]:E1204 09:37:59.980000 50931 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.3342282Z [rank3]:E1204 09:37:59.980000 50932 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3342813Z [rank3]:E1204 09:37:59.980000 50932 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3343848Z [rank3]:E1204 09:37:59.980000 50932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3344363Z [rank3]:E1204 09:37:59.980000 50932 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3345352Z [rank3]:E1204 09:37:59.980000 50932 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3345757Z [rank3]:E1204 09:37:59.980000 50932 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3346721Z [rank3]:E1204 09:37:59.980000 50932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3347247Z [rank3]:E1204 09:37:59.980000 50932 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3348212Z [rank3]:E1204 09:37:59.980000 50932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3348806Z [rank3]:E1204 09:37:59.980000 50932 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3349812Z [rank3]:E1204 09:37:59.980000 50932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3350240Z [rank3]:E1204 09:37:59.980000 50932 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3351108Z [rank3]:E1204 09:37:59.980000 50932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3351569Z [rank3]:E1204 09:37:59.980000 50932 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3353078Z [rank3]:E1204 09:37:59.980000 50932 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 604962816 and is now 10421731328. 
2025-12-04T09:59:13.3353404Z [rank3]:E1204 09:37:59.980000 50932 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3353998Z [rank3]:E1204 09:37:59.980000 50932 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3355032Z [rank3]:E1204 09:37:59.980000 50932 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3355358Z [rank3]:E1204 09:37:59.980000 50932 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3355998Z [rank3]:E1204 09:37:59.980000 50932 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3356479Z [rank3]:E1204 09:37:59.980000 50932 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.3356582Z dist init r=0, world=4 2025-12-04T09:59:13.3356669Z dist init r=2, world=4 2025-12-04T09:59:13.3356757Z dist init r=3, world=4 2025-12-04T09:59:13.3356855Z dist init r=1, world=4 2025-12-04T09:59:13.3357903Z [rank0]:[W1204 09:38:00.997136005 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3358927Z [rank1]:[W1204 09:38:00.998519129 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3359937Z [rank2]:[W1204 09:38:00.998744042 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3360975Z [rank3]:[W1204 09:38:00.999515853 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3361077Z FAILED [27.4146s] [ 5%] 2025-12-04T09:59:13.3361082Z 2025-12-04T09:59:13.3361216Z =================================== FAILURES =================================== 2025-12-04T09:59:13.3361516Z _ TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda _ 2025-12-04T09:59:13.3361624Z Traceback (most recent call last): 2025-12-04T09:59:13.3362113Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.3362247Z self._join_processes(fn) 2025-12-04T09:59:13.3362768Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.3362902Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.3363440Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.3363571Z raise RuntimeError(error) 2025-12-04T09:59:13.3363791Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.3363898Z Traceback (most recent call last): 2025-12-04T09:59:13.3364375Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3364484Z getattr(self, test_name)() 2025-12-04T09:59:13.3364960Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3365053Z fn() 2025-12-04T09:59:13.3365503Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3365597Z method(*args, **kwargs) 2025-12-04T09:59:13.3366056Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3366152Z method(*args, **kwargs) 2025-12-04T09:59:13.3366608Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3366695Z with policy(): 2025-12-04T09:59:13.3367146Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3367254Z raise RuntimeError(msg) 2025-12-04T09:59:13.3368369Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 609157120 and is now 10421731328. 
2025-12-04T09:59:13.3368378Z 2025-12-04T09:59:13.3368581Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3369238Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3369244Z 2025-12-04T09:59:13.3369479Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3369484Z 2025-12-04T09:59:13.3369637Z Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.3369744Z Traceback (most recent call last): 2025-12-04T09:59:13.3370239Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3370339Z getattr(self, test_name)() 2025-12-04T09:59:13.3370814Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3370905Z fn() 2025-12-04T09:59:13.3371380Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3371477Z method(*args, **kwargs) 2025-12-04T09:59:13.3371934Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3372028Z method(*args, **kwargs) 2025-12-04T09:59:13.3372483Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3372568Z with policy(): 2025-12-04T09:59:13.3373020Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3373155Z raise RuntimeError(msg) 2025-12-04T09:59:13.3374272Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 604962816 and is now 10421731328. 2025-12-04T09:59:13.3374303Z 2025-12-04T09:59:13.3374503Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3375133Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3375138Z 2025-12-04T09:59:13.3375375Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3375380Z 2025-12-04T09:59:13.3375393Z 2025-12-04T09:59:13.3375590Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.3375825Z Process 2 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:59:13.3376630Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bdae057bafb686b9.xml - 2025-12-04T09:59:13.3376975Z =========================== short test summary info ============================ 2025-12-04T09:59:13.3377851Z FAILED [27.4146s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_shard_grad_op_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.3377980Z Traceback (most recent call last): 2025-12-04T09:59:13.3378533Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3378651Z getattr(self, test_name)() 2025-12-04T09:59:13.3379187Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3379280Z fn() 2025-12-04T09:59:13.3379795Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3379903Z method(*args, **kwargs) 2025-12-04T09:59:13.3380451Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3380559Z method(*args, **kwargs) 2025-12-04T09:59:13.3381061Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3381165Z with policy(): 2025-12-04T09:59:13.3381669Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3381778Z raise RuntimeError(msg) 2025-12-04T09:59:13.3383039Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 609157120 and is now 10421731328. 
2025-12-04T09:59:13.3383047Z 2025-12-04T09:59:13.3383291Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3384015Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3384022Z 2025-12-04T09:59:13.3384286Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3384291Z 2025-12-04T09:59:13.3384463Z Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.3384584Z Traceback (most recent call last): 2025-12-04T09:59:13.3385130Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3385253Z getattr(self, test_name)() 2025-12-04T09:59:13.3385835Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3385923Z fn() 2025-12-04T09:59:13.3386439Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3386571Z method(*args, **kwargs) 2025-12-04T09:59:13.3387086Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3387187Z method(*args, **kwargs) 2025-12-04T09:59:13.3387689Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3387797Z with policy(): 2025-12-04T09:59:13.3388304Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3388414Z raise RuntimeError(msg) 2025-12-04T09:59:13.3389804Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 604962816 and is now 10421731328. 2025-12-04T09:59:13.3389812Z 2025-12-04T09:59:13.3390003Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3390644Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3390648Z 2025-12-04T09:59:13.3390881Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3391049Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.3391205Z ======================= 1 failed, 7 deselected in 27.63s ======================= 2025-12-04T09:59:13.3391294Z Got exit code 1 2025-12-04T09:59:13.3391398Z Retrying single test... 
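Note on the failure above: it comes from PyTorch's CUDA memory leak check (enabled here via PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1), which compares caching-allocator and driver-level memory on each device before and after the test body. The snippet below is only a minimal, hypothetical sketch of that comparison using public torch.cuda APIs; the function name assert_no_cuda_leak is made up for illustration, and this is not the actual check implemented in torch/testing/_internal/common_utils.py.

import torch

def assert_no_cuda_leak(run_test, device: int = 0) -> None:
    """Hypothetical sketch: raise if run_test leaves CUDA memory behind on device."""
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    alloc_before = torch.cuda.memory_allocated(device)    # caching-allocator bytes in use
    free_before, total = torch.cuda.mem_get_info(device)  # driver-level free/total bytes
    driver_before = total - free_before

    run_test()

    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    alloc_after = torch.cuda.memory_allocated(device)
    free_after, _ = torch.cuda.mem_get_info(device)
    driver_after = total - free_after

    if alloc_after > alloc_before or driver_after > driver_before:
        raise RuntimeError(
            f"possible CUDA leak on device {device}: caching allocator "
            f"{alloc_before} -> {alloc_after} bytes, driver {driver_before} -> {driver_after} bytes"
        )

In the log, a comparison of this kind is what reports the caching allocator growing from 512 to 117248 bytes and the driver allocation growing to roughly 10 GB on each rank, so every worker exits with code 10 and the harness retries the single test.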
2025-12-04T09:59:13.3391951Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-eb4953947b5f3ef2.xml 2025-12-04T09:59:13.3392123Z ============================= test session starts ============================== 2025-12-04T09:59:13.3392445Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.3392540Z cachedir: .pytest_cache 2025-12-04T09:59:13.3393005Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.3393113Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.3393207Z configfile: pytest.ini 2025-12-04T09:59:13.3393687Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.3393882Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.3394610Z stepcurrent: skipping 7 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3394726Z Running 1 items in this shard 2025-12-04T09:59:13.3394733Z 2025-12-04T09:59:13.3395683Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_shard_grad_op_cuda I1204 09:38:22.164000 51982 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 52034 2025-12-04T09:59:13.3396137Z I1204 09:38:22.165000 51982 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 52035 2025-12-04T09:59:13.3396576Z I1204 09:38:22.165000 51982 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 52036 2025-12-04T09:59:13.3397019Z I1204 09:38:22.166000 51982 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 52037 2025-12-04T09:59:13.3398856Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.3398982Z _warn_cpu_init() 2025-12-04T09:59:13.3400757Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.3400848Z _warn_cpu_init() 2025-12-04T09:59:13.3402634Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.3402724Z _warn_cpu_init() 2025-12-04T09:59:13.3403661Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T09:59:13.3403746Z _init_core_state( 2025-12-04T09:59:13.3404679Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T09:59:13.3404769Z _init_core_state( 2025-12-04T09:59:13.3405704Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T09:59:13.3405806Z _init_core_state( 2025-12-04T09:59:13.3407325Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3407484Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3409029Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3409184Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3410690Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3410872Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3412666Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.3412783Z _warn_cpu_init() 2025-12-04T09:59:13.3413705Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 
2025-12-04T09:59:13.3413794Z _init_core_state( 2025-12-04T09:59:13.3415312Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3415466Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3417279Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3417443Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3419194Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3419362Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3421297Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3421475Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3426048Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.3426459Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.3431013Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. 
This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.3431441Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.3435846Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.3436268Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.3440657Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.3441018Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.3441709Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T09:59:13.3441824Z return func(*args, **kwargs) 2025-12-04T09:59:13.3442498Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3442593Z return func(*args, **kwargs) 2025-12-04T09:59:13.3443303Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3443402Z return func(*args, **kwargs) 2025-12-04T09:59:13.3444085Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3444206Z return func(*args, **kwargs) 2025-12-04T09:59:13.3444875Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3444977Z return func(*args, **kwargs) 2025-12-04T09:59:13.3445644Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3445743Z return func(*args, **kwargs) 2025-12-04T09:59:13.3446419Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3446516Z return func(*args, **kwargs) 2025-12-04T09:59:13.3447201Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3447299Z return func(*args, **kwargs) 2025-12-04T09:59:13.3448184Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning.
2025-12-04T09:59:13.3448287Z return func(*args, **kwargs) 2025-12-04T09:59:13.3448698Z [rank0]:E1204 09:38:31.897000 52034 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3449179Z [rank0]:E1204 09:38:31.897000 52034 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3450099Z [rank0]:E1204 09:38:31.897000 52034 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3450561Z [rank0]:E1204 09:38:31.897000 52034 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3451440Z [rank0]:E1204 09:38:31.897000 52034 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3451792Z [rank0]:E1204 09:38:31.897000 52034 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3452678Z [rank0]:E1204 09:38:31.897000 52034 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3453114Z [rank0]:E1204 09:38:31.897000 52034 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3453970Z [rank0]:E1204 09:38:31.897000 52034 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3454404Z [rank0]:E1204 09:38:31.897000 52034 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3455251Z [rank0]:E1204 09:38:31.897000 52034 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3455684Z [rank0]:E1204 09:38:31.897000 52034 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3456609Z [rank0]:E1204 09:38:31.897000 52034 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3457311Z [rank0]:E1204 09:38:31.897000 52034 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3459023Z [rank0]:E1204 09:38:31.897000 52034 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 714014720 and is now 10532880384. 
2025-12-04T09:59:13.3459398Z [rank0]:E1204 09:38:31.897000 52034 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3460060Z [rank0]:E1204 09:38:31.897000 52034 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3461238Z [rank0]:E1204 09:38:31.897000 52034 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3461602Z [rank0]:E1204 09:38:31.897000 52034 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3462321Z [rank0]:E1204 09:38:31.897000 52034 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3462870Z [rank0]:E1204 09:38:31.897000 52034 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.3463328Z [rank1]:E1204 09:38:31.899000 52035 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3463897Z [rank1]:E1204 09:38:31.899000 52035 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3464907Z [rank1]:E1204 09:38:31.899000 52035 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3465415Z [rank1]:E1204 09:38:31.899000 52035 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3466412Z [rank1]:E1204 09:38:31.899000 52035 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3466809Z [rank1]:E1204 09:38:31.899000 52035 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3467810Z [rank1]:E1204 09:38:31.899000 52035 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3468303Z [rank1]:E1204 09:38:31.899000 52035 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3469347Z [rank1]:E1204 09:38:31.899000 52035 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3469782Z [rank1]:E1204 09:38:31.899000 52035 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3470667Z [rank1]:E1204 09:38:31.899000 52035 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3471072Z [rank1]:E1204 09:38:31.899000 52035 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3471953Z [rank1]:E1204 09:38:31.899000 52035 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3472397Z [rank1]:E1204 09:38:31.899000 52035 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3473906Z [rank1]:E1204 09:38:31.899000 52035 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 611254272 and is now 10421731328. 2025-12-04T09:59:13.3474240Z [rank1]:E1204 09:38:31.899000 52035 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3474823Z [rank1]:E1204 09:38:31.899000 52035 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3475855Z [rank1]:E1204 09:38:31.899000 52035 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3476175Z [rank1]:E1204 09:38:31.899000 52035 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3476815Z [rank1]:E1204 09:38:31.899000 52035 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3477329Z [rank1]:E1204 09:38:31.899000 52035 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.3477732Z [rank3]:E1204 09:38:31.900000 52037 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3478207Z [rank3]:E1204 09:38:31.900000 52037 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3479094Z [rank3]:E1204 09:38:31.900000 52037 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3479546Z [rank3]:E1204 09:38:31.900000 52037 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3480452Z [rank3]:E1204 09:38:31.900000 52037 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3480803Z [rank3]:E1204 09:38:31.900000 52037 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3481660Z [rank3]:E1204 09:38:31.900000 52037 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3482091Z [rank3]:E1204 09:38:31.900000 52037 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3482947Z [rank3]:E1204 09:38:31.900000 52037 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3483408Z [rank3]:E1204 09:38:31.900000 52037 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3484259Z [rank3]:E1204 09:38:31.900000 52037 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3484687Z [rank3]:E1204 09:38:31.900000 52037 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3485541Z [rank3]:E1204 09:38:31.900000 52037 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3485986Z [rank3]:E1204 09:38:31.900000 52037 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3487497Z [rank3]:E1204 09:38:31.900000 52037 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 609157120 and is now 10421731328. 2025-12-04T09:59:13.3487828Z [rank3]:E1204 09:38:31.900000 52037 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3488412Z [rank3]:E1204 09:38:31.900000 52037 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3493840Z [rank3]:E1204 09:38:31.900000 52037 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3494245Z [rank3]:E1204 09:38:31.900000 52037 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3494987Z [rank3]:E1204 09:38:31.900000 52037 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3495483Z [rank3]:E1204 09:38:31.900000 52037 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.3495887Z [rank2]:E1204 09:38:31.900000 52036 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3496475Z [rank2]:E1204 09:38:31.900000 52036 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3497630Z [rank2]:E1204 09:38:31.900000 52036 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3498200Z [rank2]:E1204 09:38:31.900000 52036 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3499192Z [rank2]:E1204 09:38:31.900000 52036 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3499600Z [rank2]:E1204 09:38:31.900000 52036 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3500565Z [rank2]:E1204 09:38:31.900000 52036 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3501049Z [rank2]:E1204 09:38:31.900000 52036 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3502055Z [rank2]:E1204 09:38:31.900000 52036 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3502579Z [rank2]:E1204 09:38:31.900000 52036 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3503542Z [rank2]:E1204 09:38:31.900000 52036 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3503988Z [rank2]:E1204 09:38:31.900000 52036 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3504953Z [rank2]:E1204 09:38:31.900000 52036 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3505441Z [rank2]:E1204 09:38:31.900000 52036 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3507154Z [rank2]:E1204 09:38:31.900000 52036 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 607059968 and is now 10421731328. 
2025-12-04T09:59:13.3507520Z [rank2]:E1204 09:38:31.900000 52036 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3508179Z [rank2]:E1204 09:38:31.900000 52036 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3509515Z [rank2]:E1204 09:38:31.900000 52036 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3509870Z [rank2]:E1204 09:38:31.900000 52036 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3510510Z [rank2]:E1204 09:38:31.900000 52036 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3510987Z [rank2]:E1204 09:38:31.900000 52036 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.3511077Z dist init r=2, world=4 2025-12-04T09:59:13.3511169Z dist init r=0, world=4 2025-12-04T09:59:13.3511252Z dist init r=1, world=4 2025-12-04T09:59:13.3511340Z dist init r=3, world=4 2025-12-04T09:59:13.3512392Z [rank0]:[W1204 09:38:32.916397137 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3513403Z [rank2]:[W1204 09:38:32.918946814 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3514411Z [rank1]:[W1204 09:38:32.919209159 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3515412Z [rank3]:[W1204 09:38:32.928217137 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3515532Z FAILED [27.2325s] [100%] 2025-12-04T09:59:13.3515541Z 2025-12-04T09:59:13.3515671Z =================================== FAILURES =================================== 2025-12-04T09:59:13.3515989Z _ TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda _ 2025-12-04T09:59:13.3516092Z Traceback (most recent call last): 2025-12-04T09:59:13.3516576Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.3516680Z self._join_processes(fn) 2025-12-04T09:59:13.3517195Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.3517316Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.3517859Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.3517957Z raise RuntimeError(error) 2025-12-04T09:59:13.3518170Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.3518277Z Traceback (most recent call last): 2025-12-04T09:59:13.3518756Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3518857Z getattr(self, test_name)() 2025-12-04T09:59:13.3519324Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3519401Z fn() 2025-12-04T09:59:13.3519862Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3519952Z method(*args, **kwargs) 2025-12-04T09:59:13.3520406Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3520496Z method(*args, **kwargs) 2025-12-04T09:59:13.3521302Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3521420Z with policy(): 2025-12-04T09:59:13.3521931Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3522045Z raise RuntimeError(msg) 2025-12-04T09:59:13.3523292Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 714014720 and is now 10532880384. 
2025-12-04T09:59:13.3523299Z 2025-12-04T09:59:13.3523510Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3524236Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3524302Z 2025-12-04T09:59:13.3524570Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3524578Z 2025-12-04T09:59:13.3524583Z 2025-12-04T09:59:13.3524810Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.3525071Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.3525881Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-eb4953947b5f3ef2.xml - 2025-12-04T09:59:13.3526057Z =========================== short test summary info ============================ 2025-12-04T09:59:13.3526929Z FAILED [27.2325s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_shard_grad_op_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.3527103Z Traceback (most recent call last): 2025-12-04T09:59:13.3527655Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3527804Z getattr(self, test_name)() 2025-12-04T09:59:13.3528344Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3528433Z fn() 2025-12-04T09:59:13.3528946Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3529050Z method(*args, **kwargs) 2025-12-04T09:59:13.3529552Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3529660Z method(*args, **kwargs) 2025-12-04T09:59:13.3530162Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3530256Z with policy(): 2025-12-04T09:59:13.3530772Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3530883Z raise RuntimeError(msg) 2025-12-04T09:59:13.3532144Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 714014720 and is now 10532880384. 2025-12-04T09:59:13.3532150Z 2025-12-04T09:59:13.3532364Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3533076Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3533084Z 2025-12-04T09:59:13.3533476Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3533762Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T09:59:13.3534087Z ====================== 1 failed, 26 deselected in 27.45s ======================= 2025-12-04T09:59:13.3534177Z Got exit code 1 2025-12-04T09:59:13.3534270Z Retrying single test... 2025-12-04T09:59:13.3534825Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-532f83d54e2054ff.xml 2025-12-04T09:59:13.3534967Z ============================= test session starts ============================== 2025-12-04T09:59:13.3535284Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.3535376Z cachedir: .pytest_cache 2025-12-04T09:59:13.3535830Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.3535949Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.3536040Z configfile: pytest.ini 2025-12-04T09:59:13.3536624Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.3537022Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.3537816Z stepcurrent: skipping 7 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3537934Z Running 1 items in this shard 2025-12-04T09:59:13.3537940Z 2025-12-04T09:59:13.3538998Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_shard_grad_op_cuda I1204 09:38:54.044000 53087 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 53139 2025-12-04T09:59:13.3539538Z I1204 09:38:54.045000 53087 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 53140 2025-12-04T09:59:13.3540032Z I1204 09:38:54.046000 53087 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 53141 2025-12-04T09:59:13.3540521Z I1204 09:38:54.046000 53087 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 53142 2025-12-04T09:59:13.3542598Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.3542693Z _warn_cpu_init() 2025-12-04T09:59:13.3544728Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.3544828Z _warn_cpu_init() 2025-12-04T09:59:13.3546840Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.3546940Z _warn_cpu_init() 2025-12-04T09:59:13.3548016Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T09:59:13.3548117Z _init_core_state( 2025-12-04T09:59:13.3549330Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T09:59:13.3549425Z _init_core_state( 2025-12-04T09:59:13.3550344Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T09:59:13.3550439Z _init_core_state( 2025-12-04T09:59:13.3551991Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3552144Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3553671Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3553812Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3555401Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3555572Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3557373Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.3557462Z _warn_cpu_init() 2025-12-04T09:59:13.3558397Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T09:59:13.3558482Z _init_core_state( 2025-12-04T09:59:13.3559987Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3560139Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3561653Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3561809Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3563340Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3563493Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3565022Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3565175Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3569537Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T09:59:13.3569954Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.3574378Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.3574770Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.3579527Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.3579921Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.3584480Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T09:59:13.3584874Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.3585648Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3585787Z return func(*args, **kwargs) 2025-12-04T09:59:13.3586563Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3586702Z return func(*args, **kwargs) 2025-12-04T09:59:13.3587466Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3587580Z return func(*args, **kwargs) 2025-12-04T09:59:13.3588447Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3588563Z return func(*args, **kwargs) 2025-12-04T09:59:13.3589296Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3589399Z return func(*args, **kwargs) 2025-12-04T09:59:13.3590252Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3590355Z return func(*args, **kwargs) 2025-12-04T09:59:13.3591070Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3591170Z return func(*args, **kwargs) 2025-12-04T09:59:13.3591877Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3591985Z return func(*args, **kwargs) 2025-12-04T09:59:13.3592922Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning.
2025-12-04T09:59:13.3593030Z return func(*args, **kwargs) 2025-12-04T09:59:13.3593493Z [rank0]:E1204 09:39:03.788000 53139 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3593994Z [rank0]:E1204 09:39:03.788000 53139 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3594943Z [rank0]:E1204 09:39:03.788000 53139 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3595419Z [rank0]:E1204 09:39:03.788000 53139 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3596393Z [rank0]:E1204 09:39:03.788000 53139 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3596768Z [rank0]:E1204 09:39:03.788000 53139 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3597676Z [rank0]:E1204 09:39:03.788000 53139 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3598300Z [rank0]:E1204 09:39:03.788000 53139 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3599239Z [rank0]:E1204 09:39:03.788000 53139 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3599763Z [rank0]:E1204 09:39:03.788000 53139 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3600696Z [rank0]:E1204 09:39:03.788000 53139 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3601162Z [rank0]:E1204 09:39:03.788000 53139 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3602092Z [rank0]:E1204 09:39:03.788000 53139 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3602572Z [rank0]:E1204 09:39:03.788000 53139 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3604320Z [rank0]:E1204 09:39:03.788000 53139 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 716111872 and is now 10532880384. 
2025-12-04T09:59:13.3604664Z [rank0]:E1204 09:39:03.788000 53139 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3605286Z [rank0]:E1204 09:39:03.788000 53139 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3606376Z [rank0]:E1204 09:39:03.788000 53139 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3606727Z [rank0]:E1204 09:39:03.788000 53139 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3607436Z [rank0]:E1204 09:39:03.788000 53139 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3607959Z [rank0]:E1204 09:39:03.788000 53139 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.3608384Z [rank1]:E1204 09:39:03.789000 53140 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3608878Z [rank1]:E1204 09:39:03.789000 53140 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3609825Z [rank1]:E1204 09:39:03.789000 53140 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3610403Z [rank1]:E1204 09:39:03.789000 53140 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3611317Z [rank1]:E1204 09:39:03.789000 53140 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3611672Z [rank1]:E1204 09:39:03.789000 53140 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3612532Z [rank1]:E1204 09:39:03.789000 53140 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3612963Z [rank1]:E1204 09:39:03.789000 53140 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3613843Z [rank1]:E1204 09:39:03.789000 53140 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3614282Z [rank1]:E1204 09:39:03.789000 53140 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3615160Z [rank1]:E1204 09:39:03.789000 53140 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3615559Z [rank1]:E1204 09:39:03.789000 53140 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3616501Z [rank1]:E1204 09:39:03.789000 53140 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3617145Z [rank1]:E1204 09:39:03.789000 53140 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3618862Z [rank1]:E1204 09:39:03.789000 53140 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 1. CUDA driver allocated memory was 611254272 and is now 10421731328. 2025-12-04T09:59:13.3619226Z [rank1]:E1204 09:39:03.789000 53140 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3619885Z [rank1]:E1204 09:39:03.789000 53140 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3621281Z [rank1]:E1204 09:39:03.789000 53140 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3621729Z [rank1]:E1204 09:39:03.789000 53140 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3622445Z [rank1]:E1204 09:39:03.789000 53140 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3622988Z [rank1]:E1204 09:39:03.789000 53140 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.3623442Z [rank2]:E1204 09:39:03.790000 53141 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3623971Z [rank2]:E1204 09:39:03.790000 53141 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3625015Z [rank2]:E1204 09:39:03.790000 53141 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3625524Z [rank2]:E1204 09:39:03.790000 53141 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3626519Z [rank2]:E1204 09:39:03.790000 53141 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3626913Z [rank2]:E1204 09:39:03.790000 53141 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3627871Z [rank2]:E1204 09:39:03.790000 53141 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3628401Z [rank2]:E1204 09:39:03.790000 53141 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3629363Z [rank2]:E1204 09:39:03.790000 53141 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3629897Z [rank2]:E1204 09:39:03.790000 53141 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3630856Z [rank2]:E1204 09:39:03.790000 53141 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3631304Z [rank2]:E1204 09:39:03.790000 53141 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3632272Z [rank2]:E1204 09:39:03.790000 53141 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3632973Z [rank2]:E1204 09:39:03.790000 53141 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3634497Z [rank2]:E1204 09:39:03.790000 53141 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 604962816 and is now 10421731328. 2025-12-04T09:59:13.3634816Z [rank2]:E1204 09:39:03.790000 53141 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3635407Z [rank2]:E1204 09:39:03.790000 53141 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3636461Z [rank2]:E1204 09:39:03.790000 53141 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3636793Z [rank2]:E1204 09:39:03.790000 53141 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3637426Z [rank2]:E1204 09:39:03.790000 53141 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3637907Z [rank2]:E1204 09:39:03.790000 53141 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.3638312Z [rank3]:E1204 09:39:03.791000 53142 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3638806Z [rank3]:E1204 09:39:03.791000 53142 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3639704Z [rank3]:E1204 09:39:03.791000 53142 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3640150Z [rank3]:E1204 09:39:03.791000 53142 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3641035Z [rank3]:E1204 09:39:03.791000 53142 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3641386Z [rank3]:E1204 09:39:03.791000 53142 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3642263Z [rank3]:E1204 09:39:03.791000 53142 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3642704Z [rank3]:E1204 09:39:03.791000 53142 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3643587Z [rank3]:E1204 09:39:03.791000 53142 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3644024Z [rank3]:E1204 09:39:03.791000 53142 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3644868Z [rank3]:E1204 09:39:03.791000 53142 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3645274Z [rank3]:E1204 09:39:03.791000 53142 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3646128Z [rank3]:E1204 09:39:03.791000 53142 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3646560Z [rank3]:E1204 09:39:03.791000 53142 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3648066Z [rank3]:E1204 09:39:03.791000 53142 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 609157120 and is now 10421731328. 
2025-12-04T09:59:13.3648390Z [rank3]:E1204 09:39:03.791000 53142 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3649011Z [rank3]:E1204 09:39:03.791000 53142 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3650037Z [rank3]:E1204 09:39:03.791000 53142 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3650368Z [rank3]:E1204 09:39:03.791000 53142 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3650996Z [rank3]:E1204 09:39:03.791000 53142 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3651480Z [rank3]:E1204 09:39:03.791000 53142 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.3651613Z dist init r=1, world=4 2025-12-04T09:59:13.3651702Z dist init r=0, world=4 2025-12-04T09:59:13.3651799Z dist init r=2, world=4 2025-12-04T09:59:13.3651884Z dist init r=3, world=4 2025-12-04T09:59:13.3652909Z [rank1]:[W1204 09:39:04.811372721 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3653926Z [rank0]:[W1204 09:39:04.815044768 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3654962Z [rank2]:[W1204 09:39:04.817517465 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3655979Z [rank3]:[W1204 09:39:04.821384071 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3656097Z FAILED [27.6191s] [100%] 2025-12-04T09:59:13.3656102Z 2025-12-04T09:59:13.3656242Z =================================== FAILURES =================================== 2025-12-04T09:59:13.3656605Z _ TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda _ 2025-12-04T09:59:13.3656886Z Traceback (most recent call last): 2025-12-04T09:59:13.3657443Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.3657556Z self._join_processes(fn) 2025-12-04T09:59:13.3658150Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.3658299Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.3658909Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.3659029Z raise RuntimeError(error) 2025-12-04T09:59:13.3659262Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.3659378Z Traceback (most recent call last): 2025-12-04T09:59:13.3659923Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3660033Z getattr(self, test_name)() 2025-12-04T09:59:13.3660563Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3660658Z fn() 2025-12-04T09:59:13.3661165Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3661312Z method(*args, **kwargs) 2025-12-04T09:59:13.3661818Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3661916Z method(*args, **kwargs) 2025-12-04T09:59:13.3662425Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3662519Z with policy(): 2025-12-04T09:59:13.3663024Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3663143Z raise RuntimeError(msg) 2025-12-04T09:59:13.3664416Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 716111872 and is now 10532880384. 
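Note: the ProcessGroupNCCL warnings above ("destroy_process_group() was not called before program exit, which can leak resources") ask each worker to tear down its process group explicitly. A minimal sketch of that pattern, assuming a single default group per rank and an env:// rendezvous already configured; this is illustrative only, not the test harness code.

    import torch.distributed as dist

    def run_worker(rank, world_size):
        dist.init_process_group("nccl", rank=rank, world_size=world_size)
        try:
            pass  # ... distributed test body ...
        finally:
            # Explicit teardown avoids the "can leak resources" warning at exit.
            dist.destroy_process_group()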
2025-12-04T09:59:13.3664424Z 2025-12-04T09:59:13.3664651Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3665364Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3665370Z 2025-12-04T09:59:13.3665645Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3665650Z 2025-12-04T09:59:13.3665811Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.3665931Z Traceback (most recent call last): 2025-12-04T09:59:13.3666480Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3666616Z getattr(self, test_name)() 2025-12-04T09:59:13.3667153Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3667250Z fn() 2025-12-04T09:59:13.3667758Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3667896Z method(*args, **kwargs) 2025-12-04T09:59:13.3668399Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3668503Z method(*args, **kwargs) 2025-12-04T09:59:13.3669107Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3669194Z with policy(): 2025-12-04T09:59:13.3669648Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3669748Z raise RuntimeError(msg) 2025-12-04T09:59:13.3670857Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 604962816 and is now 10421731328. 
2025-12-04T09:59:13.3670864Z 2025-12-04T09:59:13.3671056Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3671686Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3671691Z 2025-12-04T09:59:13.3671932Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3671936Z 2025-12-04T09:59:13.3672079Z Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.3672190Z Traceback (most recent call last): 2025-12-04T09:59:13.3672681Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3672778Z getattr(self, test_name)() 2025-12-04T09:59:13.3673288Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3673366Z fn() 2025-12-04T09:59:13.3673812Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3673908Z method(*args, **kwargs) 2025-12-04T09:59:13.3674353Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3674442Z method(*args, **kwargs) 2025-12-04T09:59:13.3674890Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3674973Z with policy(): 2025-12-04T09:59:13.3675426Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3675546Z raise RuntimeError(msg) 2025-12-04T09:59:13.3676647Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 609157120 and is now 10421731328. 2025-12-04T09:59:13.3676654Z 2025-12-04T09:59:13.3676848Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3677474Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3677479Z 2025-12-04T09:59:13.3677718Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3677754Z 2025-12-04T09:59:13.3677758Z 2025-12-04T09:59:13.3677950Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.3678192Z Process 0 terminated with exit code 10, terminating remaining processes. 
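Note: "Process 0 terminated with exit code 10, terminating remaining processes" comes from the multi-process wrapper visible in the traceback (_join_processes / _check_return_codes): each rank runs the test in its own process, and a non-zero exit code from any rank fails the whole test. A simplified sketch of that pattern with made-up names, not the common_distributed.py implementation:

    import multiprocessing as mp

    def _worker(rank, world_size):
        # Per-rank test body; a failing rank exits non-zero (this run used code 10).
        raise SystemExit(0)

    def run_multiprocess_test(world_size=4):
        procs = [mp.Process(target=_worker, args=(r, world_size)) for r in range(world_size)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        codes = [p.exitcode for p in procs]
        if any(code != 0 for code in codes):
            raise RuntimeError(f"worker exit codes: {codes}")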
2025-12-04T09:59:13.3678896Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-532f83d54e2054ff.xml - 2025-12-04T09:59:13.3679070Z =========================== short test summary info ============================ 2025-12-04T09:59:13.3679850Z FAILED [27.6191s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_shard_grad_op_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.3679955Z Traceback (most recent call last): 2025-12-04T09:59:13.3680445Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3680543Z getattr(self, test_name)() 2025-12-04T09:59:13.3681013Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3681098Z fn() 2025-12-04T09:59:13.3681543Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3681637Z method(*args, **kwargs) 2025-12-04T09:59:13.3682091Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3682178Z method(*args, **kwargs) 2025-12-04T09:59:13.3682808Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3682900Z with policy(): 2025-12-04T09:59:13.3683374Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3683486Z raise RuntimeError(msg) 2025-12-04T09:59:13.3684686Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 0. CUDA driver allocated memory was 716111872 and is now 10532880384. 
2025-12-04T09:59:13.3684694Z 2025-12-04T09:59:13.3684902Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3685568Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3685573Z 2025-12-04T09:59:13.3685820Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3685825Z 2025-12-04T09:59:13.3685981Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.3686095Z Traceback (most recent call last): 2025-12-04T09:59:13.3686617Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3686746Z getattr(self, test_name)() 2025-12-04T09:59:13.3687253Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3687347Z fn() 2025-12-04T09:59:13.3687823Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3687919Z method(*args, **kwargs) 2025-12-04T09:59:13.3688398Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3688498Z method(*args, **kwargs) 2025-12-04T09:59:13.3688976Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3689095Z with policy(): 2025-12-04T09:59:13.3689567Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3689678Z raise RuntimeError(msg) 2025-12-04T09:59:13.3690848Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 2. CUDA driver allocated memory was 604962816 and is now 10421731328. 
2025-12-04T09:59:13.3690880Z 2025-12-04T09:59:13.3691096Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3691760Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3691765Z 2025-12-04T09:59:13.3692011Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3692026Z 2025-12-04T09:59:13.3692176Z Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.3692287Z Traceback (most recent call last): 2025-12-04T09:59:13.3692808Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3692917Z getattr(self, test_name)() 2025-12-04T09:59:13.3693418Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3693507Z fn() 2025-12-04T09:59:13.3693987Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3694084Z method(*args, **kwargs) 2025-12-04T09:59:13.3694558Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3694653Z method(*args, **kwargs) 2025-12-04T09:59:13.3695126Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3695219Z with policy(): 2025-12-04T09:59:13.3695737Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3695838Z raise RuntimeError(msg) 2025-12-04T09:59:13.3697298Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 117248 on device 3. CUDA driver allocated memory was 609157120 and is now 10421731328. 2025-12-04T09:59:13.3697305Z 2025-12-04T09:59:13.3697521Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3698233Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3698241Z 2025-12-04T09:59:13.3698503Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3698720Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
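Note: the surrounding output shows the runner's retry flow: the shard fails, the single failing test is re-run in a fresh pytest session, and after it fails again it is reported below as FAILED CONSISTENTLY while the rest of the shard continues (continue-through-error). A rough sketch of that control flow; the helper name and the pytest invocation are hypothetical, not the actual run_test.py logic.

    import subprocess

    def run_with_single_retry(test_id, continue_through_error=True):
        # First attempt; on failure, retry only the failing test once.
        first = subprocess.run(["python", "-m", "pytest", test_id]).returncode
        if first == 0:
            return True
        print(f"Got exit code {first}")
        print("Retrying single test...")
        retry = subprocess.run(["python", "-m", "pytest", test_id]).returncode
        if retry == 0:
            return True
        print(f"FAILED CONSISTENTLY: {test_id}")
        if not continue_through_error:
            raise SystemExit(retry)
        return False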
2025-12-04T09:59:13.3698900Z ====================== 1 failed, 26 deselected in 27.84s ======================= 2025-12-04T09:59:13.3698999Z Got exit code 1 2025-12-04T09:59:13.3699637Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.3700040Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.3700657Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3483d762b5b4fca1.xml 2025-12-04T09:59:13.3700825Z ============================= test session starts ============================== 2025-12-04T09:59:13.3701203Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.3701318Z cachedir: .pytest_cache 2025-12-04T09:59:13.3701836Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.3701989Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.3702096Z configfile: pytest.ini 2025-12-04T09:59:13.3702627Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.3702842Z collecting ... collected 60 items / 8 deselected / 52 selected 2025-12-04T09:59:13.3702980Z stepcurrent: skipping 8 already run items. 2025-12-04T09:59:13.3703089Z Running 19 items in this shard 2025-12-04T09:59:13.3703094Z 2025-12-04T09:59:13.3704241Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda I1204 09:39:25.944000 54192 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 54244 2025-12-04T09:59:13.3704738Z I1204 09:39:25.945000 54192 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 54245 2025-12-04T09:59:13.3705235Z I1204 09:39:25.946000 54192 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 54246 2025-12-04T09:59:13.3705724Z I1204 09:39:25.946000 54192 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 54247 2025-12-04T09:59:13.3707748Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.3707853Z _warn_cpu_init() 2025-12-04T09:59:13.3709860Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.3709959Z _warn_cpu_init() 2025-12-04T09:59:13.3711738Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.3711834Z _warn_cpu_init() 2025-12-04T09:59:13.3713363Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3713521Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3715047Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3715224Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3716732Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3716899Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3718694Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.3718781Z _warn_cpu_init() 2025-12-04T09:59:13.3720302Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3720443Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3721692Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.3721940Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T09:59:13.3722998Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.3723240Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T09:59:13.3724220Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.3724459Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T09:59:13.3726177Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3726385Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3728089Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3728255Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3729244Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.3729499Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.3730501Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.3730753Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.3731751Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.3731962Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.3732955Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.3733072Z return func(*args, **kwargs) 2025-12-04T09:59:13.3734192Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.3734418Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T09:59:13.3735942Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3736090Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3737264Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.3737488Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.3738305Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3738417Z return func(*args, **kwargs) 2025-12-04T09:59:13.3739193Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3739299Z return func(*args, **kwargs) 2025-12-04T09:59:13.3740058Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3740172Z return func(*args, **kwargs) 2025-12-04T09:59:13.3740954Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3741073Z return func(*args, **kwargs) 2025-12-04T09:59:13.3741827Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3741930Z return func(*args, **kwargs) 2025-12-04T09:59:13.3742685Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3742789Z return func(*args, **kwargs) 2025-12-04T09:59:13.3743545Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3743679Z return func(*args, **kwargs) 2025-12-04T09:59:13.3744434Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T09:59:13.3744590Z return func(*args, **kwargs) 2025-12-04T09:59:13.3745046Z [rank0]:E1204 09:39:52.294000 54244 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3745584Z [rank0]:E1204 09:39:52.294000 54244 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3746584Z [rank0]:E1204 09:39:52.294000 54244 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3747094Z [rank0]:E1204 09:39:52.294000 54244 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3748095Z [rank0]:E1204 09:39:52.294000 54244 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3748492Z [rank0]:E1204 09:39:52.294000 54244 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3749517Z [rank0]:E1204 09:39:52.294000 54244 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3749950Z [rank0]:E1204 09:39:52.294000 54244 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3750807Z [rank0]:E1204 09:39:52.294000 54244 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3751273Z [rank0]:E1204 09:39:52.294000 54244 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3752128Z [rank0]:E1204 09:39:52.294000 54244 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3752533Z [rank0]:E1204 09:39:52.294000 54244 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3753385Z [rank0]:E1204 09:39:52.294000 54244 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3753823Z [rank0]:E1204 09:39:52.294000 54244 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3755425Z [rank0]:E1204 09:39:52.294000 54244 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 0. CUDA driver allocated memory was 720306176 and is now 10516103168. 
2025-12-04T09:59:13.3755757Z [rank0]:E1204 09:39:52.294000 54244 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3756343Z [rank0]:E1204 09:39:52.294000 54244 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3757443Z [rank0]:E1204 09:39:52.294000 54244 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda 2025-12-04T09:59:13.3757798Z [rank0]:E1204 09:39:52.294000 54244 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3758431Z [rank0]:E1204 09:39:52.294000 54244 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3758949Z [rank0]:E1204 09:39:52.294000 54244 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.3759348Z [rank1]:E1204 09:39:52.294000 54245 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3759827Z [rank1]:E1204 09:39:52.294000 54245 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3760712Z [rank1]:E1204 09:39:52.294000 54245 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3761168Z [rank1]:E1204 09:39:52.294000 54245 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3762058Z [rank1]:E1204 09:39:52.294000 54245 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3762407Z [rank1]:E1204 09:39:52.294000 54245 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3763263Z [rank1]:E1204 09:39:52.294000 54245 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3763696Z [rank1]:E1204 09:39:52.294000 54245 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3764576Z [rank1]:E1204 09:39:52.294000 54245 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3765008Z [rank1]:E1204 09:39:52.294000 54245 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3765854Z [rank1]:E1204 09:39:52.294000 54245 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3766255Z [rank1]:E1204 09:39:52.294000 54245 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3767185Z [rank1]:E1204 09:39:52.294000 54245 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3767631Z [rank1]:E1204 09:39:52.294000 54245 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3769213Z [rank1]:E1204 09:39:52.294000 54245 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 1. CUDA driver allocated memory was 609157120 and is now 10404954112. 2025-12-04T09:59:13.3769539Z [rank1]:E1204 09:39:52.294000 54245 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3770123Z [rank1]:E1204 09:39:52.294000 54245 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3771251Z [rank1]:E1204 09:39:52.294000 54245 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda 2025-12-04T09:59:13.3771608Z [rank1]:E1204 09:39:52.294000 54245 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3772240Z [rank1]:E1204 09:39:52.294000 54245 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3772731Z [rank1]:E1204 09:39:52.294000 54245 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.3773128Z [rank2]:E1204 09:39:52.295000 54246 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3773606Z [rank2]:E1204 09:39:52.295000 54246 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3774499Z [rank2]:E1204 09:39:52.295000 54246 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3774948Z [rank2]:E1204 09:39:52.295000 54246 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3775836Z [rank2]:E1204 09:39:52.295000 54246 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3776187Z [rank2]:E1204 09:39:52.295000 54246 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3777342Z [rank2]:E1204 09:39:52.295000 54246 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3777867Z [rank2]:E1204 09:39:52.295000 54246 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3778835Z [rank2]:E1204 09:39:52.295000 54246 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3779313Z [rank2]:E1204 09:39:52.295000 54246 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3780271Z [rank2]:E1204 09:39:52.295000 54246 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3780750Z [rank2]:E1204 09:39:52.295000 54246 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3781715Z [rank2]:E1204 09:39:52.295000 54246 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3782212Z [rank2]:E1204 09:39:52.295000 54246 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3783984Z [rank2]:E1204 09:39:52.295000 54246 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 2. CUDA driver allocated memory was 607059968 and is now 10404954112. 2025-12-04T09:59:13.3784409Z [rank2]:E1204 09:39:52.295000 54246 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3785065Z [rank2]:E1204 09:39:52.295000 54246 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3786332Z [rank2]:E1204 09:39:52.295000 54246 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda 2025-12-04T09:59:13.3786697Z [rank2]:E1204 09:39:52.295000 54246 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3787408Z [rank2]:E1204 09:39:52.295000 54246 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3787955Z [rank2]:E1204 09:39:52.295000 54246 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.3788403Z [rank3]:E1204 09:39:52.296000 54247 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3789044Z [rank3]:E1204 09:39:52.296000 54247 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3790043Z [rank3]:E1204 09:39:52.296000 54247 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3790489Z [rank3]:E1204 09:39:52.296000 54247 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3791367Z [rank3]:E1204 09:39:52.296000 54247 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3791719Z [rank3]:E1204 09:39:52.296000 54247 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3792601Z [rank3]:E1204 09:39:52.296000 54247 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3793034Z [rank3]:E1204 09:39:52.296000 54247 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3793894Z [rank3]:E1204 09:39:52.296000 54247 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3794322Z [rank3]:E1204 09:39:52.296000 54247 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3795195Z [rank3]:E1204 09:39:52.296000 54247 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3795596Z [rank3]:E1204 09:39:52.296000 54247 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3796447Z [rank3]:E1204 09:39:52.296000 54247 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3796884Z [rank3]:E1204 09:39:52.296000 54247 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3798457Z [rank3]:E1204 09:39:52.296000 54247 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 3. CUDA driver allocated memory was 604962816 and is now 10404954112. 
2025-12-04T09:59:13.3798845Z [rank3]:E1204 09:39:52.296000 54247 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3799426Z [rank3]:E1204 09:39:52.296000 54247 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3800520Z [rank3]:E1204 09:39:52.296000 54247 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda 2025-12-04T09:59:13.3800843Z [rank3]:E1204 09:39:52.296000 54247 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3801477Z [rank3]:E1204 09:39:52.296000 54247 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3801965Z [rank3]:E1204 09:39:52.296000 54247 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.3802055Z dist init r=2, world=4 2025-12-04T09:59:13.3802139Z dist init r=1, world=4 2025-12-04T09:59:13.3802227Z dist init r=3, world=4 2025-12-04T09:59:13.3802310Z dist init r=0, world=4 2025-12-04T09:59:13.3803344Z [rank2]:[W1204 09:39:52.322601367 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3804353Z [rank1]:[W1204 09:39:52.323888686 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3805392Z [rank3]:[W1204 09:39:52.326083814 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3806403Z [rank0]:[W1204 09:39:52.330574149 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3806492Z FAILED [47.5521s] [ 5%] 2025-12-04T09:59:13.3806497Z 2025-12-04T09:59:13.3806637Z =================================== FAILURES =================================== 2025-12-04T09:59:13.3807003Z _ TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda _ 2025-12-04T09:59:13.3807116Z Traceback (most recent call last): 2025-12-04T09:59:13.3807628Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.3807731Z self._join_processes(fn) 2025-12-04T09:59:13.3808255Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.3808379Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.3808921Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.3809023Z raise RuntimeError(error) 2025-12-04T09:59:13.3809231Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.3809369Z Traceback (most recent call last): 2025-12-04T09:59:13.3809847Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3809945Z getattr(self, test_name)() 2025-12-04T09:59:13.3810435Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3810540Z fn() 2025-12-04T09:59:13.3810998Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3811091Z method(*args, **kwargs) 2025-12-04T09:59:13.3811538Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3811639Z method(*args, **kwargs) 2025-12-04T09:59:13.3812083Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3812166Z with policy(): 2025-12-04T09:59:13.3812818Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3812922Z raise RuntimeError(msg) 2025-12-04T09:59:13.3814175Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 2. CUDA driver allocated memory was 607059968 and is now 10404954112. 
2025-12-04T09:59:13.3814183Z 2025-12-04T09:59:13.3814381Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3815118Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda 2025-12-04T09:59:13.3815130Z 2025-12-04T09:59:13.3815376Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3815383Z 2025-12-04T09:59:13.3815388Z 2025-12-04T09:59:13.3815590Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.3815846Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.3816705Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3483d762b5b4fca1.xml - 2025-12-04T09:59:13.3817062Z =========================== short test summary info ============================ 2025-12-04T09:59:13.3818006Z FAILED [47.5521s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.3818124Z Traceback (most recent call last): 2025-12-04T09:59:13.3818678Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3818794Z getattr(self, test_name)() 2025-12-04T09:59:13.3819368Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3819456Z fn() 2025-12-04T09:59:13.3819965Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3820072Z method(*args, **kwargs) 2025-12-04T09:59:13.3820573Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3820674Z method(*args, **kwargs) 2025-12-04T09:59:13.3821388Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3821488Z with policy(): 2025-12-04T09:59:13.3822002Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3822175Z raise RuntimeError(msg) 2025-12-04T09:59:13.3823500Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 2. CUDA driver allocated memory was 607059968 and is now 10404954112. 2025-12-04T09:59:13.3823547Z 2025-12-04T09:59:13.3823765Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3824549Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda 2025-12-04T09:59:13.3824554Z 2025-12-04T09:59:13.3824821Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3824996Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
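Note on the warnings in the run above: the repeated UserWarnings ("FSDP got the argument `device_id` cuda ... which does not have an explicit index" and "The passed-in `module` is on CPU ...") state their own remediation, namely binding each rank to its GPU before FSDP initialization or passing an explicit device index as `device_id`. The snippet below is a minimal illustrative sketch of that remediation only, not code from this repository or test suite; `wrap_for_rank`, `rank`, and `model` are placeholder names, while `device_id` and `sync_module_states` are the FSDP keyword arguments the warnings refer to. The FutureWarning in the same block is a separate matter: `NO_SHARD` is deprecated in favor of `DistributedDataParallel`.

# Illustrative sketch only (not from the test suite): follow the warnings'
# own advice by pinning the rank's device before FSDP init and passing an
# explicit device index. `rank` and `model` are hypothetical placeholders.
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_for_rank(model: torch.nn.Module, rank: int) -> FSDP:
    # Make "cuda" resolve to this rank's GPU before any FSDP call,
    # which avoids the "does not have an explicit index" warning.
    torch.cuda.set_device(rank)
    # An explicit device_id lets FSDP move the CPU-resident module to the
    # GPU for sharding init and is required for sync_module_states=True.
    return FSDP(model, device_id=rank, sync_module_states=True)

In a multi-process run like the one logged here, `rank` would be the per-process rank reported at the "dist init r=..., world=4" lines.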
2025-12-04T09:59:13.3825169Z ======================= 1 failed, 8 deselected in 47.77s ======================= 2025-12-04T09:59:13.3825270Z Got exit code 1 2025-12-04T09:59:13.3825376Z Retrying single test... 2025-12-04T09:59:13.3826009Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-c6b2032ef8ff1e94.xml 2025-12-04T09:59:13.3826169Z ============================= test session starts ============================== 2025-12-04T09:59:13.3826512Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.3826623Z cachedir: .pytest_cache 2025-12-04T09:59:13.3827136Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.3827257Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.3827368Z configfile: pytest.ini 2025-12-04T09:59:13.3827902Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.3828121Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.3829022Z stepcurrent: skipping 8 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda 2025-12-04T09:59:13.3829135Z Running 1 items in this shard 2025-12-04T09:59:13.3829141Z 2025-12-04T09:59:13.3830285Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda I1204 09:40:18.514000 55441 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 55493 2025-12-04T09:59:13.3830778Z I1204 09:40:18.514000 55441 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 55494 2025-12-04T09:59:13.3831275Z I1204 09:40:18.515000 55441 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 55495 2025-12-04T09:59:13.3831797Z I1204 09:40:18.516000 55441 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 55496 2025-12-04T09:59:13.3834002Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.3834097Z _warn_cpu_init() 2025-12-04T09:59:13.3836008Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.3836132Z _warn_cpu_init() 2025-12-04T09:59:13.3837777Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3837939Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3839541Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3839702Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3841599Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.3841695Z _warn_cpu_init() 2025-12-04T09:59:13.3844166Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.3844279Z _warn_cpu_init() 2025-12-04T09:59:13.3845943Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3846106Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3847801Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3847966Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3848935Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.3849169Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T09:59:13.3850138Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.3850394Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T09:59:13.3852060Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3852245Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3853207Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.3853429Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.3854383Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.3854603Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.3855572Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.3855683Z return func(*args, **kwargs) 2025-12-04T09:59:13.3856734Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.3856970Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T09:59:13.3858906Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3859081Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3860089Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.3860305Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.3861293Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.3861545Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T09:59:13.3863278Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3863452Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3864437Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.3864660Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.3865454Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3865564Z return func(*args, **kwargs) 2025-12-04T09:59:13.3866340Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3866474Z return func(*args, **kwargs) 2025-12-04T09:59:13.3867245Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3870360Z return func(*args, **kwargs) 2025-12-04T09:59:13.3871091Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3871205Z return func(*args, **kwargs) 2025-12-04T09:59:13.3871924Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3872025Z return func(*args, **kwargs) 2025-12-04T09:59:13.3872741Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3872846Z return func(*args, **kwargs) 2025-12-04T09:59:13.3873553Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.3873684Z return func(*args, **kwargs) 2025-12-04T09:59:13.3874390Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T09:59:13.3874500Z return func(*args, **kwargs) 2025-12-04T09:59:13.3874934Z [rank0]:E1204 09:40:41.538000 55493 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3875487Z [rank0]:E1204 09:40:41.538000 55493 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3876436Z [rank0]:E1204 09:40:41.538000 55493 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3876913Z [rank0]:E1204 09:40:41.538000 55493 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3877853Z [rank0]:E1204 09:40:41.538000 55493 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3878226Z [rank0]:E1204 09:40:41.538000 55493 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3879159Z [rank0]:E1204 09:40:41.538000 55493 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3879624Z [rank0]:E1204 09:40:41.538000 55493 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3880522Z [rank0]:E1204 09:40:41.538000 55493 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3880984Z [rank0]:E1204 09:40:41.538000 55493 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3882023Z [rank0]:E1204 09:40:41.538000 55493 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3882430Z [rank0]:E1204 09:40:41.538000 55493 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3883277Z [rank0]:E1204 09:40:41.538000 55493 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3883707Z [rank0]:E1204 09:40:41.538000 55493 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3885347Z [rank0]:E1204 09:40:41.538000 55493 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 0. CUDA driver allocated memory was 720306176 and is now 10516103168. 
2025-12-04T09:59:13.3885674Z [rank0]:E1204 09:40:41.538000 55493 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3886262Z [rank0]:E1204 09:40:41.538000 55493 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3887355Z [rank0]:E1204 09:40:41.538000 55493 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda 2025-12-04T09:59:13.3887684Z [rank0]:E1204 09:40:41.538000 55493 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3888317Z [rank0]:E1204 09:40:41.538000 55493 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3888832Z [rank0]:E1204 09:40:41.538000 55493 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.3889234Z [rank2]:E1204 09:40:41.538000 55495 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3889702Z [rank2]:E1204 09:40:41.538000 55495 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3890602Z [rank2]:E1204 09:40:41.538000 55495 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3891054Z [rank2]:E1204 09:40:41.538000 55495 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3891958Z [rank2]:E1204 09:40:41.538000 55495 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3892311Z [rank2]:E1204 09:40:41.538000 55495 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3893159Z [rank2]:E1204 09:40:41.538000 55495 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3893594Z [rank2]:E1204 09:40:41.538000 55495 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3894441Z [rank2]:E1204 09:40:41.538000 55495 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3894906Z [rank2]:E1204 09:40:41.538000 55495 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3895756Z [rank2]:E1204 09:40:41.538000 55495 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3896161Z [rank2]:E1204 09:40:41.538000 55495 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3897336Z [rank2]:E1204 09:40:41.538000 55495 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3897895Z [rank2]:E1204 09:40:41.538000 55495 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3899684Z [rank2]:E1204 09:40:41.538000 55495 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 2. CUDA driver allocated memory was 607059968 and is now 10404954112. 2025-12-04T09:59:13.3900043Z [rank2]:E1204 09:40:41.538000 55495 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3900703Z [rank2]:E1204 09:40:41.538000 55495 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3901938Z [rank2]:E1204 09:40:41.538000 55495 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda 2025-12-04T09:59:13.3902311Z [rank2]:E1204 09:40:41.538000 55495 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3903057Z [rank2]:E1204 09:40:41.538000 55495 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3903608Z [rank2]:E1204 09:40:41.538000 55495 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.3904056Z [rank1]:E1204 09:40:41.539000 55494 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3904589Z [rank1]:E1204 09:40:41.539000 55494 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3905596Z [rank1]:E1204 09:40:41.539000 55494 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3906134Z [rank1]:E1204 09:40:41.539000 55494 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3907124Z [rank1]:E1204 09:40:41.539000 55494 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3907518Z [rank1]:E1204 09:40:41.539000 55494 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3908482Z [rank1]:E1204 09:40:41.539000 55494 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3909203Z [rank1]:E1204 09:40:41.539000 55494 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3910057Z [rank1]:E1204 09:40:41.539000 55494 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3910489Z [rank1]:E1204 09:40:41.539000 55494 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3911332Z [rank1]:E1204 09:40:41.539000 55494 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3911771Z [rank1]:E1204 09:40:41.539000 55494 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3912621Z [rank1]:E1204 09:40:41.539000 55494 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3913059Z [rank1]:E1204 09:40:41.539000 55494 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3914635Z [rank1]:E1204 09:40:41.539000 55494 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 1. CUDA driver allocated memory was 609157120 and is now 10404954112. 2025-12-04T09:59:13.3914957Z [rank1]:E1204 09:40:41.539000 55494 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3915547Z [rank1]:E1204 09:40:41.539000 55494 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3916677Z [rank1]:E1204 09:40:41.539000 55494 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda 2025-12-04T09:59:13.3917004Z [rank1]:E1204 09:40:41.539000 55494 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3917632Z [rank1]:E1204 09:40:41.539000 55494 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3918119Z [rank1]:E1204 09:40:41.539000 55494 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.3918517Z [rank3]:E1204 09:40:41.539000 55496 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.3918985Z [rank3]:E1204 09:40:41.539000 55496 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.3919898Z [rank3]:E1204 09:40:41.539000 55496 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3920349Z [rank3]:E1204 09:40:41.539000 55496 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.3921585Z [rank3]:E1204 09:40:41.539000 55496 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3921991Z [rank3]:E1204 09:40:41.539000 55496 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.3923022Z [rank3]:E1204 09:40:41.539000 55496 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3923513Z [rank3]:E1204 09:40:41.539000 55496 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3924469Z [rank3]:E1204 09:40:41.539000 55496 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3924958Z [rank3]:E1204 09:40:41.539000 55496 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.3925967Z [rank3]:E1204 09:40:41.539000 55496 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3926421Z [rank3]:E1204 09:40:41.539000 55496 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.3927385Z [rank3]:E1204 09:40:41.539000 55496 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3927875Z [rank3]:E1204 09:40:41.539000 55496 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.3929647Z [rank3]:E1204 09:40:41.539000 55496 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 3. CUDA driver allocated memory was 604962816 and is now 10404954112. 
2025-12-04T09:59:13.3930011Z [rank3]:E1204 09:40:41.539000 55496 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3930718Z [rank3]:E1204 09:40:41.539000 55496 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3931956Z [rank3]:E1204 09:40:41.539000 55496 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda 2025-12-04T09:59:13.3932325Z [rank3]:E1204 09:40:41.539000 55496 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.3933039Z [rank3]:E1204 09:40:41.539000 55496 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3933698Z [rank3]:E1204 09:40:41.539000 55496 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.3933796Z dist init r=0, world=4 2025-12-04T09:59:13.3933923Z dist init r=1, world=4 2025-12-04T09:59:13.3934022Z dist init r=3, world=4 2025-12-04T09:59:13.3934115Z dist init r=2, world=4 2025-12-04T09:59:13.3935201Z [rank0]:[W1204 09:40:41.558527906 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3936363Z [rank1]:[W1204 09:40:41.561148993 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3937663Z [rank3]:[W1204 09:40:41.562651782 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3938855Z [rank2]:[W1204 09:40:41.563403561 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.3938956Z FAILED [41.5665s] [100%] 2025-12-04T09:59:13.3938962Z 2025-12-04T09:59:13.3939115Z =================================== FAILURES =================================== 2025-12-04T09:59:13.3939519Z _ TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda _ 2025-12-04T09:59:13.3939676Z Traceback (most recent call last): 2025-12-04T09:59:13.3940228Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.3940338Z self._join_processes(fn) 2025-12-04T09:59:13.3940933Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.3941071Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.3941671Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.3941790Z raise RuntimeError(error) 2025-12-04T09:59:13.3942021Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.3942140Z Traceback (most recent call last): 2025-12-04T09:59:13.3942683Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3942791Z getattr(self, test_name)() 2025-12-04T09:59:13.3943323Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3943414Z fn() 2025-12-04T09:59:13.3943924Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3944062Z method(*args, **kwargs) 2025-12-04T09:59:13.3944566Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3944665Z method(*args, **kwargs) 2025-12-04T09:59:13.3945181Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3945277Z with policy(): 2025-12-04T09:59:13.3945792Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3945898Z raise RuntimeError(msg) 2025-12-04T09:59:13.3947241Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 1. CUDA driver allocated memory was 609157120 and is now 10404954112. 
2025-12-04T09:59:13.3947258Z 2025-12-04T09:59:13.3947475Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3948262Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda 2025-12-04T09:59:13.3948268Z 2025-12-04T09:59:13.3948541Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3948550Z 2025-12-04T09:59:13.3948809Z Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.3948923Z Traceback (most recent call last): 2025-12-04T09:59:13.3949404Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3949545Z getattr(self, test_name)() 2025-12-04T09:59:13.3950025Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3950105Z fn() 2025-12-04T09:59:13.3950552Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3950647Z method(*args, **kwargs) 2025-12-04T09:59:13.3951093Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3951189Z method(*args, **kwargs) 2025-12-04T09:59:13.3951662Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3951747Z with policy(): 2025-12-04T09:59:13.3952199Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3952294Z raise RuntimeError(msg) 2025-12-04T09:59:13.3953466Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 3. CUDA driver allocated memory was 604962816 and is now 10404954112. 2025-12-04T09:59:13.3953480Z 2025-12-04T09:59:13.3953668Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3954365Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda 2025-12-04T09:59:13.3954372Z 2025-12-04T09:59:13.3954613Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3954618Z 2025-12-04T09:59:13.3954622Z 2025-12-04T09:59:13.3954819Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.3955059Z Process 1 terminated with exit code 10, terminating remaining processes. 
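
The ProcessGroupNCCL warnings above ("destroy_process_group() was not called before program exit, which can leak resources") are separate from the leak-check failure itself, but they describe the per-rank cleanup the harness is expected to perform. The sketch below shows that teardown pattern; it is illustrative only, not the test harness's actual code. The helper name run_rank and the assumption that MASTER_ADDR/MASTER_PORT are already exported are mine, and the device_id argument to init_process_group (which also mutes the barrier() device warning seen later in this log) only exists on recent PyTorch releases.

import torch
import torch.distributed as dist

def run_rank(rank: int, world_size: int) -> None:
    # Assumes the launcher has exported MASTER_ADDR / MASTER_PORT.
    torch.cuda.set_device(rank)
    dist.init_process_group(
        "nccl",
        rank=rank,
        world_size=world_size,
        device_id=torch.device("cuda", rank),  # optional; silences the barrier() device warning
    )
    try:
        pass  # ... test body / training step would run here ...
    finally:
        dist.barrier()                 # make sure every rank has finished
        dist.destroy_process_group()   # the cleanup the NCCL warning asks for
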
2025-12-04T09:59:13.3955801Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-c6b2032ef8ff1e94.xml - 2025-12-04T09:59:13.3955952Z =========================== short test summary info ============================ 2025-12-04T09:59:13.3956790Z FAILED [41.5665s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.3956896Z Traceback (most recent call last): 2025-12-04T09:59:13.3957391Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3957489Z getattr(self, test_name)() 2025-12-04T09:59:13.3957968Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3958052Z fn() 2025-12-04T09:59:13.3958524Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3958628Z method(*args, **kwargs) 2025-12-04T09:59:13.3959073Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3959164Z method(*args, **kwargs) 2025-12-04T09:59:13.3959611Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3959698Z with policy(): 2025-12-04T09:59:13.3960150Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3960251Z raise RuntimeError(msg) 2025-12-04T09:59:13.3961452Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 1. CUDA driver allocated memory was 609157120 and is now 10404954112. 
2025-12-04T09:59:13.3961458Z 2025-12-04T09:59:13.3961656Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3962357Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda 2025-12-04T09:59:13.3962362Z 2025-12-04T09:59:13.3962598Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3962631Z 2025-12-04T09:59:13.3962772Z Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.3962881Z Traceback (most recent call last): 2025-12-04T09:59:13.3963376Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.3963474Z getattr(self, test_name)() 2025-12-04T09:59:13.3963952Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.3964036Z fn() 2025-12-04T09:59:13.3964481Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3964579Z method(*args, **kwargs) 2025-12-04T09:59:13.3965024Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.3965115Z method(*args, **kwargs) 2025-12-04T09:59:13.3965568Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.3965653Z with policy(): 2025-12-04T09:59:13.3966107Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.3966202Z raise RuntimeError(msg) 2025-12-04T09:59:13.3967395Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 3. CUDA driver allocated memory was 604962816 and is now 10404954112. 2025-12-04T09:59:13.3967401Z 2025-12-04T09:59:13.3967598Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.3968292Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda 2025-12-04T09:59:13.3968299Z 2025-12-04T09:59:13.3968540Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.3968701Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.3968858Z ====================== 1 failed, 26 deselected in 41.78s ======================= 2025-12-04T09:59:13.3968971Z Got exit code 1 2025-12-04T09:59:13.3969062Z Retrying single test... 
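
The failure itself comes from the CUDA memory-leak checker enabled by PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 (the repro command printed above): it snapshots caching-allocator and CUDA-driver allocation per device before the test and compares again afterwards, and here every rank grew from 512 to 166400 allocator bytes while driver allocation rose to roughly 10.4 GB. The sketch below shows that kind of before/after bookkeeping using public torch.cuda APIs; it is a simplified stand-in, not the implementation in common_utils.py, and the helper name report_cuda_mem is invented. The "Retrying single test..." step that follows re-runs only this test so the harness can distinguish a flaky failure from a consistent one.

import torch

def report_cuda_mem(tag: str) -> None:
    # Compare the caching-allocator view with the driver view on every visible GPU.
    for dev in range(torch.cuda.device_count()):
        allocated = torch.cuda.memory_allocated(dev)   # bytes held by the caching allocator
        free, total = torch.cuda.mem_get_info(dev)     # driver-level free/total bytes
        print(f"{tag}: device {dev} allocator={allocated} driver_in_use={total - free}")

report_cuda_mem("before")
# ... run the body of the suspect test here ...
torch.cuda.synchronize()
report_cuda_mem("after")   # a persistent delta such as 512 -> 166400 bytes is what the checker flags
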
2025-12-04T09:59:13.3969614Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5647de3303d26f02.xml 2025-12-04T09:59:13.3969764Z ============================= test session starts ============================== 2025-12-04T09:59:13.3970072Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.3970174Z cachedir: .pytest_cache 2025-12-04T09:59:13.3970629Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.3970740Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.3970839Z configfile: pytest.ini 2025-12-04T09:59:13.3971338Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.3971536Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.3972304Z stepcurrent: skipping 8 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda 2025-12-04T09:59:13.3972400Z Running 1 items in this shard 2025-12-04T09:59:13.3972405Z 2025-12-04T09:59:13.3973604Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda I1204 09:41:04.533000 56690 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 56742 2025-12-04T09:59:13.3974222Z I1204 09:41:04.534000 56690 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 56743 2025-12-04T09:59:13.3974692Z I1204 09:41:04.535000 56690 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 56744 2025-12-04T09:59:13.3975155Z I1204 09:41:04.536000 56690 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 56745 2025-12-04T09:59:13.3977356Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.3977460Z _warn_cpu_init() 2025-12-04T09:59:13.3979498Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.3979607Z _warn_cpu_init() 2025-12-04T09:59:13.3981617Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.3981724Z _warn_cpu_init() 2025-12-04T09:59:13.3983465Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3983639Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3985340Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3985513Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3987227Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3987424Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3989585Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.3989703Z _warn_cpu_init() 2025-12-04T09:59:13.3991237Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3991382Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3992264Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.3992473Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T09:59:13.3993357Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.3993566Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T09:59:13.3994461Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.3994679Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T09:59:13.3996207Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3996364Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3997911Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3998064Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.3999570Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.3999720Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4000625Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.4000820Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.4001707Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.4001897Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.4002778Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.4002995Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.4003868Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:787: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.4004090Z shared = FSDP(shared, group, **fsdp_kwargs) # type: ignore[assignment] 2025-12-04T09:59:13.4004963Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.4005159Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.4006035Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.4006140Z return func(*args, **kwargs) 2025-12-04T09:59:13.4006830Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4006928Z return func(*args, **kwargs) 2025-12-04T09:59:13.4007652Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4007747Z return func(*args, **kwargs) 2025-12-04T09:59:13.4008418Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4008519Z return func(*args, **kwargs) 2025-12-04T09:59:13.4009194Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4009301Z return func(*args, **kwargs) 2025-12-04T09:59:13.4010004Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4010099Z return func(*args, **kwargs) 2025-12-04T09:59:13.4010773Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4010869Z return func(*args, **kwargs) 2025-12-04T09:59:13.4011540Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4011636Z return func(*args, **kwargs) 2025-12-04T09:59:13.4012311Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.4012442Z return func(*args, **kwargs) 2025-12-04T09:59:13.4012849Z [rank0]:E1204 09:41:31.772000 56742 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4013336Z [rank0]:E1204 09:41:31.772000 56742 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4014223Z [rank0]:E1204 09:41:31.772000 56742 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4014669Z [rank0]:E1204 09:41:31.772000 56742 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4015582Z [rank0]:E1204 09:41:31.772000 56742 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4015936Z [rank0]:E1204 09:41:31.772000 56742 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4017062Z [rank0]:E1204 09:41:31.772000 56742 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4017554Z [rank0]:E1204 09:41:31.772000 56742 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4018517Z [rank0]:E1204 09:41:31.772000 56742 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4019007Z [rank0]:E1204 09:41:31.772000 56742 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4019972Z [rank0]:E1204 09:41:31.772000 56742 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4020461Z [rank0]:E1204 09:41:31.772000 56742 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4021628Z [rank0]:E1204 09:41:31.772000 56742 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4022131Z [rank0]:E1204 09:41:31.772000 56742 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4023979Z [rank0]:E1204 09:41:31.772000 56742 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 0. CUDA driver allocated memory was 714014720 and is now 10516103168. 
2025-12-04T09:59:13.4024353Z [rank0]:E1204 09:41:31.772000 56742 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4025010Z [rank0]:E1204 09:41:31.772000 56742 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4026264Z [rank0]:E1204 09:41:31.772000 56742 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda 2025-12-04T09:59:13.4026627Z [rank0]:E1204 09:41:31.772000 56742 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4027383Z [rank0]:E1204 09:41:31.772000 56742 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4027937Z [rank0]:E1204 09:41:31.772000 56742 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.4028388Z [rank2]:E1204 09:41:31.774000 56744 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4028923Z [rank2]:E1204 09:41:31.774000 56744 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4029930Z [rank2]:E1204 09:41:31.774000 56744 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4030483Z [rank2]:E1204 09:41:31.774000 56744 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4031489Z [rank2]:E1204 09:41:31.774000 56744 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4031886Z [rank2]:E1204 09:41:31.774000 56744 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4032958Z [rank2]:E1204 09:41:31.774000 56744 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4033393Z [rank2]:E1204 09:41:31.774000 56744 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4034249Z [rank2]:E1204 09:41:31.774000 56744 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4034676Z [rank2]:E1204 09:41:31.774000 56744 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4035556Z [rank2]:E1204 09:41:31.774000 56744 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4035959Z [rank2]:E1204 09:41:31.774000 56744 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4036811Z [rank2]:E1204 09:41:31.774000 56744 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4037251Z [rank2]:E1204 09:41:31.774000 56744 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4038855Z [rank2]:E1204 09:41:31.774000 56744 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 2. CUDA driver allocated memory was 611254272 and is now 10404954112. 2025-12-04T09:59:13.4039190Z [rank2]:E1204 09:41:31.774000 56744 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4039769Z [rank2]:E1204 09:41:31.774000 56744 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4040870Z [rank2]:E1204 09:41:31.774000 56744 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda 2025-12-04T09:59:13.4041215Z [rank2]:E1204 09:41:31.774000 56744 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4041853Z [rank2]:E1204 09:41:31.774000 56744 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4042339Z [rank2]:E1204 09:41:31.774000 56744 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.4042740Z [rank1]:E1204 09:41:31.774000 56743 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4043248Z [rank1]:E1204 09:41:31.774000 56743 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4044134Z [rank1]:E1204 09:41:31.774000 56743 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4044586Z [rank1]:E1204 09:41:31.774000 56743 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4045470Z [rank1]:E1204 09:41:31.774000 56743 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4045820Z [rank1]:E1204 09:41:31.774000 56743 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4046676Z [rank1]:E1204 09:41:31.774000 56743 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4047107Z [rank1]:E1204 09:41:31.774000 56743 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4047998Z [rank1]:E1204 09:41:31.774000 56743 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4048429Z [rank1]:E1204 09:41:31.774000 56743 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4049271Z [rank1]:E1204 09:41:31.774000 56743 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4049670Z [rank1]:E1204 09:41:31.774000 56743 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4050525Z [rank1]:E1204 09:41:31.774000 56743 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4050996Z [rank1]:E1204 09:41:31.774000 56743 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4052569Z [rank1]:E1204 09:41:31.774000 56743 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 1. CUDA driver allocated memory was 609157120 and is now 10404954112. 2025-12-04T09:59:13.4052897Z [rank1]:E1204 09:41:31.774000 56743 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4053485Z [rank1]:E1204 09:41:31.774000 56743 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4054625Z [rank1]:E1204 09:41:31.774000 56743 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda 2025-12-04T09:59:13.4054946Z [rank1]:E1204 09:41:31.774000 56743 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4055579Z [rank1]:E1204 09:41:31.774000 56743 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4056096Z [rank1]:E1204 09:41:31.774000 56743 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.4056572Z [rank3]:E1204 09:41:31.775000 56745 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4057274Z [rank3]:E1204 09:41:31.775000 56745 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4058279Z [rank3]:E1204 09:41:31.775000 56745 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4058784Z [rank3]:E1204 09:41:31.775000 56745 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4059779Z [rank3]:E1204 09:41:31.775000 56745 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4060179Z [rank3]:E1204 09:41:31.775000 56745 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4061148Z [rank3]:E1204 09:41:31.775000 56745 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4061667Z [rank3]:E1204 09:41:31.775000 56745 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4062628Z [rank3]:E1204 09:41:31.775000 56745 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4063113Z [rank3]:E1204 09:41:31.775000 56745 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4064067Z [rank3]:E1204 09:41:31.775000 56745 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4064523Z [rank3]:E1204 09:41:31.775000 56745 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4065515Z [rank3]:E1204 09:41:31.775000 56745 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4066014Z [rank3]:E1204 09:41:31.775000 56745 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4067779Z [rank3]:E1204 09:41:31.775000 56745 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 3. CUDA driver allocated memory was 607059968 and is now 10404954112. 
2025-12-04T09:59:13.4068181Z [rank3]:E1204 09:41:31.775000 56745 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4068840Z [rank3]:E1204 09:41:31.775000 56745 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4070060Z [rank3]:E1204 09:41:31.775000 56745 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda 2025-12-04T09:59:13.4070387Z [rank3]:E1204 09:41:31.775000 56745 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4071052Z [rank3]:E1204 09:41:31.775000 56745 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4071538Z [rank3]:E1204 09:41:31.775000 56745 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.4071627Z dist init r=1, world=4 2025-12-04T09:59:13.4071722Z dist init r=2, world=4 2025-12-04T09:59:13.4071805Z dist init r=0, world=4 2025-12-04T09:59:13.4071891Z dist init r=3, world=4 2025-12-04T09:59:13.4072916Z [rank2]:[W1204 09:41:32.789336263 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4073931Z [rank1]:[W1204 09:41:32.789805389 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4074955Z [rank0]:[W1204 09:41:32.791971395 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4075992Z [rank3]:[W1204 09:41:32.875642289 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4076089Z FAILED [49.6786s] [100%] 2025-12-04T09:59:13.4076094Z 2025-12-04T09:59:13.4076223Z =================================== FAILURES =================================== 2025-12-04T09:59:13.4076580Z _ TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda _ 2025-12-04T09:59:13.4076694Z Traceback (most recent call last): 2025-12-04T09:59:13.4077180Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.4077280Z self._join_processes(fn) 2025-12-04T09:59:13.4077837Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.4077962Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.4078503Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.4078602Z raise RuntimeError(error) 2025-12-04T09:59:13.4078807Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.4078919Z Traceback (most recent call last): 2025-12-04T09:59:13.4079395Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4079490Z getattr(self, test_name)() 2025-12-04T09:59:13.4080000Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4080076Z fn() 2025-12-04T09:59:13.4080531Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4080622Z method(*args, **kwargs) 2025-12-04T09:59:13.4081065Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4081162Z method(*args, **kwargs) 2025-12-04T09:59:13.4081606Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4081725Z with policy(): 2025-12-04T09:59:13.4082170Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4082264Z raise RuntimeError(msg) 2025-12-04T09:59:13.4083444Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 2. CUDA driver allocated memory was 611254272 and is now 10404954112. 
2025-12-04T09:59:13.4083452Z 2025-12-04T09:59:13.4083641Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4084344Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda 2025-12-04T09:59:13.4084349Z 2025-12-04T09:59:13.4084581Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4084588Z 2025-12-04T09:59:13.4084592Z 2025-12-04T09:59:13.4084784Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.4085018Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.4085724Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5647de3303d26f02.xml - 2025-12-04T09:59:13.4085907Z =========================== short test summary info ============================ 2025-12-04T09:59:13.4086742Z FAILED [49.6786s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.4086849Z Traceback (most recent call last): 2025-12-04T09:59:13.4087341Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4087438Z getattr(self, test_name)() 2025-12-04T09:59:13.4087912Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4087991Z fn() 2025-12-04T09:59:13.4088435Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4088554Z method(*args, **kwargs) 2025-12-04T09:59:13.4089004Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4089097Z method(*args, **kwargs) 2025-12-04T09:59:13.4089536Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4089619Z with policy(): 2025-12-04T09:59:13.4090070Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4090166Z raise RuntimeError(msg) 2025-12-04T09:59:13.4091333Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 166400 on device 2. CUDA driver allocated memory was 611254272 and is now 10404954112. 2025-12-04T09:59:13.4091372Z 2025-12-04T09:59:13.4091562Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4092261Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda 2025-12-04T09:59:13.4092266Z 2025-12-04T09:59:13.4092504Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4092662Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
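
The UserWarnings repeated through both test sessions point at two setup details rather than at the leak itself: FSDP is handed a bare "cuda" device_id (so it falls back to the current device) together with a CPU-resident module, and the NO_SHARD strategy it ends up using is deprecated in favor of DistributedDataParallel. A short sketch of the setup those warnings recommend follows; it assumes a process group is initialized per rank and uses nn.Linear as a stand-in for the test's mixture-of-experts model, so it is illustrative rather than the test's actual wrapping code.

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_on_gpu(rank: int, world_size: int) -> FSDP:
    # Assumes MASTER_ADDR / MASTER_PORT are exported and one GPU is visible per rank.
    torch.cuda.set_device(rank)                  # gives "cuda" an explicit index
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    module = nn.Linear(8, 8)                     # stand-in for the MoE test model
    return FSDP(
        module,
        device_id=torch.cuda.current_device(),   # move the CPU module to GPU before sharding init
        sync_module_states=True,                 # requires the module on GPU, per the warning
    )
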
2025-12-04T09:59:13.4092879Z ====================== 1 failed, 26 deselected in 49.90s ======================= 2025-12-04T09:59:13.4092963Z Got exit code 1 2025-12-04T09:59:13.4093590Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda 2025-12-04T09:59:13.4093962Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.4094512Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cff7e7504b276d84.xml 2025-12-04T09:59:13.4094653Z ============================= test session starts ============================== 2025-12-04T09:59:13.4094970Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.4095067Z cachedir: .pytest_cache 2025-12-04T09:59:13.4095525Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.4095631Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.4095723Z configfile: pytest.ini 2025-12-04T09:59:13.4096200Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.4096470Z collecting ... collected 60 items / 9 deselected / 51 selected 2025-12-04T09:59:13.4096597Z stepcurrent: skipping 9 already run items. 2025-12-04T09:59:13.4096911Z Running 18 items in this shard 2025-12-04T09:59:13.4096918Z 2025-12-04T09:59:13.4098054Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda I1204 09:41:59.014000 57939 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 57991 2025-12-04T09:59:13.4098553Z I1204 09:41:59.015000 57939 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 57992 2025-12-04T09:59:13.4099049Z I1204 09:41:59.015000 57939 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 57993 2025-12-04T09:59:13.4099543Z I1204 09:41:59.016000 57939 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 57994 2025-12-04T09:59:13.4101628Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.4101735Z _warn_cpu_init() 2025-12-04T09:59:13.4103739Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.4103866Z _warn_cpu_init() 2025-12-04T09:59:13.4104897Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:59:13.4104995Z _init_core_state( 2025-12-04T09:59:13.4106712Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4106907Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4109039Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.4109132Z _warn_cpu_init() 2025-12-04T09:59:13.4110029Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:59:13.4110124Z _init_core_state( 2025-12-04T09:59:13.4111638Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4111799Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4113615Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.4113707Z _warn_cpu_init() 2025-12-04T09:59:13.4114603Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:59:13.4114701Z _init_core_state( 2025-12-04T09:59:13.4116239Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:59:13.4116387Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4117292Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:59:13.4117378Z _init_core_state( 2025-12-04T09:59:13.4118902Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4119073Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4120594Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4120913Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4122756Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4122930Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4124628Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4124800Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4125802Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.4125927Z return func(*args, **kwargs) 2025-12-04T09:59:13.4126762Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4126871Z return func(*args, **kwargs) 2025-12-04T09:59:13.4127651Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4127757Z return func(*args, **kwargs) 2025-12-04T09:59:13.4128543Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T09:59:13.4128653Z return func(*args, **kwargs) 2025-12-04T09:59:13.4129449Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4129565Z return func(*args, **kwargs) 2025-12-04T09:59:13.4130319Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4130435Z return func(*args, **kwargs) 2025-12-04T09:59:13.4131193Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4131301Z return func(*args, **kwargs) 2025-12-04T09:59:13.4132068Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4132211Z return func(*args, **kwargs) 2025-12-04T09:59:13.4132983Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4133087Z return func(*args, **kwargs) 2025-12-04T09:59:13.4133668Z [rank0]:E1204 09:42:22.637000 57991 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4134194Z [rank0]:E1204 09:42:22.637000 57991 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4135164Z [rank0]:E1204 09:42:22.637000 57991 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4135707Z [rank0]:E1204 09:42:22.637000 57991 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4136930Z [rank0]:E1204 09:42:22.637000 57991 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4137338Z [rank0]:E1204 09:42:22.637000 57991 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4138306Z [rank0]:E1204 09:42:22.637000 57991 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4138792Z [rank0]:E1204 09:42:22.637000 57991 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4139758Z [rank0]:E1204 09:42:22.637000 57991 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4140245Z [rank0]:E1204 09:42:22.637000 57991 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4141244Z [rank0]:E1204 09:42:22.637000 57991 site-packages/torch/testing/_internal/common_distributed.py:935] File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4141690Z [rank0]:E1204 09:42:22.637000 57991 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4142660Z [rank0]:E1204 09:42:22.637000 57991 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4143166Z [rank0]:E1204 09:42:22.637000 57991 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4144964Z [rank0]:E1204 09:42:22.637000 57991 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 714014720 and is now 10516103168. 2025-12-04T09:59:13.4145333Z [rank0]:E1204 09:42:22.637000 57991 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4145989Z [rank0]:E1204 09:42:22.637000 57991 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4147219Z [rank0]:E1204 09:42:22.637000 57991 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4147622Z [rank0]:E1204 09:42:22.637000 57991 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4148436Z [rank0]:E1204 09:42:22.637000 57991 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4149080Z [rank0]:E1204 09:42:22.637000 57991 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.4149505Z [rank1]:E1204 09:42:22.639000 57992 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4150044Z [rank1]:E1204 09:42:22.639000 57992 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4150975Z [rank1]:E1204 09:42:22.639000 57992 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4151465Z [rank1]:E1204 09:42:22.639000 57992 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4152390Z [rank1]:E1204 09:42:22.639000 57992 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4152761Z [rank1]:E1204 09:42:22.639000 57992 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4153673Z [rank1]:E1204 09:42:22.639000 57992 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4154132Z [rank1]:E1204 09:42:22.639000 57992 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4155066Z [rank1]:E1204 09:42:22.639000 57992 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4155522Z [rank1]:E1204 09:42:22.639000 57992 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4156433Z [rank1]:E1204 09:42:22.639000 57992 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4156847Z [rank1]:E1204 09:42:22.639000 57992 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4157749Z [rank1]:E1204 09:42:22.639000 57992 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4158245Z [rank1]:E1204 09:42:22.639000 57992 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4159891Z [rank1]:E1204 09:42:22.639000 57992 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 604962816 and is now 10404954112. 
2025-12-04T09:59:13.4160343Z [rank1]:E1204 09:42:22.639000 57992 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4160926Z [rank1]:E1204 09:42:22.639000 57992 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4162269Z [rank1]:E1204 09:42:22.639000 57992 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4162612Z [rank1]:E1204 09:42:22.639000 57992 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4163282Z [rank1]:E1204 09:42:22.639000 57992 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4163806Z [rank1]:E1204 09:42:22.639000 57992 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.4164258Z [rank3]:E1204 09:42:22.639000 57994 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4164764Z [rank3]:E1204 09:42:22.639000 57994 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4165705Z [rank3]:E1204 09:42:22.639000 57994 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4166193Z [rank3]:E1204 09:42:22.639000 57994 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4167118Z [rank3]:E1204 09:42:22.639000 57994 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4167489Z [rank3]:E1204 09:42:22.639000 57994 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4168409Z [rank3]:E1204 09:42:22.639000 57994 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4168893Z [rank3]:E1204 09:42:22.639000 57994 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4169800Z [rank3]:E1204 09:42:22.639000 57994 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4170254Z [rank3]:E1204 09:42:22.639000 57994 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4171167Z [rank3]:E1204 09:42:22.639000 57994 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4171587Z [rank3]:E1204 09:42:22.639000 57994 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4172523Z [rank3]:E1204 09:42:22.639000 57994 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4172995Z [rank3]:E1204 09:42:22.639000 57994 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4174676Z [rank3]:E1204 09:42:22.639000 57994 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 607059968 and is now 10404954112. 2025-12-04T09:59:13.4175041Z [rank3]:E1204 09:42:22.639000 57994 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4175626Z [rank3]:E1204 09:42:22.639000 57994 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4176982Z [rank3]:E1204 09:42:22.639000 57994 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4177351Z [rank3]:E1204 09:42:22.639000 57994 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4178112Z [rank3]:E1204 09:42:22.639000 57994 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4178657Z [rank3]:E1204 09:42:22.639000 57994 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.4179115Z [rank2]:E1204 09:42:22.640000 57993 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4179660Z [rank2]:E1204 09:42:22.640000 57993 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4180663Z [rank2]:E1204 09:42:22.640000 57993 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4181178Z [rank2]:E1204 09:42:22.640000 57993 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4182166Z [rank2]:E1204 09:42:22.640000 57993 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4182564Z [rank2]:E1204 09:42:22.640000 57993 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4183562Z [rank2]:E1204 09:42:22.640000 57993 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4184051Z [rank2]:E1204 09:42:22.640000 57993 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4185014Z [rank2]:E1204 09:42:22.640000 57993 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4185504Z [rank2]:E1204 09:42:22.640000 57993 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4186499Z [rank2]:E1204 09:42:22.640000 57993 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4186957Z [rank2]:E1204 09:42:22.640000 57993 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4187923Z [rank2]:E1204 09:42:22.640000 57993 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4188419Z [rank2]:E1204 09:42:22.640000 57993 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4190253Z [rank2]:E1204 09:42:22.640000 57993 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 611254272 and is now 10404954112. 2025-12-04T09:59:13.4190998Z [rank2]:E1204 09:42:22.640000 57993 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4191579Z [rank2]:E1204 09:42:22.640000 57993 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4192671Z [rank2]:E1204 09:42:22.640000 57993 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4193019Z [rank2]:E1204 09:42:22.640000 57993 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4193667Z [rank2]:E1204 09:42:22.640000 57993 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4194150Z [rank2]:E1204 09:42:22.640000 57993 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.4194240Z dist init r=2, world=4 2025-12-04T09:59:13.4194333Z dist init r=1, world=4 2025-12-04T09:59:13.4194419Z dist init r=0, world=4 2025-12-04T09:59:13.4194505Z dist init r=3, world=4 2025-12-04T09:59:13.4195534Z [rank2]:[W1204 09:42:23.660981110 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4196547Z [rank1]:[W1204 09:42:23.661337285 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4197596Z [rank0]:[W1204 09:42:23.661823435 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4198598Z [rank3]:[W1204 09:42:23.661941869 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4198700Z FAILED [42.4544s] [ 5%] 2025-12-04T09:59:13.4198705Z 2025-12-04T09:59:13.4198836Z =================================== FAILURES =================================== 2025-12-04T09:59:13.4199177Z _ TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda _ 2025-12-04T09:59:13.4199291Z Traceback (most recent call last): 2025-12-04T09:59:13.4199809Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.4199918Z self._join_processes(fn) 2025-12-04T09:59:13.4200437Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.4200562Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.4201108Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.4201210Z raise RuntimeError(error) 2025-12-04T09:59:13.4201417Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.4201532Z Traceback (most recent call last): 2025-12-04T09:59:13.4202011Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4202142Z getattr(self, test_name)() 2025-12-04T09:59:13.4202618Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4202696Z fn() 2025-12-04T09:59:13.4203151Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4203244Z method(*args, **kwargs) 2025-12-04T09:59:13.4203690Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4203791Z method(*args, **kwargs) 2025-12-04T09:59:13.4204264Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4204359Z with policy(): 2025-12-04T09:59:13.4204808Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4204905Z raise RuntimeError(msg) 2025-12-04T09:59:13.4206080Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 714014720 and is now 10516103168. 
2025-12-04T09:59:13.4206085Z 2025-12-04T09:59:13.4206274Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4206960Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4206966Z 2025-12-04T09:59:13.4207200Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4207205Z 2025-12-04T09:59:13.4207364Z Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.4207469Z Traceback (most recent call last): 2025-12-04T09:59:13.4207958Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4208063Z getattr(self, test_name)() 2025-12-04T09:59:13.4208564Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4208640Z fn() 2025-12-04T09:59:13.4209093Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4209185Z method(*args, **kwargs) 2025-12-04T09:59:13.4209640Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4209729Z method(*args, **kwargs) 2025-12-04T09:59:13.4210173Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4210268Z with policy(): 2025-12-04T09:59:13.4210743Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4210840Z raise RuntimeError(msg) 2025-12-04T09:59:13.4211998Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 604962816 and is now 10404954112. 2025-12-04T09:59:13.4212003Z 2025-12-04T09:59:13.4212194Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4212881Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4212913Z 2025-12-04T09:59:13.4213147Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4213151Z 2025-12-04T09:59:13.4213155Z 2025-12-04T09:59:13.4213358Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.4213592Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:59:13.4214304Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cff7e7504b276d84.xml - 2025-12-04T09:59:13.4214461Z =========================== short test summary info ============================ 2025-12-04T09:59:13.4215289Z FAILED [42.4544s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.4215429Z Traceback (most recent call last): 2025-12-04T09:59:13.4215918Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4216016Z getattr(self, test_name)() 2025-12-04T09:59:13.4216585Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4216837Z fn() 2025-12-04T09:59:13.4217358Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4217464Z method(*args, **kwargs) 2025-12-04T09:59:13.4217969Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4218083Z method(*args, **kwargs) 2025-12-04T09:59:13.4218588Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4218680Z with policy(): 2025-12-04T09:59:13.4219200Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4219307Z raise RuntimeError(msg) 2025-12-04T09:59:13.4220652Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 714014720 and is now 10516103168. 
2025-12-04T09:59:13.4220659Z 2025-12-04T09:59:13.4221106Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4221880Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4221897Z 2025-12-04T09:59:13.4222163Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4222171Z 2025-12-04T09:59:13.4222331Z Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.4222459Z Traceback (most recent call last): 2025-12-04T09:59:13.4223076Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4223191Z getattr(self, test_name)() 2025-12-04T09:59:13.4223734Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4223822Z fn() 2025-12-04T09:59:13.4224336Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4224443Z method(*args, **kwargs) 2025-12-04T09:59:13.4224950Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4225065Z method(*args, **kwargs) 2025-12-04T09:59:13.4225608Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4225705Z with policy(): 2025-12-04T09:59:13.4226225Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4226334Z raise RuntimeError(msg) 2025-12-04T09:59:13.4227646Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 604962816 and is now 10404954112. 2025-12-04T09:59:13.4227651Z 2025-12-04T09:59:13.4227901Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4228680Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4228688Z 2025-12-04T09:59:13.4228947Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4229126Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.4229310Z ======================= 1 failed, 9 deselected in 42.67s ======================= 2025-12-04T09:59:13.4229402Z Got exit code 1 2025-12-04T09:59:13.4229505Z Retrying single test... 
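Each failing rank prints the same repro line (PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda). The check behind that flag compares caching-allocator and driver-level memory counters before and after the test body. A rough, hypothetical sketch of the same comparison done by hand (the helper name and threshold are illustrative, not the actual check raised from common_utils.py:2705 in the traceback above):

import torch

def check_cuda_leak(fn, device=0):
    # Hypothetical illustration of the idea behind PYTORCH_TEST_CUDA_MEM_LEAK_CHECK.
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    alloc_before = torch.cuda.memory_allocated(device)
    free_before, _ = torch.cuda.mem_get_info(device)  # driver-level (free, total)

    fn()  # run the test body

    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    alloc_after = torch.cuda.memory_allocated(device)
    free_after, _ = torch.cuda.mem_get_info(device)

    if alloc_after > alloc_before or free_after < free_before:
        raise RuntimeError(
            f"possible CUDA leak on device {device}: "
            f"allocator {alloc_before} -> {alloc_after}, "
            f"driver free {free_before} -> {free_after}"
        )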
2025-12-04T09:59:13.4230140Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0d2fb83ab3ccdeb6.xml 2025-12-04T09:59:13.4230301Z ============================= test session starts ============================== 2025-12-04T09:59:13.4230657Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.4230763Z cachedir: .pytest_cache 2025-12-04T09:59:13.4231276Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.4231409Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.4231513Z configfile: pytest.ini 2025-12-04T09:59:13.4232087Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.4232308Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.4233387Z stepcurrent: skipping 9 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4233495Z Running 1 items in this shard 2025-12-04T09:59:13.4233502Z 2025-12-04T09:59:13.4234500Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda I1204 09:42:46.014000 59188 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 59240 2025-12-04T09:59:13.4234950Z I1204 09:42:46.015000 59188 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 59241 2025-12-04T09:59:13.4235415Z I1204 09:42:46.016000 59188 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 59242 2025-12-04T09:59:13.4235849Z I1204 09:42:46.017000 59188 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 59243 2025-12-04T09:59:13.4237654Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.4237767Z _warn_cpu_init() 2025-12-04T09:59:13.4239560Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.4239645Z _warn_cpu_init() 2025-12-04T09:59:13.4241424Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.4241540Z _warn_cpu_init() 2025-12-04T09:59:13.4242453Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:59:13.4242537Z _init_core_state( 2025-12-04T09:59:13.4243431Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:59:13.4243520Z _init_core_state( 2025-12-04T09:59:13.4244417Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:59:13.4244510Z _init_core_state( 2025-12-04T09:59:13.4246064Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4246212Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4247729Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4247875Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4249415Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4249563Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4251349Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.4251462Z _warn_cpu_init() 2025-12-04T09:59:13.4252371Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 
2025-12-04T09:59:13.4252453Z _init_core_state( 2025-12-04T09:59:13.4253968Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4254151Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4255668Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4255820Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4257664Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4257845Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4258834Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.4258956Z return func(*args, **kwargs) 2025-12-04T09:59:13.4260697Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4260860Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4261644Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4261757Z return func(*args, **kwargs) 2025-12-04T09:59:13.4262525Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4262635Z return func(*args, **kwargs) 2025-12-04T09:59:13.4263423Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4263539Z return func(*args, **kwargs) 2025-12-04T09:59:13.4264297Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T09:59:13.4264411Z return func(*args, **kwargs) 2025-12-04T09:59:13.4265161Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4265267Z return func(*args, **kwargs) 2025-12-04T09:59:13.4266027Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4266164Z return func(*args, **kwargs) 2025-12-04T09:59:13.4266926Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4267029Z return func(*args, **kwargs) 2025-12-04T09:59:13.4267786Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4267897Z return func(*args, **kwargs) 2025-12-04T09:59:13.4268384Z [rank1]:E1204 09:43:18.699000 59241 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4269030Z [rank1]:E1204 09:43:18.699000 59241 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4269937Z [rank1]:E1204 09:43:18.699000 59241 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4270389Z [rank1]:E1204 09:43:18.699000 59241 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4271275Z [rank1]:E1204 09:43:18.699000 59241 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4271633Z [rank1]:E1204 09:43:18.699000 59241 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4272489Z [rank1]:E1204 09:43:18.699000 59241 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4273098Z [rank1]:E1204 09:43:18.699000 59241 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4274038Z [rank1]:E1204 09:43:18.699000 59241 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4274496Z [rank1]:E1204 09:43:18.699000 59241 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4275585Z [rank1]:E1204 09:43:18.699000 59241 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4276029Z [rank1]:E1204 09:43:18.699000 59241 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4276994Z [rank1]:E1204 09:43:18.699000 59241
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4277478Z [rank1]:E1204 09:43:18.699000 59241 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4279175Z [rank1]:E1204 09:43:18.699000 59241 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 609157120 and is now 10404954112. 2025-12-04T09:59:13.4279536Z [rank1]:E1204 09:43:18.699000 59241 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4280201Z [rank1]:E1204 09:43:18.699000 59241 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4281394Z [rank1]:E1204 09:43:18.699000 59241 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4281755Z [rank1]:E1204 09:43:18.699000 59241 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4282447Z [rank1]:E1204 09:43:18.699000 59241 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4283007Z [rank1]:E1204 09:43:18.699000 59241 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.4283449Z [rank0]:E1204 09:43:18.700000 59240 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4283969Z [rank0]:E1204 09:43:18.700000 59240 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4285027Z [rank0]:E1204 09:43:18.700000 59240 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4285506Z [rank0]:E1204 09:43:18.700000 59240 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4286441Z [rank0]:E1204 09:43:18.700000 59240 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4286922Z [rank0]:E1204 09:43:18.700000 59240 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4287811Z [rank0]:E1204 09:43:18.700000 59240 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4288247Z [rank0]:E1204 09:43:18.700000 59240 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4289100Z [rank0]:E1204 09:43:18.700000 59240 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4289531Z [rank0]:E1204 09:43:18.700000 59240 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4290386Z [rank0]:E1204 09:43:18.700000 59240 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4290813Z [rank0]:E1204 09:43:18.700000 59240 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4291675Z [rank0]:E1204 09:43:18.700000 59240 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4292112Z [rank0]:E1204 09:43:18.700000 59240 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4293669Z [rank0]:E1204 09:43:18.700000 59240 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 707723264 and is now 10516103168. 2025-12-04T09:59:13.4294032Z [rank0]:E1204 09:43:18.700000 59240 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4294615Z [rank0]:E1204 09:43:18.700000 59240 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4295696Z [rank0]:E1204 09:43:18.700000 59240 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4296063Z [rank0]:E1204 09:43:18.700000 59240 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4296937Z [rank0]:E1204 09:43:18.700000 59240 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4297502Z [rank0]:E1204 09:43:18.700000 59240 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.4297960Z [rank2]:E1204 09:43:18.701000 59242 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4298498Z [rank2]:E1204 09:43:18.701000 59242 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4299493Z [rank2]:E1204 09:43:18.701000 59242 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4300005Z [rank2]:E1204 09:43:18.701000 59242 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4300999Z [rank2]:E1204 09:43:18.701000 59242 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4301434Z [rank2]:E1204 09:43:18.701000 59242 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4302406Z [rank2]:E1204 09:43:18.701000 59242 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4302888Z [rank2]:E1204 09:43:18.701000 59242 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4303850Z [rank2]:E1204 09:43:18.701000 59242 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4304339Z [rank2]:E1204 09:43:18.701000 59242 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4305315Z [rank2]:E1204 09:43:18.701000 59242 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4305769Z [rank2]:E1204 09:43:18.701000 59242 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4306726Z [rank2]:E1204 09:43:18.701000 59242 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4307221Z [rank2]:E1204 09:43:18.701000 59242 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4309192Z [rank2]:E1204 09:43:18.701000 59242 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 604962816 and is now 10404954112. 
2025-12-04T09:59:13.4309553Z [rank2]:E1204 09:43:18.701000 59242 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4310137Z [rank2]:E1204 09:43:18.701000 59242 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4311227Z [rank2]:E1204 09:43:18.701000 59242 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4311589Z [rank2]:E1204 09:43:18.701000 59242 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4312224Z [rank2]:E1204 09:43:18.701000 59242 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4312711Z [rank2]:E1204 09:43:18.701000 59242 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.4313110Z [rank3]:E1204 09:43:18.701000 59243 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4313586Z [rank3]:E1204 09:43:18.701000 59243 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4314474Z [rank3]:E1204 09:43:18.701000 59243 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4314922Z [rank3]:E1204 09:43:18.701000 59243 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4315833Z [rank3]:E1204 09:43:18.701000 59243 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4316186Z [rank3]:E1204 09:43:18.701000 59243 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4317045Z [rank3]:E1204 09:43:18.701000 59243 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4317481Z [rank3]:E1204 09:43:18.701000 59243 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4318363Z [rank3]:E1204 09:43:18.701000 59243 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4318795Z [rank3]:E1204 09:43:18.701000 59243 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4319643Z [rank3]:E1204 09:43:18.701000 59243 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4320043Z [rank3]:E1204 09:43:18.701000 59243 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4321048Z [rank3]:E1204 09:43:18.701000 59243 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4321749Z [rank3]:E1204 09:43:18.701000 59243 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4323501Z [rank3]:E1204 09:43:18.701000 59243 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 607059968 and is now 10404954112. 2025-12-04T09:59:13.4323874Z [rank3]:E1204 09:43:18.701000 59243 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4324589Z [rank3]:E1204 09:43:18.701000 59243 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4325810Z [rank3]:E1204 09:43:18.701000 59243 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4326180Z [rank3]:E1204 09:43:18.701000 59243 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4326897Z [rank3]:E1204 09:43:18.701000 59243 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4327443Z [rank3]:E1204 09:43:18.701000 59243 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.4327548Z dist init r=0, world=4 2025-12-04T09:59:13.4327643Z dist init r=3, world=4 2025-12-04T09:59:13.4327748Z dist init r=1, world=4 2025-12-04T09:59:13.4327843Z dist init r=2, world=4 2025-12-04T09:59:13.4329006Z [rank1]:[W1204 09:43:19.704075392 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4330188Z [rank0]:[W1204 09:43:19.704590489 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4331328Z [rank3]:[W1204 09:43:19.706943554 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4332462Z [rank2]:[W1204 09:43:19.717925474 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4332564Z FAILED [48.2722s] [100%] 2025-12-04T09:59:13.4332569Z 2025-12-04T09:59:13.4332764Z =================================== FAILURES =================================== 2025-12-04T09:59:13.4333154Z _ TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda _ 2025-12-04T09:59:13.4333278Z Traceback (most recent call last): 2025-12-04T09:59:13.4333905Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.4334006Z self._join_processes(fn) 2025-12-04T09:59:13.4334527Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.4334650Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.4335194Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.4335330Z raise RuntimeError(error) 2025-12-04T09:59:13.4335538Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.4335654Z Traceback (most recent call last): 2025-12-04T09:59:13.4336137Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4336236Z getattr(self, test_name)() 2025-12-04T09:59:13.4336978Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4337068Z fn() 2025-12-04T09:59:13.4337628Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4337728Z method(*args, **kwargs) 2025-12-04T09:59:13.4338228Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4338343Z method(*args, **kwargs) 2025-12-04T09:59:13.4338852Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4338949Z with policy(): 2025-12-04T09:59:13.4339460Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4339566Z raise RuntimeError(msg) 2025-12-04T09:59:13.4340884Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 607059968 and is now 10404954112. 
2025-12-04T09:59:13.4340893Z 2025-12-04T09:59:13.4341108Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4341880Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4341891Z 2025-12-04T09:59:13.4342179Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4342185Z 2025-12-04T09:59:13.4342190Z 2025-12-04T09:59:13.4342407Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.4342674Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.4343473Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0d2fb83ab3ccdeb6.xml - 2025-12-04T09:59:13.4343656Z =========================== short test summary info ============================ 2025-12-04T09:59:13.4344578Z FAILED [48.2722s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.4344701Z Traceback (most recent call last): 2025-12-04T09:59:13.4345285Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4345398Z getattr(self, test_name)() 2025-12-04T09:59:13.4345940Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4346027Z fn() 2025-12-04T09:59:13.4346531Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4346642Z method(*args, **kwargs) 2025-12-04T09:59:13.4347145Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4347248Z method(*args, **kwargs) 2025-12-04T09:59:13.4347801Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4347897Z with policy(): 2025-12-04T09:59:13.4348411Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4348516Z raise RuntimeError(msg) 2025-12-04T09:59:13.4349820Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 607059968 and is now 10404954112. 2025-12-04T09:59:13.4349857Z 2025-12-04T09:59:13.4350054Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4350729Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4350736Z 2025-12-04T09:59:13.4350976Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4351136Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
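Note on the leak check reported above: the harness flags a leak by comparing CUDA caching-allocator and driver allocations taken before and after the test (here the allocator went from 512 to 80384 bytes on every device), and the printed repro line with PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 re-enables that comparison outside CI. The sketch below is only a rough stand-in for reproducing the same measurement locally; the helper name and the zero-byte tolerance are illustrative assumptions, not the test suite's actual implementation.

import torch

def check_cuda_leak(fn, device=0, tolerance_bytes=0):
    # Simplified stand-in for the PYTORCH_TEST_CUDA_MEM_LEAK_CHECK comparison:
    # snapshot caching-allocator usage before and after running `fn`.
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    before = torch.cuda.memory_allocated(device)

    fn()

    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    after = torch.cuda.memory_allocated(device)

    leaked = after - before
    if leaked > tolerance_bytes:
        raise RuntimeError(
            f"possible CUDA leak on device {device}: "
            f"allocated memory was {before} and is now {after}"
        )
    return leaked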
2025-12-04T09:59:13.4351293Z ====================== 1 failed, 26 deselected in 48.49s ======================= 2025-12-04T09:59:13.4351382Z Got exit code 1 2025-12-04T09:59:13.4351475Z Retrying single test... 2025-12-04T09:59:13.4352028Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bd911142cc34300e.xml 2025-12-04T09:59:13.4352169Z ============================= test session starts ============================== 2025-12-04T09:59:13.4352477Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.4352579Z cachedir: .pytest_cache 2025-12-04T09:59:13.4353033Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.4353138Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.4353243Z configfile: pytest.ini 2025-12-04T09:59:13.4353741Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.4353940Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.4354692Z stepcurrent: skipping 9 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4354791Z Running 1 items in this shard 2025-12-04T09:59:13.4354797Z 2025-12-04T09:59:13.4355801Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda I1204 09:43:39.494000 60437 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 60489 2025-12-04T09:59:13.4356241Z I1204 09:43:39.495000 60437 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 60490 2025-12-04T09:59:13.4356713Z I1204 09:43:39.496000 60437 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 60491 2025-12-04T09:59:13.4357144Z I1204 09:43:39.497000 60437 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 60492 2025-12-04T09:59:13.4358972Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.4359086Z _warn_cpu_init() 2025-12-04T09:59:13.4360879Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.4360973Z _warn_cpu_init() 2025-12-04T09:59:13.4361874Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:59:13.4361993Z _init_core_state( 2025-12-04T09:59:13.4362888Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:59:13.4362984Z _init_core_state( 2025-12-04T09:59:13.4364503Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4364652Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4366170Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4366319Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4368135Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.4368222Z _warn_cpu_init() 2025-12-04T09:59:13.4370020Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.4370129Z _warn_cpu_init() 2025-12-04T09:59:13.4371038Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:59:13.4371125Z _init_core_state( 2025-12-04T09:59:13.4372631Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:59:13.4372815Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4373712Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.FULL_SHARD since the world size is 1. 2025-12-04T09:59:13.4373805Z _init_core_state( 2025-12-04T09:59:13.4375318Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4375492Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4377286Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4377463Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4379160Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4379322Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4381029Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4381228Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4382237Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.4382351Z return func(*args, **kwargs) 2025-12-04T09:59:13.4383136Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4383250Z return func(*args, **kwargs) 2025-12-04T09:59:13.4384010Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4384135Z return func(*args, **kwargs) 2025-12-04T09:59:13.4384928Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T09:59:13.4385047Z return func(*args, **kwargs) 2025-12-04T09:59:13.4385810Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4385920Z return func(*args, **kwargs) 2025-12-04T09:59:13.4386687Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4386797Z return func(*args, **kwargs) 2025-12-04T09:59:13.4387583Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4387704Z return func(*args, **kwargs) 2025-12-04T09:59:13.4388464Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4388580Z return func(*args, **kwargs) 2025-12-04T09:59:13.4389381Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4389510Z return func(*args, **kwargs) 2025-12-04T09:59:13.4389925Z [rank1]:E1204 09:44:03.503000 60490 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4390399Z [rank1]:E1204 09:44:03.503000 60490 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4391305Z [rank1]:E1204 09:44:03.503000 60490 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4391759Z [rank1]:E1204 09:44:03.503000 60490 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4392649Z [rank1]:E1204 09:44:03.503000 60490 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4393004Z [rank1]:E1204 09:44:03.503000 60490 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4393855Z [rank1]:E1204 09:44:03.503000 60490 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4394301Z [rank1]:E1204 09:44:03.503000 60490 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4395180Z [rank1]:E1204 09:44:03.503000 60490 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4395627Z [rank1]:E1204 09:44:03.503000 60490 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4396476Z [rank1]:E1204 09:44:03.503000 60490 site-packages/torch/testing/_internal/common_distributed.py:935] File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4396888Z [rank1]:E1204 09:44:03.503000 60490 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4397834Z [rank1]:E1204 09:44:03.503000 60490 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4398268Z [rank1]:E1204 09:44:03.503000 60490 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4399836Z [rank1]:E1204 09:44:03.503000 60490 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 611254272 and is now 10404954112. 2025-12-04T09:59:13.4400160Z [rank1]:E1204 09:44:03.503000 60490 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4400778Z [rank1]:E1204 09:44:03.503000 60490 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4401863Z [rank1]:E1204 09:44:03.503000 60490 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4402195Z [rank1]:E1204 09:44:03.503000 60490 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4402830Z [rank1]:E1204 09:44:03.503000 60490 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4403341Z [rank1]:E1204 09:44:03.503000 60490 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.4403750Z [rank0]:E1204 09:44:03.503000 60489 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4404226Z [rank0]:E1204 09:44:03.503000 60489 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4405121Z [rank0]:E1204 09:44:03.503000 60489 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4405573Z [rank0]:E1204 09:44:03.503000 60489 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4406462Z [rank0]:E1204 09:44:03.503000 60489 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4406813Z [rank0]:E1204 09:44:03.503000 60489 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4407695Z [rank0]:E1204 09:44:03.503000 60489 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4411818Z [rank0]:E1204 09:44:03.503000 60489 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4412735Z [rank0]:E1204 09:44:03.503000 60489 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4413181Z [rank0]:E1204 09:44:03.503000 60489 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4414038Z [rank0]:E1204 09:44:03.503000 60489 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4414512Z [rank0]:E1204 09:44:03.503000 60489 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4415372Z [rank0]:E1204 09:44:03.503000 60489 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4415814Z [rank0]:E1204 09:44:03.503000 60489 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4417767Z [rank0]:E1204 09:44:03.503000 60489 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 716111872 and is now 10516103168. 
2025-12-04T09:59:13.4418193Z [rank0]:E1204 09:44:03.503000 60489 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4418854Z [rank0]:E1204 09:44:03.503000 60489 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4420078Z [rank0]:E1204 09:44:03.503000 60489 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4420479Z [rank0]:E1204 09:44:03.503000 60489 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4421413Z [rank0]:E1204 09:44:03.503000 60489 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4421975Z [rank0]:E1204 09:44:03.503000 60489 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.4422430Z [rank3]:E1204 09:44:03.504000 60492 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4422968Z [rank3]:E1204 09:44:03.504000 60492 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4423966Z [rank3]:E1204 09:44:03.504000 60492 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4424480Z [rank3]:E1204 09:44:03.504000 60492 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4425470Z [rank3]:E1204 09:44:03.504000 60492 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4425937Z [rank3]:E1204 09:44:03.504000 60492 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4426906Z [rank3]:E1204 09:44:03.504000 60492 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4427391Z [rank3]:E1204 09:44:03.504000 60492 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4428353Z [rank3]:E1204 09:44:03.504000 60492 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4428836Z [rank3]:E1204 09:44:03.504000 60492 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4429840Z [rank3]:E1204 09:44:03.504000 60492 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4430294Z [rank3]:E1204 09:44:03.504000 60492 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4431253Z [rank3]:E1204 09:44:03.504000 60492 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4431750Z [rank3]:E1204 09:44:03.504000 60492 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4433569Z [rank3]:E1204 09:44:03.504000 60492 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 609157120 and is now 10404954112. 2025-12-04T09:59:13.4433900Z [rank3]:E1204 09:44:03.504000 60492 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4434482Z [rank3]:E1204 09:44:03.504000 60492 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4435833Z [rank3]:E1204 09:44:03.504000 60492 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4436176Z [rank3]:E1204 09:44:03.504000 60492 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4436846Z [rank3]:E1204 09:44:03.504000 60492 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4437363Z [rank3]:E1204 09:44:03.504000 60492 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.4437786Z [rank2]:E1204 09:44:03.504000 60491 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4438286Z [rank2]:E1204 09:44:03.504000 60491 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4439228Z [rank2]:E1204 09:44:03.504000 60491 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4439711Z [rank2]:E1204 09:44:03.504000 60491 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4440662Z [rank2]:E1204 09:44:03.504000 60491 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4441032Z [rank2]:E1204 09:44:03.504000 60491 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4441934Z [rank2]:E1204 09:44:03.504000 60491 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4442392Z [rank2]:E1204 09:44:03.504000 60491 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4443327Z [rank2]:E1204 09:44:03.504000 60491 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4443786Z [rank2]:E1204 09:44:03.504000 60491 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4444689Z [rank2]:E1204 09:44:03.504000 60491 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4445113Z [rank2]:E1204 09:44:03.504000 60491 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4446018Z [rank2]:E1204 09:44:03.504000 60491 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4446517Z [rank2]:E1204 09:44:03.504000 60491 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4448187Z [rank2]:E1204 09:44:03.504000 60491 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 604962816 and is now 10404954112. 2025-12-04T09:59:13.4448517Z [rank2]:E1204 09:44:03.504000 60491 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4449127Z [rank2]:E1204 09:44:03.504000 60491 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4450219Z [rank2]:E1204 09:44:03.504000 60491 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4450539Z [rank2]:E1204 09:44:03.504000 60491 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4451168Z [rank2]:E1204 09:44:03.504000 60491 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4451655Z [rank2]:E1204 09:44:03.504000 60491 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.4451745Z dist init r=1, world=4 2025-12-04T09:59:13.4451837Z dist init r=0, world=4 2025-12-04T09:59:13.4451921Z dist init r=3, world=4 2025-12-04T09:59:13.4452008Z dist init r=2, world=4 2025-12-04T09:59:13.4453048Z [rank1]:[W1204 09:44:03.462422849 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4454102Z [rank0]:[W1204 09:44:03.472381630 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4455119Z [rank3]:[W1204 09:44:03.472423753 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4456127Z [rank2]:[W1204 09:44:03.537324995 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4456223Z FAILED [43.7497s] [100%] 2025-12-04T09:59:13.4456255Z 2025-12-04T09:59:13.4456460Z =================================== FAILURES =================================== 2025-12-04T09:59:13.4456986Z _ TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda _ 2025-12-04T09:59:13.4457113Z Traceback (most recent call last): 2025-12-04T09:59:13.4457661Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.4457772Z self._join_processes(fn) 2025-12-04T09:59:13.4458363Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.4458501Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.4459153Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.4459263Z raise RuntimeError(error) 2025-12-04T09:59:13.4459496Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.4459624Z Traceback (most recent call last): 2025-12-04T09:59:13.4460163Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4460274Z getattr(self, test_name)() 2025-12-04T09:59:13.4460817Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4460938Z fn() 2025-12-04T09:59:13.4461448Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4461552Z method(*args, **kwargs) 2025-12-04T09:59:13.4462059Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4462168Z method(*args, **kwargs) 2025-12-04T09:59:13.4462674Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4462778Z with policy(): 2025-12-04T09:59:13.4463285Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4463389Z raise RuntimeError(msg) 2025-12-04T09:59:13.4464701Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 716111872 and is now 10516103168. 
2025-12-04T09:59:13.4464711Z 2025-12-04T09:59:13.4464926Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4465705Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4465712Z 2025-12-04T09:59:13.4466003Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4466009Z 2025-12-04T09:59:13.4466170Z Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.4466294Z Traceback (most recent call last): 2025-12-04T09:59:13.4466847Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4466960Z getattr(self, test_name)() 2025-12-04T09:59:13.4467497Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4467586Z fn() 2025-12-04T09:59:13.4468097Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4468201Z method(*args, **kwargs) 2025-12-04T09:59:13.4468729Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4468956Z method(*args, **kwargs) 2025-12-04T09:59:13.4469445Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4469544Z with policy(): 2025-12-04T09:59:13.4470033Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4470137Z raise RuntimeError(msg) 2025-12-04T09:59:13.4471402Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 611254272 and is now 10404954112. 
2025-12-04T09:59:13.4471439Z 2025-12-04T09:59:13.4471646Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4472394Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4472400Z 2025-12-04T09:59:13.4472653Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4472658Z 2025-12-04T09:59:13.4472819Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.4472931Z Traceback (most recent call last): 2025-12-04T09:59:13.4473458Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4473596Z getattr(self, test_name)() 2025-12-04T09:59:13.4474113Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4474199Z fn() 2025-12-04T09:59:13.4474697Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4474798Z method(*args, **kwargs) 2025-12-04T09:59:13.4475299Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4475401Z method(*args, **kwargs) 2025-12-04T09:59:13.4475889Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4475987Z with policy(): 2025-12-04T09:59:13.4476479Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4476580Z raise RuntimeError(msg) 2025-12-04T09:59:13.4477840Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 604962816 and is now 10404954112. 2025-12-04T09:59:13.4477847Z 2025-12-04T09:59:13.4478078Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4478934Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4478940Z 2025-12-04T09:59:13.4479188Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4479196Z 2025-12-04T09:59:13.4479200Z 2025-12-04T09:59:13.4479411Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.4479657Z Process 0 terminated with exit code 10, terminating remaining processes. 
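Note on the ProcessGroupNCCL warnings in this run: each rank exits without calling destroy_process_group(), which is what triggers the "can leak resources" message, and the earlier barrier() warning points at passing device_id to init_process_group. A minimal teardown pattern under those recommendations is sketched below; the RANK/WORLD_SIZE environment wiring is assumed to come from a launcher such as torchrun and is not taken from this test.

import os
import torch
import torch.distributed as dist

def main():
    # Assumed to be provided by the launcher (e.g. torchrun).
    rank = int(os.environ["RANK"])
    world_size = int(os.environ["WORLD_SIZE"])

    # Pin the process to its GPU before creating the process group.
    torch.cuda.set_device(rank)

    # Passing device_id is the knob the barrier() warning mentions.
    dist.init_process_group(
        backend="nccl",
        rank=rank,
        world_size=world_size,
        device_id=torch.device(f"cuda:{rank}"),
    )
    try:
        dist.barrier()
        # ... test or training body ...
    finally:
        # Explicit teardown avoids the "destroy_process_group() was not
        # called before program exit" warning seen above.
        dist.destroy_process_group()

if __name__ == "__main__":
    main()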
2025-12-04T09:59:13.4480408Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bd911142cc34300e.xml - 2025-12-04T09:59:13.4480605Z =========================== short test summary info ============================ 2025-12-04T09:59:13.4481479Z FAILED [43.7497s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.4481599Z Traceback (most recent call last): 2025-12-04T09:59:13.4482114Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4482221Z getattr(self, test_name)() 2025-12-04T09:59:13.4482730Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4482812Z fn() 2025-12-04T09:59:13.4483323Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4483420Z method(*args, **kwargs) 2025-12-04T09:59:13.4483895Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4483997Z method(*args, **kwargs) 2025-12-04T09:59:13.4484468Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4484558Z with policy(): 2025-12-04T09:59:13.4485043Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4485169Z raise RuntimeError(msg) 2025-12-04T09:59:13.4486604Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 716111872 and is now 10516103168. 
2025-12-04T09:59:13.4486614Z 2025-12-04T09:59:13.4486821Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4487567Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4487579Z 2025-12-04T09:59:13.4487834Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4487839Z 2025-12-04T09:59:13.4487993Z Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.4488113Z Traceback (most recent call last): 2025-12-04T09:59:13.4488639Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4488743Z getattr(self, test_name)() 2025-12-04T09:59:13.4489269Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4489351Z fn() 2025-12-04T09:59:13.4489844Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4490082Z method(*args, **kwargs) 2025-12-04T09:59:13.4490723Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4490832Z method(*args, **kwargs) 2025-12-04T09:59:13.4491315Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4491410Z with policy(): 2025-12-04T09:59:13.4491909Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4492014Z raise RuntimeError(msg) 2025-12-04T09:59:13.4493303Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 611254272 and is now 10404954112. 
2025-12-04T09:59:13.4493312Z 2025-12-04T09:59:13.4493517Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4494368Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4494374Z 2025-12-04T09:59:13.4494621Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4494628Z 2025-12-04T09:59:13.4494776Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.4494892Z Traceback (most recent call last): 2025-12-04T09:59:13.4495401Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4495542Z getattr(self, test_name)() 2025-12-04T09:59:13.4496047Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4496130Z fn() 2025-12-04T09:59:13.4496694Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4496968Z method(*args, **kwargs) 2025-12-04T09:59:13.4497471Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4497581Z method(*args, **kwargs) 2025-12-04T09:59:13.4498140Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4498244Z with policy(): 2025-12-04T09:59:13.4498750Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4498857Z raise RuntimeError(msg) 2025-12-04T09:59:13.4500157Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 604962816 and is now 10404954112. 2025-12-04T09:59:13.4500163Z 2025-12-04T09:59:13.4500376Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4501141Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4501148Z 2025-12-04T09:59:13.4501409Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4501585Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T09:59:13.4501773Z ====================== 1 failed, 26 deselected in 43.97s ======================= 2025-12-04T09:59:13.4501868Z Got exit code 1 2025-12-04T09:59:13.4502591Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda 2025-12-04T09:59:13.4503000Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.4503618Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-d8e84025a0dc7a16.xml 2025-12-04T09:59:13.4503788Z ============================= test session starts ============================== 2025-12-04T09:59:13.4504135Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.4504245Z cachedir: .pytest_cache 2025-12-04T09:59:13.4504754Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.4504878Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.4504981Z configfile: pytest.ini 2025-12-04T09:59:13.4505543Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.4505762Z collecting ... collected 60 items / 10 deselected / 50 selected 2025-12-04T09:59:13.4505900Z stepcurrent: skipping 10 already run items. 2025-12-04T09:59:13.4506009Z Running 17 items in this shard 2025-12-04T09:59:13.4506015Z 2025-12-04T09:59:13.4507191Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda I1204 09:44:27.494000 61686 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 61738 2025-12-04T09:59:13.4507691Z I1204 09:44:27.495000 61686 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 61739 2025-12-04T09:59:13.4508218Z I1204 09:44:27.495000 61686 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 61740 2025-12-04T09:59:13.4508821Z I1204 09:44:27.496000 61686 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 61741 2025-12-04T09:59:13.4510758Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.4510882Z _warn_cpu_init() 2025-12-04T09:59:13.4512659Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.4512753Z _warn_cpu_init() 2025-12-04T09:59:13.4514520Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.4514610Z _warn_cpu_init() 2025-12-04T09:59:13.4515536Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T09:59:13.4515626Z _init_core_state( 2025-12-04T09:59:13.4516567Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T09:59:13.4516651Z _init_core_state( 2025-12-04T09:59:13.4517574Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T09:59:13.4517661Z _init_core_state( 2025-12-04T09:59:13.4519205Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4519357Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4521038Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4521367Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4523079Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4523308Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4525326Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.4525475Z _warn_cpu_init() 2025-12-04T09:59:13.4526512Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T09:59:13.4526616Z _init_core_state( 2025-12-04T09:59:13.4528318Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4528487Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4530181Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4530355Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4532090Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4532254Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4533252Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.4533363Z return func(*args, **kwargs) 2025-12-04T09:59:13.4535053Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4535199Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4535895Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4535996Z return func(*args, **kwargs) 2025-12-04T09:59:13.4536931Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4537098Z return func(*args, **kwargs) 2025-12-04T09:59:13.4537864Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T09:59:13.4537979Z return func(*args, **kwargs) 2025-12-04T09:59:13.4538743Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4538851Z return func(*args, **kwargs) 2025-12-04T09:59:13.4539613Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4539754Z return func(*args, **kwargs) 2025-12-04T09:59:13.4540520Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4540639Z return func(*args, **kwargs) 2025-12-04T09:59:13.4541398Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4541512Z return func(*args, **kwargs) 2025-12-04T09:59:13.4542269Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4542372Z return func(*args, **kwargs) 2025-12-04T09:59:13.4542839Z [rank2]:E1204 09:45:00.188000 61740 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4543370Z [rank2]:E1204 09:45:00.188000 61740 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4544373Z [rank2]:E1204 09:45:00.188000 61740 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4544913Z [rank2]:E1204 09:45:00.188000 61740 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4545910Z [rank2]:E1204 09:45:00.188000 61740 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4546305Z [rank2]:E1204 09:45:00.188000 61740 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4547270Z [rank2]:E1204 09:45:00.188000 61740 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4547764Z [rank2]:E1204 09:45:00.188000 61740 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4548749Z [rank2]:E1204 09:45:00.188000 61740 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4549319Z [rank2]:E1204 09:45:00.188000 61740 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4550167Z [rank2]:E1204 09:45:00.188000 61740 site-packages/torch/testing/_internal/common_distributed.py:935] File
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4550571Z [rank2]:E1204 09:45:00.188000 61740 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4551424Z [rank2]:E1204 09:45:00.188000 61740 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4551904Z [rank2]:E1204 09:45:00.188000 61740 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4553495Z [rank2]:E1204 09:45:00.188000 61740 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 607059968 and is now 10404954112. 2025-12-04T09:59:13.4553844Z [rank2]:E1204 09:45:00.188000 61740 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4554433Z [rank2]:E1204 09:45:00.188000 61740 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4555561Z [rank2]:E1204 09:45:00.188000 61740 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4555892Z [rank2]:E1204 09:45:00.188000 61740 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4556527Z [rank2]:E1204 09:45:00.188000 61740 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4557010Z [rank2]:E1204 09:45:00.188000 61740 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.4557412Z [rank0]:E1204 09:45:00.188000 61738 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4557881Z [rank0]:E1204 09:45:00.188000 61738 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4558803Z [rank0]:E1204 09:45:00.188000 61738 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4559250Z [rank0]:E1204 09:45:00.188000 61738 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4560132Z [rank0]:E1204 09:45:00.188000 61738 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4560480Z [rank0]:E1204 09:45:00.188000 61738 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4561354Z [rank0]:E1204 09:45:00.188000 61738 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4561795Z [rank0]:E1204 09:45:00.188000 61738 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4562643Z [rank0]:E1204 09:45:00.188000 61738 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4563076Z [rank0]:E1204 09:45:00.188000 61738 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4563924Z [rank0]:E1204 09:45:00.188000 61738 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4564354Z [rank0]:E1204 09:45:00.188000 61738 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4565214Z [rank0]:E1204 09:45:00.188000 61738 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4565644Z [rank0]:E1204 09:45:00.188000 61738 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4567234Z [rank0]:E1204 09:45:00.188000 61738 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 709820416 and is now 10516103168. 
2025-12-04T09:59:13.4567773Z [rank0]:E1204 09:45:00.188000 61738 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4568399Z [rank0]:E1204 09:45:00.188000 61738 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4569585Z [rank0]:E1204 09:45:00.188000 61738 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4569927Z [rank0]:E1204 09:45:00.188000 61738 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4570595Z [rank0]:E1204 09:45:00.188000 61738 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4571103Z [rank0]:E1204 09:45:00.188000 61738 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.4571565Z [rank1]:E1204 09:45:00.188000 61739 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4572063Z [rank1]:E1204 09:45:00.188000 61739 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4573007Z [rank1]:E1204 09:45:00.188000 61739 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4573483Z [rank1]:E1204 09:45:00.188000 61739 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4574411Z [rank1]:E1204 09:45:00.188000 61739 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4574806Z [rank1]:E1204 09:45:00.188000 61739 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4575710Z [rank1]:E1204 09:45:00.188000 61739 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4576167Z [rank1]:E1204 09:45:00.188000 61739 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4577361Z [rank1]:E1204 09:45:00.188000 61739 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4577858Z [rank1]:E1204 09:45:00.188000 61739 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4578855Z [rank1]:E1204 09:45:00.188000 61739 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4579304Z [rank1]:E1204 09:45:00.188000 61739 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4580268Z [rank1]:E1204 09:45:00.188000 61739 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4580786Z [rank1]:E1204 09:45:00.188000 61739 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4582580Z [rank1]:E1204 09:45:00.188000 61739 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 600768512 and is now 10404954112. 2025-12-04T09:59:13.4582945Z [rank1]:E1204 09:45:00.188000 61739 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4583607Z [rank1]:E1204 09:45:00.188000 61739 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4584876Z [rank1]:E1204 09:45:00.188000 61739 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4585248Z [rank1]:E1204 09:45:00.188000 61739 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4585961Z [rank1]:E1204 09:45:00.188000 61739 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4586531Z [rank1]:E1204 09:45:00.188000 61739 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.4586986Z [rank3]:E1204 09:45:00.189000 61741 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4587510Z [rank3]:E1204 09:45:00.189000 61741 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4588520Z [rank3]:E1204 09:45:00.189000 61741 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4589234Z [rank3]:E1204 09:45:00.189000 61741 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4590143Z [rank3]:E1204 09:45:00.189000 61741 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4590495Z [rank3]:E1204 09:45:00.189000 61741 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4591343Z [rank3]:E1204 09:45:00.189000 61741 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4591783Z [rank3]:E1204 09:45:00.189000 61741 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4592659Z [rank3]:E1204 09:45:00.189000 61741 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4593094Z [rank3]:E1204 09:45:00.189000 61741 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4593942Z [rank3]:E1204 09:45:00.189000 61741 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4594342Z [rank3]:E1204 09:45:00.189000 61741 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4595218Z [rank3]:E1204 09:45:00.189000 61741 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4595653Z [rank3]:E1204 09:45:00.189000 61741 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4597252Z [rank3]:E1204 09:45:00.189000 61741 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 604962816 and is now 10404954112. 2025-12-04T09:59:13.4597571Z [rank3]:E1204 09:45:00.189000 61741 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4598162Z [rank3]:E1204 09:45:00.189000 61741 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4599290Z [rank3]:E1204 09:45:00.189000 61741 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4599642Z [rank3]:E1204 09:45:00.189000 61741 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4600274Z [rank3]:E1204 09:45:00.189000 61741 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4600756Z [rank3]:E1204 09:45:00.189000 61741 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.4600850Z dist init r=2, world=4 2025-12-04T09:59:13.4600935Z dist init r=1, world=4 2025-12-04T09:59:13.4601026Z dist init r=3, world=4 2025-12-04T09:59:13.4601111Z dist init r=0, world=4 2025-12-04T09:59:13.4602135Z [rank2]:[W1204 09:45:00.160275534 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4603193Z [rank1]:[W1204 09:45:00.163833887 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4604203Z [rank3]:[W1204 09:45:00.166140670 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4605214Z [rank0]:[W1204 09:45:00.172471670 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4605331Z FAILED [49.1429s] [ 5%] 2025-12-04T09:59:13.4605337Z 2025-12-04T09:59:13.4605473Z =================================== FAILURES =================================== 2025-12-04T09:59:13.4605850Z _ TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda _ 2025-12-04T09:59:13.4605955Z Traceback (most recent call last): 2025-12-04T09:59:13.4606443Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.4606538Z self._join_processes(fn) 2025-12-04T09:59:13.4607053Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.4607210Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.4607743Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.4607851Z raise RuntimeError(error) 2025-12-04T09:59:13.4608059Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.4608164Z Traceback (most recent call last): 2025-12-04T09:59:13.4608648Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4608748Z getattr(self, test_name)() 2025-12-04T09:59:13.4609215Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4609295Z fn() 2025-12-04T09:59:13.4609744Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4609834Z method(*args, **kwargs) 2025-12-04T09:59:13.4610284Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4610373Z method(*args, **kwargs) 2025-12-04T09:59:13.4610816Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4610998Z with policy(): 2025-12-04T09:59:13.4611446Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4611545Z raise RuntimeError(msg) 2025-12-04T09:59:13.4612734Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 709820416 and is now 10516103168. 
2025-12-04T09:59:13.4612742Z 2025-12-04T09:59:13.4612930Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4613652Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4613683Z 2025-12-04T09:59:13.4613918Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4613923Z 2025-12-04T09:59:13.4614071Z Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.4614175Z Traceback (most recent call last): 2025-12-04T09:59:13.4614662Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4614758Z getattr(self, test_name)() 2025-12-04T09:59:13.4615232Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4615317Z fn() 2025-12-04T09:59:13.4615763Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4615881Z method(*args, **kwargs) 2025-12-04T09:59:13.4616404Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4616501Z method(*args, **kwargs) 2025-12-04T09:59:13.4617159Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4617256Z with policy(): 2025-12-04T09:59:13.4617759Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4617869Z raise RuntimeError(msg) 2025-12-04T09:59:13.4619254Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 600768512 and is now 10404954112. 
2025-12-04T09:59:13.4619262Z 2025-12-04T09:59:13.4619481Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4620288Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4620294Z 2025-12-04T09:59:13.4620554Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4620560Z 2025-12-04T09:59:13.4620726Z Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.4621060Z Traceback (most recent call last): 2025-12-04T09:59:13.4621621Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4621732Z getattr(self, test_name)() 2025-12-04T09:59:13.4622264Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4622360Z fn() 2025-12-04T09:59:13.4622862Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4622962Z method(*args, **kwargs) 2025-12-04T09:59:13.4623538Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4623640Z method(*args, **kwargs) 2025-12-04T09:59:13.4624144Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4624239Z with policy(): 2025-12-04T09:59:13.4624748Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4624857Z raise RuntimeError(msg) 2025-12-04T09:59:13.4626233Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 604962816 and is now 10404954112. 2025-12-04T09:59:13.4626242Z 2025-12-04T09:59:13.4626461Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4627269Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4627275Z 2025-12-04T09:59:13.4627533Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4627547Z 2025-12-04T09:59:13.4627551Z 2025-12-04T09:59:13.4627766Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.4628027Z Process 0 terminated with exit code 10, terminating remaining processes. 
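Each of the failures above comes from the harness's CUDA memory-leak check: it records the caching-allocator and driver-level memory counters before the test and compares them afterwards, and any growth that survives synchronization is reported as a leak, which is what the "allocated memory was 512 and is now reported as 80384" messages are saying. The following is only a rough sketch of that before/after bookkeeping using public torch.cuda APIs, not the actual leak-check implementation enabled by PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1; the helper name check_for_cuda_leak is made up for illustration.

import torch

def check_for_cuda_leak(test_fn, device=0):
    # Snapshot caching-allocator usage and driver-level free memory before the test.
    torch.cuda.synchronize(device)
    alloc_before = torch.cuda.memory_allocated(device)
    free_before, total = torch.cuda.mem_get_info(device)

    test_fn()

    # Let outstanding kernels finish and drop cached blocks before re-measuring.
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    alloc_after = torch.cuda.memory_allocated(device)
    free_after, _ = torch.cuda.mem_get_info(device)

    if alloc_after > alloc_before or free_after < free_before:
        raise RuntimeError(
            f"possible CUDA leak on device {device}: "
            f"allocator {alloc_before} -> {alloc_after} bytes, "
            f"driver allocated {total - free_before} -> {total - free_after} bytes"
        )

In the multi-process runs above, the equivalent check runs on every rank, which is why all four processes report the same pattern and exit with code 10.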
2025-12-04T09:59:13.4628867Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-d8e84025a0dc7a16.xml - 2025-12-04T09:59:13.4629037Z =========================== short test summary info ============================ 2025-12-04T09:59:13.4630002Z FAILED [49.1429s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.4630120Z Traceback (most recent call last): 2025-12-04T09:59:13.4630660Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4630810Z getattr(self, test_name)() 2025-12-04T09:59:13.4631343Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4631431Z fn() 2025-12-04T09:59:13.4631936Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4632038Z method(*args, **kwargs) 2025-12-04T09:59:13.4632664Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4632763Z method(*args, **kwargs) 2025-12-04T09:59:13.4633329Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4633420Z with policy(): 2025-12-04T09:59:13.4633867Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4633961Z raise RuntimeError(msg) 2025-12-04T09:59:13.4635146Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 709820416 and is now 10516103168. 
2025-12-04T09:59:13.4635153Z 2025-12-04T09:59:13.4635340Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4636095Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4636101Z 2025-12-04T09:59:13.4636330Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4636335Z 2025-12-04T09:59:13.4636480Z Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.4636585Z Traceback (most recent call last): 2025-12-04T09:59:13.4637068Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4637169Z getattr(self, test_name)() 2025-12-04T09:59:13.4637642Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4637724Z fn() 2025-12-04T09:59:13.4638200Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4638289Z method(*args, **kwargs) 2025-12-04T09:59:13.4638740Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4638828Z method(*args, **kwargs) 2025-12-04T09:59:13.4639271Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4639362Z with policy(): 2025-12-04T09:59:13.4639808Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4639907Z raise RuntimeError(msg) 2025-12-04T09:59:13.4641125Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 600768512 and is now 10404954112. 
2025-12-04T09:59:13.4641130Z 2025-12-04T09:59:13.4641318Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4642040Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4642045Z 2025-12-04T09:59:13.4642277Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4642306Z 2025-12-04T09:59:13.4642454Z Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.4642560Z Traceback (most recent call last): 2025-12-04T09:59:13.4643046Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4643149Z getattr(self, test_name)() 2025-12-04T09:59:13.4643620Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4643704Z fn() 2025-12-04T09:59:13.4644148Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4644239Z method(*args, **kwargs) 2025-12-04T09:59:13.4644690Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4644781Z method(*args, **kwargs) 2025-12-04T09:59:13.4645227Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4645309Z with policy(): 2025-12-04T09:59:13.4645757Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4645856Z raise RuntimeError(msg) 2025-12-04T09:59:13.4647073Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 604962816 and is now 10404954112. 2025-12-04T09:59:13.4647079Z 2025-12-04T09:59:13.4647273Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4647989Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4647996Z 2025-12-04T09:59:13.4648225Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4648390Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.4648546Z ====================== 1 failed, 10 deselected in 49.36s ======================= 2025-12-04T09:59:13.4648661Z Got exit code 1 2025-12-04T09:59:13.4648754Z Retrying single test... 
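Aside from the leak itself, the UserWarnings interleaved in this output all suggest the same initialization pattern: the process group is built without binding a device and `device_id` is passed as a bare "cuda" string, so FSDP runs its sharding initialization on CPU and has to guess the device index, barrier() falls back to the ambient device, and NCCL warns at exit that destroy_process_group() was never called. Below is a hedged sketch of the setup those warnings recommend; rank, world_size, and the function names are placeholders rather than anything from the test itself.

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def init_distributed(rank, world_size):
    # Bind this process to one GPU before any collective or FSDP call.
    torch.cuda.set_device(rank)
    dist.init_process_group(
        "nccl",
        rank=rank,
        world_size=world_size,
        # Passing a concrete device is what the barrier() warning suggests.
        device_id=torch.device(f"cuda:{rank}"),
    )

def wrap_model(model):
    # An explicit device index avoids the "device_id cuda ... does not have an
    # explicit index" warning and moves the module to GPU for sharding init.
    return FSDP(model, device_id=torch.cuda.current_device())

def shutdown():
    # Avoids the ProcessGroupNCCL "destroy_process_group() was not called" warning.
    dist.destroy_process_group()

None of this addresses the reported leak, but it should remove the CPU-init, device-index, barrier, and shutdown warnings that otherwise pad the retry output that follows.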
2025-12-04T09:59:13.4649302Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-392d2e7951c1c5f3.xml 2025-12-04T09:59:13.4649453Z ============================= test session starts ============================== 2025-12-04T09:59:13.4649760Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.4649853Z cachedir: .pytest_cache 2025-12-04T09:59:13.4650316Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.4650423Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.4650546Z configfile: pytest.ini 2025-12-04T09:59:13.4651020Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.4651212Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.4652014Z stepcurrent: skipping 10 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4652119Z Running 1 items in this shard 2025-12-04T09:59:13.4652124Z 2025-12-04T09:59:13.4653155Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda I1204 09:45:20.943000 62935 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 62987 2025-12-04T09:59:13.4653628Z I1204 09:45:20.944000 62935 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 62988 2025-12-04T09:59:13.4654062Z I1204 09:45:20.945000 62935 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 62989 2025-12-04T09:59:13.4654508Z I1204 09:45:20.946000 62935 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 62990 2025-12-04T09:59:13.4656381Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.4656486Z _warn_cpu_init() 2025-12-04T09:59:13.4658680Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.4658789Z _warn_cpu_init() 2025-12-04T09:59:13.4660781Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. 
We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.4660886Z _warn_cpu_init() 2025-12-04T09:59:13.4661920Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T09:59:13.4662046Z _init_core_state( 2025-12-04T09:59:13.4663082Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T09:59:13.4663176Z _init_core_state( 2025-12-04T09:59:13.4664208Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T09:59:13.4664304Z _init_core_state( 2025-12-04T09:59:13.4666011Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4666213Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4667921Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4668123Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4669939Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4670095Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4671880Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.4671971Z _warn_cpu_init() 2025-12-04T09:59:13.4672881Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 
2025-12-04T09:59:13.4672965Z _init_core_state( 2025-12-04T09:59:13.4674522Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4674663Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4676175Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4676320Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4677866Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4678006Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4678886Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.4678984Z return func(*args, **kwargs) 2025-12-04T09:59:13.4680516Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4680663Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4681346Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4681449Z return func(*args, **kwargs) 2025-12-04T09:59:13.4682154Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4682254Z return func(*args, **kwargs) 2025-12-04T09:59:13.4682938Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4683032Z return func(*args, **kwargs) 2025-12-04T09:59:13.4683710Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T09:59:13.4683802Z return func(*args, **kwargs) 2025-12-04T09:59:13.4684467Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4684568Z return func(*args, **kwargs) 2025-12-04T09:59:13.4685235Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4685336Z return func(*args, **kwargs) 2025-12-04T09:59:13.4686006Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4686123Z return func(*args, **kwargs) 2025-12-04T09:59:13.4686799Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4686890Z return func(*args, **kwargs) 2025-12-04T09:59:13.4687297Z [rank2]:E1204 09:45:44.175000 62989 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4687771Z [rank2]:E1204 09:45:44.175000 62989 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4688658Z [rank2]:E1204 09:45:44.175000 62989 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4689142Z [rank2]:E1204 09:45:44.175000 62989 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4690014Z [rank2]:E1204 09:45:44.175000 62989 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4690367Z [rank2]:E1204 09:45:44.175000 62989 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4691218Z [rank2]:E1204 09:45:44.175000 62989 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4691678Z [rank2]:E1204 09:45:44.175000 62989 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4692533Z [rank2]:E1204 09:45:44.175000 62989 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4692963Z [rank2]:E1204 09:45:44.175000 62989 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4693814Z [rank2]:E1204 09:45:44.175000 62989 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4694243Z [rank2]:E1204 09:45:44.175000 62989 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4695103Z [rank2]:E1204 09:45:44.175000 62989
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4695541Z [rank2]:E1204 09:45:44.175000 62989 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4697457Z [rank2]:E1204 09:45:44.175000 62989 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 604962816 and is now 10404954112. 2025-12-04T09:59:13.4697830Z [rank2]:E1204 09:45:44.175000 62989 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4698490Z [rank2]:E1204 09:45:44.175000 62989 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4699803Z [rank2]:E1204 09:45:44.175000 62989 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4700168Z [rank2]:E1204 09:45:44.175000 62989 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4700886Z [rank2]:E1204 09:45:44.175000 62989 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4701427Z [rank2]:E1204 09:45:44.175000 62989 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.4701878Z [rank0]:E1204 09:45:44.175000 62987 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4702411Z [rank0]:E1204 09:45:44.175000 62987 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4703437Z [rank0]:E1204 09:45:44.175000 62987 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4703949Z [rank0]:E1204 09:45:44.175000 62987 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4704942Z [rank0]:E1204 09:45:44.175000 62987 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4705345Z [rank0]:E1204 09:45:44.175000 62987 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4706334Z [rank0]:E1204 09:45:44.175000 62987 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4706824Z [rank0]:E1204 09:45:44.175000 62987 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4707775Z [rank0]:E1204 09:45:44.175000 62987 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4708258Z [rank0]:E1204 09:45:44.175000 62987 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4709328Z [rank0]:E1204 09:45:44.175000 62987 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4709723Z [rank0]:E1204 09:45:44.175000 62987 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4710581Z [rank0]:E1204 09:45:44.175000 62987 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4711015Z [rank0]:E1204 09:45:44.175000 62987 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4712607Z [rank0]:E1204 09:45:44.175000 62987 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 716111872 and is now 10516103168. 2025-12-04T09:59:13.4712931Z [rank0]:E1204 09:45:44.175000 62987 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4713539Z [rank0]:E1204 09:45:44.175000 62987 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4714665Z [rank0]:E1204 09:45:44.175000 62987 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4714986Z [rank0]:E1204 09:45:44.175000 62987 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4715631Z [rank0]:E1204 09:45:44.175000 62987 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4716112Z [rank0]:E1204 09:45:44.175000 62987 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.4716537Z [rank1]:E1204 09:45:44.176000 62988 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4717007Z [rank1]:E1204 09:45:44.176000 62988 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4717895Z [rank1]:E1204 09:45:44.176000 62988 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4718346Z [rank1]:E1204 09:45:44.176000 62988 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4719222Z [rank1]:E1204 09:45:44.176000 62988 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4719606Z [rank1]:E1204 09:45:44.176000 62988 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4720456Z [rank1]:E1204 09:45:44.176000 62988 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4721033Z [rank1]:E1204 09:45:44.176000 62988 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4722154Z [rank1]:E1204 09:45:44.176000 62988 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4722711Z [rank1]:E1204 09:45:44.176000 62988 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4723687Z [rank1]:E1204 09:45:44.176000 62988 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4724130Z [rank1]:E1204 09:45:44.176000 62988 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4725097Z [rank1]:E1204 09:45:44.176000 62988 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4725580Z [rank1]:E1204 09:45:44.176000 62988 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4727401Z [rank1]:E1204 09:45:44.176000 62988 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 607059968 and is now 10404954112. 
2025-12-04T09:59:13.4727803Z [rank1]:E1204 09:45:44.176000 62988 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4728461Z [rank1]:E1204 09:45:44.176000 62988 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4729731Z [rank1]:E1204 09:45:44.176000 62988 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4730094Z [rank1]:E1204 09:45:44.176000 62988 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4730847Z [rank1]:E1204 09:45:44.176000 62988 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4731391Z [rank1]:E1204 09:45:44.176000 62988 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.4731845Z [rank3]:E1204 09:45:44.177000 62990 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4732371Z [rank3]:E1204 09:45:44.177000 62990 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4733369Z [rank3]:E1204 09:45:44.177000 62990 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4733993Z [rank3]:E1204 09:45:44.177000 62990 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4734869Z [rank3]:E1204 09:45:44.177000 62990 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4735221Z [rank3]:E1204 09:45:44.177000 62990 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4736070Z [rank3]:E1204 09:45:44.177000 62990 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4736631Z [rank3]:E1204 09:45:44.177000 62990 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4737734Z [rank3]:E1204 09:45:44.177000 62990 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4738227Z [rank3]:E1204 09:45:44.177000 62990 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4739193Z [rank3]:E1204 09:45:44.177000 62990 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4739636Z [rank3]:E1204 09:45:44.177000 62990 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4740601Z [rank3]:E1204 09:45:44.177000 62990 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4741090Z [rank3]:E1204 09:45:44.177000 62990 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4742920Z [rank3]:E1204 09:45:44.177000 62990 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 607059968 and is now 10404954112. 2025-12-04T09:59:13.4743286Z [rank3]:E1204 09:45:44.177000 62990 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4743947Z [rank3]:E1204 09:45:44.177000 62990 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4745221Z [rank3]:E1204 09:45:44.177000 62990 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4745613Z [rank3]:E1204 09:45:44.177000 62990 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4746338Z [rank3]:E1204 09:45:44.177000 62990 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4746878Z [rank3]:E1204 09:45:44.177000 62990 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.4746983Z dist init r=1, world=4 2025-12-04T09:59:13.4747079Z dist init r=2, world=4 2025-12-04T09:59:13.4747171Z dist init r=0, world=4 2025-12-04T09:59:13.4747273Z dist init r=3, world=4 2025-12-04T09:59:13.4748432Z [rank1]:[W1204 09:45:44.144689821 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4749717Z [rank2]:[W1204 09:45:44.145283891 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4750786Z [rank0]:[W1204 09:45:44.148974604 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4751883Z [rank3]:[W1204 09:45:44.152390307 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4751985Z FAILED [41.1533s] [100%] 2025-12-04T09:59:13.4751991Z 2025-12-04T09:59:13.4752128Z =================================== FAILURES =================================== 2025-12-04T09:59:13.4752531Z _ TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda _ 2025-12-04T09:59:13.4752640Z Traceback (most recent call last): 2025-12-04T09:59:13.4753333Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.4753451Z self._join_processes(fn) 2025-12-04T09:59:13.4754016Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.4754156Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.4754738Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.4754846Z raise RuntimeError(error) 2025-12-04T09:59:13.4755076Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.4755189Z Traceback (most recent call last): 2025-12-04T09:59:13.4755740Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4755850Z getattr(self, test_name)() 2025-12-04T09:59:13.4756366Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4756455Z fn() 2025-12-04T09:59:13.4756943Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4757045Z method(*args, **kwargs) 2025-12-04T09:59:13.4757536Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4757635Z method(*args, **kwargs) 2025-12-04T09:59:13.4758156Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4758248Z with policy(): 2025-12-04T09:59:13.4758745Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4758853Z raise RuntimeError(msg) 2025-12-04T09:59:13.4760150Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 604962816 and is now 10404954112. 
2025-12-04T09:59:13.4760158Z 2025-12-04T09:59:13.4760371Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4761162Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4761196Z 2025-12-04T09:59:13.4761453Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4761460Z 2025-12-04T09:59:13.4761469Z 2025-12-04T09:59:13.4761677Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.4761931Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.4762715Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-392d2e7951c1c5f3.xml - 2025-12-04T09:59:13.4762909Z =========================== short test summary info ============================ 2025-12-04T09:59:13.4763843Z FAILED [41.1533s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.4763962Z Traceback (most recent call last): 2025-12-04T09:59:13.4764494Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4764607Z getattr(self, test_name)() 2025-12-04T09:59:13.4765124Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4765206Z fn() 2025-12-04T09:59:13.4765696Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4765797Z method(*args, **kwargs) 2025-12-04T09:59:13.4766283Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4766380Z method(*args, **kwargs) 2025-12-04T09:59:13.4766866Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4766962Z with policy(): 2025-12-04T09:59:13.4767482Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4767585Z raise RuntimeError(msg) 2025-12-04T09:59:13.4768881Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 604962816 and is now 10404954112. 2025-12-04T09:59:13.4768890Z 2025-12-04T09:59:13.4769204Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4769974Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4769981Z 2025-12-04T09:59:13.4770226Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4770428Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
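
Annotation: the failure itself is the CUDA memory leak check tripping. The harness records caching-allocator and driver-level memory on each device before the test and compares them afterwards, which is where the "512 ... now reported as 80384" and "604962816 ... now 10404954112" figures come from; the quoted repro line with PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 re-enables that check standalone, and PYTORCH_PRINT_REPRO_ON_FAILURE=0 suppresses the repro banner. A rough sketch of the kind of comparison being made (not the actual check in common_utils.py; `cuda_mem_snapshot` is an illustrative helper):

    import torch

    def cuda_mem_snapshot(device: int):
        # Bytes currently held by tensors via PyTorch's caching allocator.
        allocator = torch.cuda.memory_allocated(device)
        # Driver-level view: total minus free is everything the CUDA driver has
        # handed out on this device (cached blocks, NCCL buffers, library handles).
        free, total = torch.cuda.mem_get_info(device)
        return allocator, total - free

    before_alloc, before_driver = cuda_mem_snapshot(0)
    # ... run the test body under suspicion here ...
    after_alloc, after_driver = cuda_mem_snapshot(0)

    if after_alloc > before_alloc and after_driver > before_driver:
        raise RuntimeError(
            f"possible leak: allocator {before_alloc} -> {after_alloc} bytes, "
            f"driver {before_driver} -> {after_driver} bytes"
        )
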
2025-12-04T09:59:13.4770595Z ====================== 1 failed, 26 deselected in 41.37s ======================= 2025-12-04T09:59:13.4770683Z Got exit code 1 2025-12-04T09:59:13.4770789Z Retrying single test... 2025-12-04T09:59:13.4771370Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-477ee10c9167da98.xml 2025-12-04T09:59:13.4771519Z ============================= test session starts ============================== 2025-12-04T09:59:13.4771858Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.4771960Z cachedir: .pytest_cache 2025-12-04T09:59:13.4772446Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.4772584Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.4772681Z configfile: pytest.ini 2025-12-04T09:59:13.4773191Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.4773394Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.4774244Z stepcurrent: skipping 10 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4774352Z Running 1 items in this shard 2025-12-04T09:59:13.4774383Z 2025-12-04T09:59:13.4775474Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda I1204 09:46:06.984000 64184 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 64236 2025-12-04T09:59:13.4775945Z I1204 09:46:06.985000 64184 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 64237 2025-12-04T09:59:13.4776659Z I1204 09:46:06.986000 64184 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 64238 2025-12-04T09:59:13.4777322Z I1204 09:46:06.986000 64184 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 64239 2025-12-04T09:59:13.4779362Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.4779471Z _warn_cpu_init() 2025-12-04T09:59:13.4781514Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.4781611Z _warn_cpu_init() 2025-12-04T09:59:13.4783620Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.4783718Z _warn_cpu_init() 2025-12-04T09:59:13.4784793Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T09:59:13.4784888Z _init_core_state( 2025-12-04T09:59:13.4786616Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4786780Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4787813Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T09:59:13.4787945Z _init_core_state( 2025-12-04T09:59:13.4789683Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4789834Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4791645Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.4791735Z _warn_cpu_init() 2025-12-04T09:59:13.4792643Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T09:59:13.4792731Z _init_core_state( 2025-12-04T09:59:13.4794241Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:59:13.4794387Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4795327Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:479: UserWarning: FSDP is switching to use `NO_SHARD` instead of ShardingStrategy.SHARD_GRAD_OP since the world size is 1. 2025-12-04T09:59:13.4795411Z _init_core_state( 2025-12-04T09:59:13.4797148Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4797303Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4798945Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4799106Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4800040Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.4800146Z return func(*args, **kwargs) 2025-12-04T09:59:13.4801747Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4801932Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4803534Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.4803690Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.4804410Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4804539Z return func(*args, **kwargs) 2025-12-04T09:59:13.4805258Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4805361Z return func(*args, **kwargs) 2025-12-04T09:59:13.4806081Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
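
Annotation: the `_warn_cpu_init` warnings in this retry describe the other common pattern: the module is built on CPU, so FSDP should be given a device to move it to, which `sync_module_states=True` also requires. A short sketch under the assumption that a process group is already initialized as in the earlier snippet; `cpu_model` and `sharded` are placeholder names:

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    cpu_model = nn.Linear(8, 8)  # constructed on the host, as in the warning

    # An explicit device_id lets FSDP move the module to the local GPU for the
    # sharding initialization, which sync_module_states=True also requires.
    sharded = FSDP(
        cpu_model,
        device_id=torch.cuda.current_device(),
        sync_module_states=True,
    )
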
2025-12-04T09:59:13.4806178Z return func(*args, **kwargs) 2025-12-04T09:59:13.4806887Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4806996Z return func(*args, **kwargs) 2025-12-04T09:59:13.4807881Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4807991Z return func(*args, **kwargs) 2025-12-04T09:59:13.4808725Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4808829Z return func(*args, **kwargs) 2025-12-04T09:59:13.4809597Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4809700Z return func(*args, **kwargs) 2025-12-04T09:59:13.4810436Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.4810536Z return func(*args, **kwargs) 2025-12-04T09:59:13.4810983Z [rank0]:E1204 09:46:30.093000 64236 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4811509Z [rank0]:E1204 09:46:30.093000 64236 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4812507Z [rank0]:E1204 09:46:30.093000 64236 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4813004Z [rank0]:E1204 09:46:30.093000 64236 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4813962Z [rank0]:E1204 09:46:30.093000 64236 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4814345Z [rank0]:E1204 09:46:30.093000 64236 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4815280Z [rank0]:E1204 09:46:30.093000 64236 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4815782Z [rank0]:E1204 09:46:30.093000 64236 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4816964Z [rank0]:E1204 09:46:30.093000 64236 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4817451Z [rank0]:E1204 09:46:30.093000 64236 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4818518Z [rank0]:E1204 09:46:30.093000 64236 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4818964Z [rank0]:E1204 09:46:30.093000 64236 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4819924Z [rank0]:E1204 09:46:30.093000 64236 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4820418Z [rank0]:E1204 09:46:30.093000 64236 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4822442Z [rank0]:E1204 09:46:30.093000 64236 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 720306176 and is now 10516103168. 2025-12-04T09:59:13.4822825Z [rank0]:E1204 09:46:30.093000 64236 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4823485Z [rank0]:E1204 09:46:30.093000 64236 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4824824Z [rank0]:E1204 09:46:30.093000 64236 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4825188Z [rank0]:E1204 09:46:30.093000 64236 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4825907Z [rank0]:E1204 09:46:30.093000 64236 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4826451Z [rank0]:E1204 09:46:30.093000 64236 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.4826907Z [rank2]:E1204 09:46:30.094000 64238 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4827479Z [rank2]:E1204 09:46:30.094000 64238 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4828482Z [rank2]:E1204 09:46:30.094000 64238 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4828994Z [rank2]:E1204 09:46:30.094000 64238 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4829985Z [rank2]:E1204 09:46:30.094000 64238 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4830416Z [rank2]:E1204 09:46:30.094000 64238 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4831381Z [rank2]:E1204 09:46:30.094000 64238 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4831863Z [rank2]:E1204 09:46:30.094000 64238 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4832935Z [rank2]:E1204 09:46:30.094000 64238 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4833404Z [rank2]:E1204 09:46:30.094000 64238 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4834260Z [rank2]:E1204 09:46:30.094000 64238 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4834651Z [rank2]:E1204 09:46:30.094000 64238 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4835507Z [rank2]:E1204 09:46:30.094000 64238 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4835947Z [rank2]:E1204 09:46:30.094000 64238 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4837539Z [rank2]:E1204 09:46:30.094000 64238 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 604962816 and is now 10404954112. 
2025-12-04T09:59:13.4837898Z [rank2]:E1204 09:46:30.094000 64238 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4838477Z [rank2]:E1204 09:46:30.094000 64238 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4839789Z [rank2]:E1204 09:46:30.094000 64238 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4840130Z [rank2]:E1204 09:46:30.094000 64238 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4840809Z [rank2]:E1204 09:46:30.094000 64238 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4841347Z [rank2]:E1204 09:46:30.094000 64238 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.4841771Z [rank1]:E1204 09:46:30.094000 64237 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4842271Z [rank1]:E1204 09:46:30.094000 64237 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4843212Z [rank1]:E1204 09:46:30.094000 64237 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4843689Z [rank1]:E1204 09:46:30.094000 64237 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4844649Z [rank1]:E1204 09:46:30.094000 64237 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4845019Z [rank1]:E1204 09:46:30.094000 64237 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4845928Z [rank1]:E1204 09:46:30.094000 64237 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4846559Z [rank1]:E1204 09:46:30.094000 64237 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4847524Z [rank1]:E1204 09:46:30.094000 64237 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4847996Z [rank1]:E1204 09:46:30.094000 64237 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4848925Z [rank1]:E1204 09:46:30.094000 64237 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4849352Z [rank1]:E1204 09:46:30.094000 64237 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4850283Z [rank1]:E1204 09:46:30.094000 64237 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4850766Z [rank1]:E1204 09:46:30.094000 64237 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4852708Z [rank1]:E1204 09:46:30.094000 64237 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 607059968 and is now 10404954112. 2025-12-04T09:59:13.4853038Z [rank1]:E1204 09:46:30.094000 64237 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4853622Z [rank1]:E1204 09:46:30.094000 64237 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4854745Z [rank1]:E1204 09:46:30.094000 64237 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4855090Z [rank1]:E1204 09:46:30.094000 64237 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4855732Z [rank1]:E1204 09:46:30.094000 64237 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4856211Z [rank1]:E1204 09:46:30.094000 64237 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.4856847Z [rank3]:E1204 09:46:30.095000 64239 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4857393Z [rank3]:E1204 09:46:30.095000 64239 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4858393Z [rank3]:E1204 09:46:30.095000 64239 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4858948Z [rank3]:E1204 09:46:30.095000 64239 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4859933Z [rank3]:E1204 09:46:30.095000 64239 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4860329Z [rank3]:E1204 09:46:30.095000 64239 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4861323Z [rank3]:E1204 09:46:30.095000 64239 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4861811Z [rank3]:E1204 09:46:30.095000 64239 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4862784Z [rank3]:E1204 09:46:30.095000 64239 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4863266Z [rank3]:E1204 09:46:30.095000 64239 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4864233Z [rank3]:E1204 09:46:30.095000 64239 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4864677Z [rank3]:E1204 09:46:30.095000 64239 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4865632Z [rank3]:E1204 09:46:30.095000 64239 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4866163Z [rank3]:E1204 09:46:30.095000 64239 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4867966Z [rank3]:E1204 09:46:30.095000 64239 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 609157120 and is now 10404954112. 2025-12-04T09:59:13.4868336Z [rank3]:E1204 09:46:30.095000 64239 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4869083Z [rank3]:E1204 09:46:30.095000 64239 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4870312Z [rank3]:E1204 09:46:30.095000 64239 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4870653Z [rank3]:E1204 09:46:30.095000 64239 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4871328Z [rank3]:E1204 09:46:30.095000 64239 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4871839Z [rank3]:E1204 09:46:30.095000 64239 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.4871933Z dist init r=0, world=4 2025-12-04T09:59:13.4872028Z dist init r=3, world=4 2025-12-04T09:59:13.4872144Z dist init r=2, world=4 2025-12-04T09:59:13.4872233Z dist init r=1, world=4 2025-12-04T09:59:13.4873324Z [rank0]:[W1204 09:46:30.063605831 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4874401Z [rank3]:[W1204 09:46:30.067176115 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4875478Z [rank2]:[W1204 09:46:30.067701587 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4876569Z [rank1]:[W1204 09:46:30.073377698 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4876670Z FAILED [41.7880s] [100%] 2025-12-04T09:59:13.4876677Z 2025-12-04T09:59:13.4876815Z =================================== FAILURES =================================== 2025-12-04T09:59:13.4877315Z _ TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda _ 2025-12-04T09:59:13.4877425Z Traceback (most recent call last): 2025-12-04T09:59:13.4877906Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.4878012Z self._join_processes(fn) 2025-12-04T09:59:13.4878524Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.4878647Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.4879191Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.4879290Z raise RuntimeError(error) 2025-12-04T09:59:13.4879886Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.4879993Z Traceback (most recent call last): 2025-12-04T09:59:13.4880474Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4880580Z getattr(self, test_name)() 2025-12-04T09:59:13.4881050Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4881131Z fn() 2025-12-04T09:59:13.4881589Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4881686Z method(*args, **kwargs) 2025-12-04T09:59:13.4882169Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4882258Z method(*args, **kwargs) 2025-12-04T09:59:13.4882706Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4882795Z with policy(): 2025-12-04T09:59:13.4883242Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4883338Z raise RuntimeError(msg) 2025-12-04T09:59:13.4884536Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 720306176 and is now 10516103168. 
2025-12-04T09:59:13.4884573Z 2025-12-04T09:59:13.4884763Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4885487Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4885493Z 2025-12-04T09:59:13.4885726Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4885731Z 2025-12-04T09:59:13.4885878Z Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.4885979Z Traceback (most recent call last): 2025-12-04T09:59:13.4886460Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4886591Z getattr(self, test_name)() 2025-12-04T09:59:13.4887058Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4887135Z fn() 2025-12-04T09:59:13.4887586Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4887678Z method(*args, **kwargs) 2025-12-04T09:59:13.4888129Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4888217Z method(*args, **kwargs) 2025-12-04T09:59:13.4888662Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4888751Z with policy(): 2025-12-04T09:59:13.4889200Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4889293Z raise RuntimeError(msg) 2025-12-04T09:59:13.4890483Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 607059968 and is now 10404954112. 
2025-12-04T09:59:13.4890492Z 2025-12-04T09:59:13.4890706Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4891431Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4891436Z 2025-12-04T09:59:13.4891667Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4891672Z 2025-12-04T09:59:13.4891818Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.4891922Z Traceback (most recent call last): 2025-12-04T09:59:13.4892402Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4892506Z getattr(self, test_name)() 2025-12-04T09:59:13.4892977Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4893084Z fn() 2025-12-04T09:59:13.4893532Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4893620Z method(*args, **kwargs) 2025-12-04T09:59:13.4894066Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4894156Z method(*args, **kwargs) 2025-12-04T09:59:13.4894598Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4894694Z with policy(): 2025-12-04T09:59:13.4895142Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4895267Z raise RuntimeError(msg) 2025-12-04T09:59:13.4896534Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 604962816 and is now 10404954112. 2025-12-04T09:59:13.4896541Z 2025-12-04T09:59:13.4896905Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4897729Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4897788Z 2025-12-04T09:59:13.4898052Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4898058Z 2025-12-04T09:59:13.4898063Z 2025-12-04T09:59:13.4898287Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.4898550Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:59:13.4899356Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-477ee10c9167da98.xml - 2025-12-04T09:59:13.4899526Z =========================== short test summary info ============================ 2025-12-04T09:59:13.4900487Z FAILED [41.7880s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.4900612Z Traceback (most recent call last): 2025-12-04T09:59:13.4901168Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4901285Z getattr(self, test_name)() 2025-12-04T09:59:13.4901818Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4901904Z fn() 2025-12-04T09:59:13.4902415Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4902544Z method(*args, **kwargs) 2025-12-04T09:59:13.4903046Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4903150Z method(*args, **kwargs) 2025-12-04T09:59:13.4903651Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4903755Z with policy(): 2025-12-04T09:59:13.4904260Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4904364Z raise RuntimeError(msg) 2025-12-04T09:59:13.4905748Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 720306176 and is now 10516103168. 
2025-12-04T09:59:13.4905757Z 2025-12-04T09:59:13.4905969Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4906784Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4906789Z 2025-12-04T09:59:13.4907050Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4907058Z 2025-12-04T09:59:13.4907214Z Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.4907336Z Traceback (most recent call last): 2025-12-04T09:59:13.4907878Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4908023Z getattr(self, test_name)() 2025-12-04T09:59:13.4908552Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4908639Z fn() 2025-12-04T09:59:13.4909249Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4909345Z method(*args, **kwargs) 2025-12-04T09:59:13.4909821Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4909913Z method(*args, **kwargs) 2025-12-04T09:59:13.4910416Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4910511Z with policy(): 2025-12-04T09:59:13.4911084Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4911182Z raise RuntimeError(msg) 2025-12-04T09:59:13.4912373Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 607059968 and is now 10404954112. 
2025-12-04T09:59:13.4912378Z 2025-12-04T09:59:13.4912562Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4913282Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4913288Z 2025-12-04T09:59:13.4913519Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4913523Z 2025-12-04T09:59:13.4913670Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.4913772Z Traceback (most recent call last): 2025-12-04T09:59:13.4914262Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4914392Z getattr(self, test_name)() 2025-12-04T09:59:13.4914866Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4914945Z fn() 2025-12-04T09:59:13.4915394Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4915486Z method(*args, **kwargs) 2025-12-04T09:59:13.4915938Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4916025Z method(*args, **kwargs) 2025-12-04T09:59:13.4916466Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4916557Z with policy(): 2025-12-04T09:59:13.4917033Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4917128Z raise RuntimeError(msg) 2025-12-04T09:59:13.4918313Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 604962816 and is now 10404954112. 2025-12-04T09:59:13.4918318Z 2025-12-04T09:59:13.4918505Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4919226Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4919257Z 2025-12-04T09:59:13.4919490Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4919653Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T09:59:13.4919811Z ====================== 1 failed, 26 deselected in 42.01s ======================= 2025-12-04T09:59:13.4919892Z Got exit code 1 2025-12-04T09:59:13.4920543Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda 2025-12-04T09:59:13.4921046Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.4921870Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-96eeb012f5f596ba.xml 2025-12-04T09:59:13.4922033Z ============================= test session starts ============================== 2025-12-04T09:59:13.4922382Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.4922493Z cachedir: .pytest_cache 2025-12-04T09:59:13.4923006Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.4923123Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.4923232Z configfile: pytest.ini 2025-12-04T09:59:13.4923764Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.4923985Z collecting ... collected 60 items / 11 deselected / 49 selected 2025-12-04T09:59:13.4924121Z stepcurrent: skipping 11 already run items. 2025-12-04T09:59:13.4924231Z Running 16 items in this shard 2025-12-04T09:59:13.4924236Z 2025-12-04T09:59:13.4925301Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_none_cuda I1204 09:46:53.034000 65433 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 65485 2025-12-04T09:59:13.4925800Z I1204 09:46:53.035000 65433 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 65486 2025-12-04T09:59:13.4926352Z I1204 09:46:53.036000 65433 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 65487 2025-12-04T09:59:13.4926841Z I1204 09:46:53.036000 65433 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 65488 2025-12-04T09:59:13.4928880Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.4928984Z _warn_cpu_init() 2025-12-04T09:59:13.4931025Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.4931127Z _warn_cpu_init() 2025-12-04T09:59:13.4933126Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.4933265Z _warn_cpu_init() 2025-12-04T09:59:13.4935305Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.4935396Z _warn_cpu_init() 2025-12-04T09:59:13.4936396Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.4936509Z return func(*args, **kwargs) 2025-12-04T09:59:13.4937117Z [rank0]:E1204 09:47:01.501000 65485 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4937659Z [rank0]:E1204 09:47:01.501000 65485 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4938659Z [rank0]:E1204 09:47:01.501000 65485 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4939166Z [rank0]:E1204 09:47:01.501000 65485 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4940160Z [rank0]:E1204 09:47:01.501000 65485 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4940555Z [rank0]:E1204 09:47:01.501000 65485 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4941553Z [rank0]:E1204 09:47:01.501000 65485 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4942047Z [rank0]:E1204 09:47:01.501000 65485 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4943001Z [rank0]:E1204 09:47:01.501000 65485 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4943494Z [rank0]:E1204 09:47:01.501000 65485 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4944453Z [rank0]:E1204 09:47:01.501000 65485 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4944931Z [rank0]:E1204 09:47:01.501000 65485 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4945898Z [rank0]:E1204 09:47:01.501000 65485 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4946385Z [rank0]:E1204 09:47:01.501000 65485 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4948068Z [rank0]:E1204 09:47:01.501000 65485 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 0. CUDA driver allocated memory was 707723264 and is now 758054912. 2025-12-04T09:59:13.4948476Z [rank0]:E1204 09:47:01.501000 65485 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4949238Z [rank0]:E1204 09:47:01.501000 65485 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4950324Z [rank0]:E1204 09:47:01.501000 65485 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T09:59:13.4950705Z [rank0]:E1204 09:47:01.501000 65485 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4951374Z [rank0]:E1204 09:47:01.501000 65485 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4951887Z [rank0]:E1204 09:47:01.501000 65485 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.4952316Z [rank3]:E1204 09:47:01.501000 65488 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4952811Z [rank3]:E1204 09:47:01.501000 65488 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4953765Z [rank3]:E1204 09:47:01.501000 65488 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4954246Z [rank3]:E1204 09:47:01.501000 65488 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4955269Z [rank3]:E1204 09:47:01.501000 65488 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4955653Z [rank3]:E1204 09:47:01.501000 65488 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4956502Z [rank3]:E1204 09:47:01.501000 65488 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4956933Z [rank3]:E1204 09:47:01.501000 65488 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4957789Z [rank3]:E1204 09:47:01.501000 65488 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4958229Z [rank3]:E1204 09:47:01.501000 65488 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4959097Z [rank3]:E1204 09:47:01.501000 65488 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4959495Z [rank3]:E1204 09:47:01.501000 65488 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4960347Z [rank3]:E1204 09:47:01.501000 65488 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4960779Z [rank3]:E1204 09:47:01.501000 65488 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4962301Z [rank3]:E1204 09:47:01.501000 65488 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 3. CUDA driver allocated memory was 607059968 and is now 649003008. 
2025-12-04T09:59:13.4962619Z [rank3]:E1204 09:47:01.501000 65488 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4963204Z [rank3]:E1204 09:47:01.501000 65488 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4964248Z [rank3]:E1204 09:47:01.501000 65488 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T09:59:13.4964579Z [rank3]:E1204 09:47:01.501000 65488 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4965214Z [rank3]:E1204 09:47:01.501000 65488 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4965696Z [rank3]:E1204 09:47:01.501000 65488 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.4966096Z [rank2]:E1204 09:47:01.501000 65487 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4966559Z [rank2]:E1204 09:47:01.501000 65487 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4967446Z [rank2]:E1204 09:47:01.501000 65487 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4967895Z [rank2]:E1204 09:47:01.501000 65487 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4968799Z [rank2]:E1204 09:47:01.501000 65487 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4969147Z [rank2]:E1204 09:47:01.501000 65487 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4969992Z [rank2]:E1204 09:47:01.501000 65487 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4970426Z [rank2]:E1204 09:47:01.501000 65487 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4971301Z [rank2]:E1204 09:47:01.501000 65487 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4971735Z [rank2]:E1204 09:47:01.501000 65487 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4972583Z [rank2]:E1204 09:47:01.501000 65487 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4972978Z [rank2]:E1204 09:47:01.501000 65487 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4973833Z [rank2]:E1204 09:47:01.501000 65487 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4974291Z [rank2]:E1204 09:47:01.501000 65487 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4975779Z [rank2]:E1204 09:47:01.501000 65487 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 2. CUDA driver allocated memory was 604962816 and is now 649003008. 2025-12-04T09:59:13.4976096Z [rank2]:E1204 09:47:01.501000 65487 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4976975Z [rank2]:E1204 09:47:01.501000 65487 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4978134Z [rank2]:E1204 09:47:01.501000 65487 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T09:59:13.4978505Z [rank2]:E1204 09:47:01.501000 65487 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4979217Z [rank2]:E1204 09:47:01.501000 65487 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4979758Z [rank2]:E1204 09:47:01.501000 65487 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.4980205Z [rank1]:E1204 09:47:01.502000 65486 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.4980732Z [rank1]:E1204 09:47:01.502000 65486 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.4981748Z [rank1]:E1204 09:47:01.502000 65486 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4982280Z [rank1]:E1204 09:47:01.502000 65486 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.4983270Z [rank1]:E1204 09:47:01.502000 65486 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4983664Z [rank1]:E1204 09:47:01.502000 65486 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.4984620Z [rank1]:E1204 09:47:01.502000 65486 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4985112Z [rank1]:E1204 09:47:01.502000 65486 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4986096Z [rank1]:E1204 09:47:01.502000 65486 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.4986585Z [rank1]:E1204 09:47:01.502000 65486 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.4987545Z [rank1]:E1204 09:47:01.502000 65486 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.4987996Z [rank1]:E1204 09:47:01.502000 65486 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.4989090Z [rank1]:E1204 09:47:01.502000 65486 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.4989530Z [rank1]:E1204 09:47:01.502000 65486 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.4991027Z [rank1]:E1204 09:47:01.502000 65486 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 1. CUDA driver allocated memory was 609157120 and is now 649003008. 2025-12-04T09:59:13.4991377Z [rank1]:E1204 09:47:01.502000 65486 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4991964Z [rank1]:E1204 09:47:01.502000 65486 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.4992984Z [rank1]:E1204 09:47:01.502000 65486 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T09:59:13.4993306Z [rank1]:E1204 09:47:01.502000 65486 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.4993936Z [rank1]:E1204 09:47:01.502000 65486 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.4994419Z [rank1]:E1204 09:47:01.502000 65486 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.4994511Z dist init r=2, world=4 2025-12-04T09:59:13.4994598Z dist init r=3, world=4 2025-12-04T09:59:13.4994688Z dist init r=1, world=4 2025-12-04T09:59:13.4994771Z dist init r=0, world=4 2025-12-04T09:59:13.4995821Z [rank0]:[W1204 09:47:01.469977700 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.4995913Z FAILED [10.7961s] [ 6%] 2025-12-04T09:59:13.4995918Z 2025-12-04T09:59:13.4996044Z =================================== FAILURES =================================== 2025-12-04T09:59:13.4996325Z _ TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda __ 2025-12-04T09:59:13.4996432Z Traceback (most recent call last): 2025-12-04T09:59:13.4996912Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.4997021Z self._join_processes(fn) 2025-12-04T09:59:13.4997537Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.4997699Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.4998243Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.4998342Z raise RuntimeError(error) 2025-12-04T09:59:13.4998554Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.4998658Z Traceback (most recent call last): 2025-12-04T09:59:13.4999133Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.4999239Z getattr(self, test_name)() 2025-12-04T09:59:13.4999712Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.4999817Z fn() 2025-12-04T09:59:13.5000269Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5000360Z method(*args, **kwargs) 2025-12-04T09:59:13.5000814Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5000908Z method(*args, **kwargs) 2025-12-04T09:59:13.5001352Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5001443Z with policy(): 2025-12-04T09:59:13.5001896Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5002018Z raise RuntimeError(msg) 2025-12-04T09:59:13.5003119Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 2. CUDA driver allocated memory was 604962816 and is now 649003008. 2025-12-04T09:59:13.5003126Z 2025-12-04T09:59:13.5003319Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5003948Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T09:59:13.5003954Z 2025-12-04T09:59:13.5004186Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5004191Z 2025-12-04T09:59:13.5004195Z 2025-12-04T09:59:13.5004389Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.5004622Z Process 2 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:59:13.5005334Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-96eeb012f5f596ba.xml - 2025-12-04T09:59:13.5005492Z =========================== short test summary info ============================ 2025-12-04T09:59:13.5006276Z FAILED [10.7961s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_none_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.5006392Z Traceback (most recent call last): 2025-12-04T09:59:13.5006876Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5006973Z getattr(self, test_name)() 2025-12-04T09:59:13.5007458Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5007538Z fn() 2025-12-04T09:59:13.5007993Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5008087Z method(*args, **kwargs) 2025-12-04T09:59:13.5008534Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5008662Z method(*args, **kwargs) 2025-12-04T09:59:13.5009111Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5009193Z with policy(): 2025-12-04T09:59:13.5009645Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5009742Z raise RuntimeError(msg) 2025-12-04T09:59:13.5010841Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 2. CUDA driver allocated memory was 604962816 and is now 649003008. 2025-12-04T09:59:13.5010874Z 2025-12-04T09:59:13.5011063Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5011684Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T09:59:13.5011696Z 2025-12-04T09:59:13.5011929Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5012085Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.5012248Z ====================== 1 failed, 11 deselected in 11.01s ======================= 2025-12-04T09:59:13.5012332Z Got exit code 1 2025-12-04T09:59:13.5012424Z Retrying single test... 
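The UserWarning from torch/distributed/fsdp/_init_utils.py that recurs in the sessions above and below recommends passing `device_id` when wrapping the module with FSDP, so that sharding initialization runs on GPU and `sync_module_states=True` can use GPU communication. A minimal sketch of that recommendation, assuming a generic nn.Module and the current CUDA device (illustrative only, not part of this test run):

import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_model(module: nn.Module) -> FSDP:
    # Passing device_id moves the CPU-resident module to the local GPU before
    # sharding initialization, addressing the _warn_cpu_init() warning, and it
    # also satisfies the GPU requirement of sync_module_states=True.
    return FSDP(
        module,
        device_id=torch.cuda.current_device(),
        sync_module_states=True,
    )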
2025-12-04T09:59:13.5013009Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cc37fd9d84da442a.xml 2025-12-04T09:59:13.5013150Z ============================= test session starts ============================== 2025-12-04T09:59:13.5013461Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.5013555Z cachedir: .pytest_cache 2025-12-04T09:59:13.5014012Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.5014127Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.5014219Z configfile: pytest.ini 2025-12-04T09:59:13.5014694Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.5014894Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.5015581Z stepcurrent: skipping 11 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T09:59:13.5015690Z Running 1 items in this shard 2025-12-04T09:59:13.5015695Z 2025-12-04T09:59:13.5016888Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_none_cuda I1204 09:47:08.523000 65770 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 65822 2025-12-04T09:59:13.5017427Z I1204 09:47:08.524000 65770 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 65823 2025-12-04T09:59:13.5017927Z I1204 09:47:08.525000 65770 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 65824 2025-12-04T09:59:13.5018413Z I1204 09:47:08.526000 65770 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 65825 2025-12-04T09:59:13.5020447Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5020577Z _warn_cpu_init() 2025-12-04T09:59:13.5022792Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5022893Z _warn_cpu_init() 2025-12-04T09:59:13.5024899Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5025129Z _warn_cpu_init() 2025-12-04T09:59:13.5027142Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5027277Z _warn_cpu_init() 2025-12-04T09:59:13.5028275Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.5028394Z return func(*args, **kwargs) 2025-12-04T09:59:13.5028851Z [rank0]:E1204 09:47:16.980000 65822 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5029391Z [rank0]:E1204 09:47:16.980000 65822 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5030389Z [rank0]:E1204 09:47:16.980000 65822 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5030897Z [rank0]:E1204 09:47:16.980000 65822 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5031890Z [rank0]:E1204 09:47:16.980000 65822 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5032287Z [rank0]:E1204 09:47:16.980000 65822 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5033379Z [rank0]:E1204 09:47:16.980000 65822 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5033841Z [rank0]:E1204 09:47:16.980000 65822 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5034743Z [rank0]:E1204 09:47:16.980000 65822 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5035198Z [rank0]:E1204 09:47:16.980000 65822 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5036131Z [rank0]:E1204 09:47:16.980000 65822 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5036555Z [rank0]:E1204 09:47:16.980000 65822 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5037457Z [rank0]:E1204 09:47:16.980000 65822 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5037921Z [rank0]:E1204 09:47:16.980000 65822 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5039676Z [rank0]:E1204 09:47:16.980000 65822 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 0. CUDA driver allocated memory was 716111872 and is now 758054912. 2025-12-04T09:59:13.5040070Z [rank0]:E1204 09:47:16.980000 65822 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5040706Z [rank0]:E1204 09:47:16.980000 65822 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5041832Z [rank0]:E1204 09:47:16.980000 65822 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T09:59:13.5042208Z [rank0]:E1204 09:47:16.980000 65822 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5042904Z [rank0]:E1204 09:47:16.980000 65822 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5043447Z [rank0]:E1204 09:47:16.980000 65822 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.5043881Z [rank1]:E1204 09:47:16.981000 65823 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5044399Z [rank1]:E1204 09:47:16.981000 65823 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5045365Z [rank1]:E1204 09:47:16.981000 65823 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5045853Z [rank1]:E1204 09:47:16.981000 65823 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5046861Z [rank1]:E1204 09:47:16.981000 65823 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5047246Z [rank1]:E1204 09:47:16.981000 65823 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5048178Z [rank1]:E1204 09:47:16.981000 65823 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5048649Z [rank1]:E1204 09:47:16.981000 65823 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5049591Z [rank1]:E1204 09:47:16.981000 65823 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5050085Z [rank1]:E1204 09:47:16.981000 65823 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5051011Z [rank1]:E1204 09:47:16.981000 65823 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5051445Z [rank1]:E1204 09:47:16.981000 65823 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5052375Z [rank1]:E1204 09:47:16.981000 65823 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5052861Z [rank1]:E1204 09:47:16.981000 65823 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5054507Z [rank1]:E1204 09:47:16.981000 65823 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 1. CUDA driver allocated memory was 609157120 and is now 649003008. 2025-12-04T09:59:13.5054868Z [rank1]:E1204 09:47:16.981000 65823 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5055497Z [rank1]:E1204 09:47:16.981000 65823 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5056728Z [rank1]:E1204 09:47:16.981000 65823 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T09:59:13.5057266Z [rank1]:E1204 09:47:16.981000 65823 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5057988Z [rank1]:E1204 09:47:16.981000 65823 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5058542Z [rank1]:E1204 09:47:16.981000 65823 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.5058992Z [rank2]:E1204 09:47:16.981000 65824 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5059532Z [rank2]:E1204 09:47:16.981000 65824 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5060531Z [rank2]:E1204 09:47:16.981000 65824 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5061034Z [rank2]:E1204 09:47:16.981000 65824 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5062060Z [rank2]:E1204 09:47:16.981000 65824 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5062457Z [rank2]:E1204 09:47:16.981000 65824 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5063420Z [rank2]:E1204 09:47:16.981000 65824 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5063908Z [rank2]:E1204 09:47:16.981000 65824 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5064917Z [rank2]:E1204 09:47:16.981000 65824 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5065403Z [rank2]:E1204 09:47:16.981000 65824 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5066360Z [rank2]:E1204 09:47:16.981000 65824 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5066812Z [rank2]:E1204 09:47:16.981000 65824 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5067776Z [rank2]:E1204 09:47:16.981000 65824 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5068419Z [rank2]:E1204 09:47:16.981000 65824 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5070078Z [rank2]:E1204 09:47:16.981000 65824 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 2. CUDA driver allocated memory was 611254272 and is now 649003008. 
2025-12-04T09:59:13.5070440Z [rank2]:E1204 09:47:16.981000 65824 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5071020Z [rank2]:E1204 09:47:16.981000 65824 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5072052Z [rank2]:E1204 09:47:16.981000 65824 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T09:59:13.5072370Z [rank2]:E1204 09:47:16.981000 65824 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5073002Z [rank2]:E1204 09:47:16.981000 65824 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5073488Z [rank2]:E1204 09:47:16.981000 65824 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.5073888Z [rank3]:E1204 09:47:16.981000 65825 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5074365Z [rank3]:E1204 09:47:16.981000 65825 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5075276Z [rank3]:E1204 09:47:16.981000 65825 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5075725Z [rank3]:E1204 09:47:16.981000 65825 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5076600Z [rank3]:E1204 09:47:16.981000 65825 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5076948Z [rank3]:E1204 09:47:16.981000 65825 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5077808Z [rank3]:E1204 09:47:16.981000 65825 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5078269Z [rank3]:E1204 09:47:16.981000 65825 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5079131Z [rank3]:E1204 09:47:16.981000 65825 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5079557Z [rank3]:E1204 09:47:16.981000 65825 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5080403Z [rank3]:E1204 09:47:16.981000 65825 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5080834Z [rank3]:E1204 09:47:16.981000 65825 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5081686Z [rank3]:E1204 09:47:16.981000 65825 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5082125Z [rank3]:E1204 09:47:16.981000 65825 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5083605Z [rank3]:E1204 09:47:16.981000 65825 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 3. CUDA driver allocated memory was 604962816 and is now 649003008. 2025-12-04T09:59:13.5083964Z [rank3]:E1204 09:47:16.981000 65825 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5084546Z [rank3]:E1204 09:47:16.981000 65825 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5085560Z [rank3]:E1204 09:47:16.981000 65825 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T09:59:13.5085886Z [rank3]:E1204 09:47:16.981000 65825 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5086515Z [rank3]:E1204 09:47:16.981000 65825 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5087010Z [rank3]:E1204 09:47:16.981000 65825 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.5087100Z dist init r=1, world=4 2025-12-04T09:59:13.5087185Z dist init r=3, world=4 2025-12-04T09:59:13.5087274Z dist init r=2, world=4 2025-12-04T09:59:13.5087361Z dist init r=0, world=4 2025-12-04T09:59:13.5088412Z [rank0]:[W1204 09:47:17.988381341 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.5088501Z FAILED [10.8152s] [100%] 2025-12-04T09:59:13.5088506Z 2025-12-04T09:59:13.5088636Z =================================== FAILURES =================================== 2025-12-04T09:59:13.5088925Z _ TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda __ 2025-12-04T09:59:13.5089030Z Traceback (most recent call last): 2025-12-04T09:59:13.5089519Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.5089618Z self._join_processes(fn) 2025-12-04T09:59:13.5090163Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.5090292Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.5090827Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.5090925Z raise RuntimeError(error) 2025-12-04T09:59:13.5091135Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.5091239Z Traceback (most recent call last): 2025-12-04T09:59:13.5091723Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5091819Z getattr(self, test_name)() 2025-12-04T09:59:13.5092290Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5092409Z fn() 2025-12-04T09:59:13.5092857Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5092949Z method(*args, **kwargs) 2025-12-04T09:59:13.5093405Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5093500Z method(*args, **kwargs) 2025-12-04T09:59:13.5093951Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5094036Z with policy(): 2025-12-04T09:59:13.5094527Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5094632Z raise RuntimeError(msg) 2025-12-04T09:59:13.5095717Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 1. CUDA driver allocated memory was 609157120 and is now 649003008. 2025-12-04T09:59:13.5095725Z 2025-12-04T09:59:13.5095921Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5096615Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T09:59:13.5096622Z 2025-12-04T09:59:13.5097048Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5097063Z 2025-12-04T09:59:13.5097071Z 2025-12-04T09:59:13.5097289Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.5097594Z Process 1 terminated with exit code 10, terminating remaining processes. 
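The ProcessGroupNCCL warning above notes that destroy_process_group() was not called before the worker processes exited (see the linked shutdown docs). Below is a minimal teardown sketch, not the test's actual harness code; it assumes the launcher already provides the usual MASTER_ADDR/MASTER_PORT environment for env:// initialization.

import torch
import torch.distributed as dist

def worker(rank: int, world_size: int) -> None:
    # Assumes MASTER_ADDR/MASTER_PORT are set by the launcher (env:// init).
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    try:
        torch.cuda.set_device(rank)
        dist.barrier()
        # ... test or training body ...
    finally:
        # Explicit shutdown, per https://pytorch.org/docs/stable/distributed.html#shutdown
        dist.destroy_process_group()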
2025-12-04T09:59:13.5098399Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cc37fd9d84da442a.xml - 2025-12-04T09:59:13.5098570Z =========================== short test summary info ============================ 2025-12-04T09:59:13.5099475Z FAILED [10.8152s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_none_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.5099596Z Traceback (most recent call last): 2025-12-04T09:59:13.5100147Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5100267Z getattr(self, test_name)() 2025-12-04T09:59:13.5100803Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5100892Z fn() 2025-12-04T09:59:13.5101404Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5101507Z method(*args, **kwargs) 2025-12-04T09:59:13.5102045Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5102152Z method(*args, **kwargs) 2025-12-04T09:59:13.5102652Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5102755Z with policy(): 2025-12-04T09:59:13.5103261Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5103366Z raise RuntimeError(msg) 2025-12-04T09:59:13.5104600Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 1. CUDA driver allocated memory was 609157120 and is now 649003008. 2025-12-04T09:59:13.5104634Z 2025-12-04T09:59:13.5104846Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5105553Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T09:59:13.5105559Z 2025-12-04T09:59:13.5105818Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5106004Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.5106182Z ====================== 1 failed, 26 deselected in 11.03s ======================= 2025-12-04T09:59:13.5106307Z Got exit code 1 2025-12-04T09:59:13.5106416Z Retrying single test... 
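Before the harness retries the test below, note that the log already prints a complete local repro command. The sketch that follows simply drives that same command from Python, using only the command and environment variables shown above; it assumes it is run from the base repo dir with a CUDA machine available.

import os
import subprocess

env = dict(os.environ)
env["PYTORCH_TEST_CUDA_MEM_LEAK_CHECK"] = "1"   # enable the leak checker, as in the log
# env["PYTORCH_PRINT_REPRO_ON_FAILURE"] = "0"   # optional: suppress the repro banner
subprocess.run(
    [
        "python",
        "test/distributed/fsdp/test_fsdp_core.py",
        "TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda",
    ],
    env=env,
    check=True,
)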
2025-12-04T09:59:13.5107038Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cbd7e5f481e859be.xml 2025-12-04T09:59:13.5107194Z ============================= test session starts ============================== 2025-12-04T09:59:13.5107549Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.5107657Z cachedir: .pytest_cache 2025-12-04T09:59:13.5108178Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.5108299Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.5108401Z configfile: pytest.ini 2025-12-04T09:59:13.5109155Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.5109352Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.5110047Z stepcurrent: skipping 11 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T09:59:13.5110146Z Running 1 items in this shard 2025-12-04T09:59:13.5110151Z 2025-12-04T09:59:13.5111123Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_none_cuda I1204 09:47:23.993000 66107 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 66159 2025-12-04T09:59:13.5111572Z I1204 09:47:23.994000 66107 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 66160 2025-12-04T09:59:13.5112005Z I1204 09:47:23.995000 66107 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 66161 2025-12-04T09:59:13.5112440Z I1204 09:47:23.996000 66107 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 66162 2025-12-04T09:59:13.5114274Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5114371Z _warn_cpu_init() 2025-12-04T09:59:13.5116158Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5116249Z _warn_cpu_init() 2025-12-04T09:59:13.5118033Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5118143Z _warn_cpu_init() 2025-12-04T09:59:13.5119929Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5120042Z _warn_cpu_init() 2025-12-04T09:59:13.5121090Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.5121371Z return func(*args, **kwargs) 2025-12-04T09:59:13.5121843Z [rank0]:E1204 09:47:32.598000 66159 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5122374Z [rank0]:E1204 09:47:32.598000 66159 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5123379Z [rank0]:E1204 09:47:32.598000 66159 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5123893Z [rank0]:E1204 09:47:32.598000 66159 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5124881Z [rank0]:E1204 09:47:32.598000 66159 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5125343Z [rank0]:E1204 09:47:32.598000 66159 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5126298Z [rank0]:E1204 09:47:32.598000 66159 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5126785Z [rank0]:E1204 09:47:32.598000 66159 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5127754Z [rank0]:E1204 09:47:32.598000 66159 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5128241Z [rank0]:E1204 09:47:32.598000 66159 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5129241Z [rank0]:E1204 09:47:32.598000 66159 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5129686Z [rank0]:E1204 09:47:32.598000 66159 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5130651Z [rank0]:E1204 09:47:32.598000 66159 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5131141Z [rank0]:E1204 09:47:32.598000 66159 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5132857Z [rank0]:E1204 09:47:32.598000 66159 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 0. CUDA driver allocated memory was 716111872 and is now 758054912. 2025-12-04T09:59:13.5133222Z [rank0]:E1204 09:47:32.598000 66159 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5133951Z [rank0]:E1204 09:47:32.598000 66159 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5135014Z [rank0]:E1204 09:47:32.598000 66159 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T09:59:13.5135338Z [rank0]:E1204 09:47:32.598000 66159 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5135981Z [rank0]:E1204 09:47:32.598000 66159 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5136527Z [rank0]:E1204 09:47:32.598000 66159 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.5137130Z [rank2]:E1204 09:47:32.598000 66161 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5137666Z [rank2]:E1204 09:47:32.598000 66161 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5138669Z [rank2]:E1204 09:47:32.598000 66161 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5139183Z [rank2]:E1204 09:47:32.598000 66161 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5140212Z [rank2]:E1204 09:47:32.598000 66161 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5140614Z [rank2]:E1204 09:47:32.598000 66161 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5141568Z [rank2]:E1204 09:47:32.598000 66161 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5142056Z [rank2]:E1204 09:47:32.598000 66161 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5143046Z [rank2]:E1204 09:47:32.598000 66161 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5143540Z [rank2]:E1204 09:47:32.598000 66161 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5144502Z [rank2]:E1204 09:47:32.598000 66161 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5144945Z [rank2]:E1204 09:47:32.598000 66161 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5145918Z [rank2]:E1204 09:47:32.598000 66161 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5146449Z [rank2]:E1204 09:47:32.598000 66161 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5148135Z [rank2]:E1204 09:47:32.598000 66161 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 2. CUDA driver allocated memory was 604962816 and is now 649003008. 2025-12-04T09:59:13.5148505Z [rank2]:E1204 09:47:32.598000 66161 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5149287Z [rank2]:E1204 09:47:32.598000 66161 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5150422Z [rank2]:E1204 09:47:32.598000 66161 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T09:59:13.5150779Z [rank2]:E1204 09:47:32.598000 66161 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5151478Z [rank2]:E1204 09:47:32.598000 66161 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5152005Z [rank2]:E1204 09:47:32.598000 66161 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.5152438Z [rank3]:E1204 09:47:32.599000 66162 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5152957Z [rank3]:E1204 09:47:32.599000 66162 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5153930Z [rank3]:E1204 09:47:32.599000 66162 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5154456Z [rank3]:E1204 09:47:32.599000 66162 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5155414Z [rank3]:E1204 09:47:32.599000 66162 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5155806Z [rank3]:E1204 09:47:32.599000 66162 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5156847Z [rank3]:E1204 09:47:32.599000 66162 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5157308Z [rank3]:E1204 09:47:32.599000 66162 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5158348Z [rank3]:E1204 09:47:32.599000 66162 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5158777Z [rank3]:E1204 09:47:32.599000 66162 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5159635Z [rank3]:E1204 09:47:32.599000 66162 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5160029Z [rank3]:E1204 09:47:32.599000 66162 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5160915Z [rank3]:E1204 09:47:32.599000 66162 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5161348Z [rank3]:E1204 09:47:32.599000 66162 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5162825Z [rank3]:E1204 09:47:32.599000 66162 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 3. CUDA driver allocated memory was 607059968 and is now 649003008. 
2025-12-04T09:59:13.5163188Z [rank3]:E1204 09:47:32.599000 66162 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5163770Z [rank3]:E1204 09:47:32.599000 66162 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5164797Z [rank3]:E1204 09:47:32.599000 66162 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T09:59:13.5165118Z [rank3]:E1204 09:47:32.599000 66162 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5165753Z [rank3]:E1204 09:47:32.599000 66162 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5166235Z [rank3]:E1204 09:47:32.599000 66162 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.5166633Z [rank1]:E1204 09:47:32.599000 66160 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5167109Z [rank1]:E1204 09:47:32.599000 66160 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5168026Z [rank1]:E1204 09:47:32.599000 66160 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5168485Z [rank1]:E1204 09:47:32.599000 66160 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5169356Z [rank1]:E1204 09:47:32.599000 66160 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5169715Z [rank1]:E1204 09:47:32.599000 66160 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5170602Z [rank1]:E1204 09:47:32.599000 66160 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5171034Z [rank1]:E1204 09:47:32.599000 66160 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5171886Z [rank1]:E1204 09:47:32.599000 66160 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5172314Z [rank1]:E1204 09:47:32.599000 66160 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5173168Z [rank1]:E1204 09:47:32.599000 66160 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5173591Z [rank1]:E1204 09:47:32.599000 66160 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5174450Z [rank1]:E1204 09:47:32.599000 66160 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5174882Z [rank1]:E1204 09:47:32.599000 66160 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5176438Z [rank1]:E1204 09:47:32.599000 66160 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 1. CUDA driver allocated memory was 604962816 and is now 649003008. 2025-12-04T09:59:13.5177001Z [rank1]:E1204 09:47:32.599000 66160 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5177668Z [rank1]:E1204 09:47:32.599000 66160 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5178825Z [rank1]:E1204 09:47:32.599000 66160 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T09:59:13.5179181Z [rank1]:E1204 09:47:32.599000 66160 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5179907Z [rank1]:E1204 09:47:32.599000 66160 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5180456Z [rank1]:E1204 09:47:32.599000 66160 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.5180554Z dist init r=2, world=4 2025-12-04T09:59:13.5180662Z dist init r=0, world=4 2025-12-04T09:59:13.5180757Z dist init r=3, world=4 2025-12-04T09:59:13.5180883Z dist init r=1, world=4 2025-12-04T09:59:13.5182044Z [rank0]:[W1204 09:47:32.610414954 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.5182142Z FAILED [10.4840s] [100%] 2025-12-04T09:59:13.5182148Z 2025-12-04T09:59:13.5182304Z =================================== FAILURES =================================== 2025-12-04T09:59:13.5182617Z _ TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda __ 2025-12-04T09:59:13.5182734Z Traceback (most recent call last): 2025-12-04T09:59:13.5183288Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.5183399Z self._join_processes(fn) 2025-12-04T09:59:13.5184016Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.5184159Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.5184761Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.5184881Z raise RuntimeError(error) 2025-12-04T09:59:13.5185111Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.5185234Z Traceback (most recent call last): 2025-12-04T09:59:13.5185771Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5185909Z getattr(self, test_name)() 2025-12-04T09:59:13.5186447Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5186536Z fn() 2025-12-04T09:59:13.5187044Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5187151Z method(*args, **kwargs) 2025-12-04T09:59:13.5187652Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5187757Z method(*args, **kwargs) 2025-12-04T09:59:13.5188262Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5188390Z with policy(): 2025-12-04T09:59:13.5189009Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5189114Z raise RuntimeError(msg) 2025-12-04T09:59:13.5190206Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 0. CUDA driver allocated memory was 716111872 and is now 758054912. 2025-12-04T09:59:13.5190218Z 2025-12-04T09:59:13.5190406Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5191022Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T09:59:13.5191028Z 2025-12-04T09:59:13.5191266Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5191273Z 2025-12-04T09:59:13.5191277Z 2025-12-04T09:59:13.5191473Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.5191710Z Process 0 terminated with exit code 10, terminating remaining processes. 
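Two of the UserWarnings in the run above suggest passing an explicit device: barrier() warns that `device_id` can be specified in `init_process_group`, and the FSDP `_init_utils` warning recommends the `device_id` argument so a CPU-constructed module is moved to GPU before sharding initialization (also needed for `sync_module_states=True`). A hedged sketch combining both suggestions follows; the model and process-group setup are placeholders, not the test's own code, and `device_id` for `init_process_group` assumes a recent PyTorch release.

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def setup_and_wrap(rank: int, world_size: int, model: torch.nn.Module) -> FSDP:
    device = torch.device("cuda", rank)
    # device_id here addresses the barrier() warning about the current device context.
    dist.init_process_group("nccl", rank=rank, world_size=world_size, device_id=device)
    # device_id here lets FSDP move the CPU module to GPU for sharding init,
    # which is also required when sync_module_states=True.
    return FSDP(model, device_id=device, sync_module_states=True)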
2025-12-04T09:59:13.5192422Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cbd7e5f481e859be.xml - 2025-12-04T09:59:13.5192602Z =========================== short test summary info ============================ 2025-12-04T09:59:13.5193374Z FAILED [10.4840s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.5193478Z Traceback (most recent call last): 2025-12-04T09:59:13.5193974Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5194072Z getattr(self, test_name)() 2025-12-04T09:59:13.5194541Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5194629Z fn() 2025-12-04T09:59:13.5195074Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5195209Z method(*args, **kwargs) 2025-12-04T09:59:13.5195654Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5195746Z method(*args, **kwargs) 2025-12-04T09:59:13.5196196Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5196278Z with policy(): 2025-12-04T09:59:13.5196727Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5196828Z raise RuntimeError(msg) 2025-12-04T09:59:13.5197916Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 0. CUDA driver allocated memory was 716111872 and is now 758054912. 2025-12-04T09:59:13.5197946Z 2025-12-04T09:59:13.5198142Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5198764Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T09:59:13.5198768Z 2025-12-04T09:59:13.5199003Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5199160Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T09:59:13.5199340Z ====================== 1 failed, 26 deselected in 10.70s ======================= 2025-12-04T09:59:13.5199426Z Got exit code 1 2025-12-04T09:59:13.5199971Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_none_cuda 2025-12-04T09:59:13.5200331Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.5200889Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6ede249f1a681285.xml 2025-12-04T09:59:13.5201029Z ============================= test session starts ============================== 2025-12-04T09:59:13.5201341Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.5201433Z cachedir: .pytest_cache 2025-12-04T09:59:13.5201884Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.5201997Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.5202089Z configfile: pytest.ini 2025-12-04T09:59:13.5202561Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.5202757Z collecting ... collected 60 items / 12 deselected / 48 selected 2025-12-04T09:59:13.5202880Z stepcurrent: skipping 12 already run items. 2025-12-04T09:59:13.5202986Z Running 15 items in this shard 2025-12-04T09:59:13.5202991Z 2025-12-04T09:59:13.5203962Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_no_shard_cuda I1204 09:47:39.283000 66444 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 66496 2025-12-04T09:59:13.5204406Z I1204 09:47:39.284000 66444 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 66497 2025-12-04T09:59:13.5204847Z I1204 09:47:39.285000 66444 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 66498 2025-12-04T09:59:13.5205280Z I1204 09:47:39.286000 66444 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 66499 2025-12-04T09:59:13.5206179Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.5206289Z return wrapper_cls(module, **kwargs) 2025-12-04T09:59:13.5207155Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.5207260Z return wrapper_cls(module, **kwargs) 2025-12-04T09:59:13.5208110Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.5208225Z return wrapper_cls(module, **kwargs) 2025-12-04T09:59:13.5209074Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.5209214Z return wrapper_cls(module, **kwargs) 2025-12-04T09:59:13.5211006Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5211409Z _warn_cpu_init() 2025-12-04T09:59:13.5213208Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5213298Z _warn_cpu_init() 2025-12-04T09:59:13.5215069Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5215156Z _warn_cpu_init() 2025-12-04T09:59:13.5217277Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5217375Z _warn_cpu_init() 2025-12-04T09:59:13.5218378Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.5218638Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T09:59:13.5219623Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.5219887Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T09:59:13.5221153Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.5221427Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T09:59:13.5222411Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. 
If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.5222673Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T09:59:13.5223666Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.5223814Z return func(*args, **kwargs) 2025-12-04T09:59:13.5224601Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.5224709Z return func(*args, **kwargs) 2025-12-04T09:59:13.5225481Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.5225586Z return func(*args, **kwargs) 2025-12-04T09:59:13.5226342Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.5226497Z return func(*args, **kwargs) 2025-12-04T09:59:13.5227252Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.5227366Z return func(*args, **kwargs) 2025-12-04T09:59:13.5228122Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.5228225Z return func(*args, **kwargs) 2025-12-04T09:59:13.5228990Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.5229093Z return func(*args, **kwargs) 2025-12-04T09:59:13.5229844Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.5229955Z return func(*args, **kwargs) 2025-12-04T09:59:13.5230710Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.5230822Z return func(*args, **kwargs) 2025-12-04T09:59:13.5231317Z [rank0]:E1204 09:47:47.695000 66496 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5231850Z [rank0]:E1204 09:47:47.695000 66496 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5233060Z [rank0]:E1204 09:47:47.695000 66496 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5233521Z [rank0]:E1204 09:47:47.695000 66496 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5234406Z [rank0]:E1204 09:47:47.695000 66496 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5234855Z [rank0]:E1204 09:47:47.695000 66496 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5235710Z [rank0]:E1204 09:47:47.695000 66496 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5236143Z [rank0]:E1204 09:47:47.695000 66496 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5236993Z [rank0]:E1204 09:47:47.695000 66496 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5237458Z [rank0]:E1204 09:47:47.695000 66496 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5238311Z [rank0]:E1204 09:47:47.695000 66496 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5238906Z [rank0]:E1204 09:47:47.695000 66496 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5239816Z [rank0]:E1204 09:47:47.695000 66496 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5241401Z [rank0]:E1204 09:47:47.695000 66496 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5243615Z [rank0]:E1204 09:47:47.695000 66496 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 112128 on device 0. CUDA driver allocated memory was 714014720 and is now 760152064. 
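The FutureWarning above (torch/distributed/fsdp/wrap.py) says the `NO_SHARD` sharding strategy is deprecated and points users at `DistributedDataParallel`; the related state_dict warning notes that `NO_SHARD` returns a full_state_dict in any case. The following is a rough sketch of that suggested substitution, assuming an already-initialized NCCL process group; it is not the test's own wrapping code.

import torch
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_without_sharding(model: torch.nn.Module, rank: int) -> DDP:
    # Replicate the model per rank instead of using FSDP's deprecated NO_SHARD strategy.
    model = model.cuda(rank)
    return DDP(model, device_ids=[rank])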
2025-12-04T09:59:13.5245689Z [rank0]:E1204 09:47:47.695000 66496 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5246772Z [rank0]:E1204 09:47:47.695000 66496 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5248654Z [rank0]:E1204 09:47:47.695000 66496 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda 2025-12-04T09:59:13.5250218Z [rank0]:E1204 09:47:47.695000 66496 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5251567Z [rank0]:E1204 09:47:47.695000 66496 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5252964Z [rank0]:E1204 09:47:47.695000 66496 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.5254061Z [rank2]:E1204 09:47:47.695000 66498 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5255141Z [rank2]:E1204 09:47:47.695000 66498 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5257017Z [rank2]:E1204 09:47:47.695000 66498 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5258663Z [rank2]:E1204 09:47:47.695000 66498 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5260339Z [rank2]:E1204 09:47:47.695000 66498 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5261861Z [rank2]:E1204 09:47:47.695000 66498 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5263352Z [rank2]:E1204 09:47:47.695000 66498 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5264990Z [rank2]:E1204 09:47:47.695000 66498 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5271087Z [rank2]:E1204 09:47:47.695000 66498 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5272618Z [rank2]:E1204 09:47:47.695000 66498 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5274126Z [rank2]:E1204 09:47:47.695000 66498 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5275610Z [rank2]:E1204 09:47:47.695000 66498 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5277063Z [rank2]:E1204 09:47:47.695000 66498 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5278483Z [rank2]:E1204 09:47:47.695000 66498 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5280542Z [rank2]:E1204 09:47:47.695000 66498 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 112128 on device 2. CUDA driver allocated memory was 607059968 and is now 651100160. 2025-12-04T09:59:13.5282491Z [rank2]:E1204 09:47:47.695000 66498 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5283513Z [rank2]:E1204 09:47:47.695000 66498 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5285249Z [rank2]:E1204 09:47:47.695000 66498 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda 2025-12-04T09:59:13.5286748Z [rank2]:E1204 09:47:47.695000 66498 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5287830Z [rank2]:E1204 09:47:47.695000 66498 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5289058Z [rank2]:E1204 09:47:47.695000 66498 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.5290054Z [rank1]:E1204 09:47:47.695000 66497 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5291042Z [rank1]:E1204 09:47:47.695000 66497 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5292559Z [rank1]:E1204 09:47:47.695000 66497 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5294013Z [rank1]:E1204 09:47:47.695000 66497 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5295449Z [rank1]:E1204 09:47:47.695000 66497 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5297105Z [rank1]:E1204 09:47:47.695000 66497 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5298608Z [rank1]:E1204 09:47:47.695000 66497 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5300238Z [rank1]:E1204 09:47:47.695000 66497 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5301818Z [rank1]:E1204 09:47:47.695000 66497 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5303394Z [rank1]:E1204 09:47:47.695000 66497 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5304967Z [rank1]:E1204 09:47:47.695000 66497 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5306537Z [rank1]:E1204 09:47:47.695000 66497 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5308084Z [rank1]:E1204 09:47:47.695000 66497 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5309700Z [rank1]:E1204 09:47:47.695000 66497 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5311741Z [rank1]:E1204 09:47:47.695000 66497 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 112128 on device 1. CUDA driver allocated memory was 611254272 and is now 651100160. 2025-12-04T09:59:13.5313673Z [rank1]:E1204 09:47:47.695000 66497 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5314697Z [rank1]:E1204 09:47:47.695000 66497 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5316473Z [rank1]:E1204 09:47:47.695000 66497 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda 2025-12-04T09:59:13.5317942Z [rank1]:E1204 09:47:47.695000 66497 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5319003Z [rank1]:E1204 09:47:47.695000 66497 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5320244Z [rank1]:E1204 09:47:47.695000 66497 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.5321594Z [rank3]:E1204 09:47:47.696000 66499 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5322790Z [rank3]:E1204 09:47:47.696000 66499 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5324460Z [rank3]:E1204 09:47:47.696000 66499 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5326093Z [rank3]:E1204 09:47:47.696000 66499 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5327717Z [rank3]:E1204 09:47:47.696000 66499 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5329235Z [rank3]:E1204 09:47:47.696000 66499 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5330776Z [rank3]:E1204 09:47:47.696000 66499 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5332360Z [rank3]:E1204 09:47:47.696000 66499 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5333990Z [rank3]:E1204 09:47:47.696000 66499 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5335396Z [rank3]:E1204 09:47:47.696000 66499 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5337108Z [rank3]:E1204 09:47:47.696000 66499 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5338660Z [rank3]:E1204 09:47:47.696000 66499 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5340202Z [rank3]:E1204 09:47:47.696000 66499 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5341789Z [rank3]:E1204 09:47:47.696000 66499 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5344110Z [rank3]:E1204 09:47:47.696000 66499 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 112128 on device 3. CUDA driver allocated memory was 609157120 and is now 651100160. 
2025-12-04T09:59:13.5346293Z [rank3]:E1204 09:47:47.696000 66499 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5347502Z [rank3]:E1204 09:47:47.696000 66499 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5349503Z [rank3]:E1204 09:47:47.696000 66499 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda 2025-12-04T09:59:13.5350974Z [rank3]:E1204 09:47:47.696000 66499 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5352051Z [rank3]:E1204 09:47:47.696000 66499 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5353290Z [rank3]:E1204 09:47:47.696000 66499 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.5353980Z dist init r=0, world=4 2025-12-04T09:59:13.5354263Z dist init r=3, world=4 2025-12-04T09:59:13.5354500Z dist init r=1, world=4 2025-12-04T09:59:13.5354737Z dist init r=2, world=4 2025-12-04T09:59:13.5355906Z [rank0]:[W1204 09:47:48.718992126 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.5357129Z FAILED [10.7200s] [ 6%] 2025-12-04T09:59:13.5357294Z 2025-12-04T09:59:13.5357422Z =================================== FAILURES =================================== 2025-12-04T09:59:13.5357965Z _ TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda _ 2025-12-04T09:59:13.5358465Z Traceback (most recent call last): 2025-12-04T09:59:13.5359178Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.5359876Z self._join_processes(fn) 2025-12-04T09:59:13.5360575Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.5361329Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.5362100Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.5362859Z raise RuntimeError(error) 2025-12-04T09:59:13.5363243Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.5363699Z Traceback (most recent call last): 2025-12-04T09:59:13.5364383Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5365243Z getattr(self, test_name)() 2025-12-04T09:59:13.5365934Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5366653Z fn() 2025-12-04T09:59:13.5367248Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5367937Z method(*args, **kwargs) 2025-12-04T09:59:13.5368595Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5369290Z method(*args, **kwargs) 2025-12-04T09:59:13.5369942Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5370625Z with policy(): 2025-12-04T09:59:13.5371259Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5371971Z raise RuntimeError(msg) 2025-12-04T09:59:13.5373546Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 112128 on device 1. CUDA driver allocated memory was 611254272 and is now 651100160. 2025-12-04T09:59:13.5374875Z 2025-12-04T09:59:13.5375081Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5376098Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda 2025-12-04T09:59:13.5377004Z 2025-12-04T09:59:13.5377443Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5377848Z 2025-12-04T09:59:13.5377853Z 2025-12-04T09:59:13.5378086Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.5378703Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.5379934Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6ede249f1a681285.xml - 2025-12-04T09:59:13.5381042Z =========================== short test summary info ============================ 2025-12-04T09:59:13.5382227Z FAILED [10.7200s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_no_shard_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.5383324Z Traceback (most recent call last): 2025-12-04T09:59:13.5384103Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5384905Z getattr(self, test_name)() 2025-12-04T09:59:13.5385649Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5386431Z fn() 2025-12-04T09:59:13.5387062Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5387807Z method(*args, **kwargs) 2025-12-04T09:59:13.5388605Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5389331Z method(*args, **kwargs) 2025-12-04T09:59:13.5390101Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5390794Z with policy(): 2025-12-04T09:59:13.5391410Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5392149Z raise RuntimeError(msg) 2025-12-04T09:59:13.5393496Z RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 112128 on device 1. CUDA driver allocated memory was 611254272 and is now 651100160. 2025-12-04T09:59:13.5394783Z 2025-12-04T09:59:13.5394990Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5395975Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda 2025-12-04T09:59:13.5396958Z 2025-12-04T09:59:13.5397215Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5397774Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.5398249Z ====================== 1 failed, 12 deselected in 10.94s ======================= 2025-12-04T09:59:13.5398638Z Got exit code 1 2025-12-04T09:59:13.5398880Z Retrying single test... 2025-12-04T09:59:13.5399655Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-11be05c94e086d26.xml 2025-12-04T09:59:13.5400540Z ============================= test session starts ============================== 2025-12-04T09:59:13.5401187Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.5401750Z cachedir: .pytest_cache 2025-12-04T09:59:13.5402541Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.5403257Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.5403577Z configfile: pytest.ini 2025-12-04T09:59:13.5404245Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.5405076Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.5406131Z stepcurrent: skipping 12 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_no_shard_cuda 2025-12-04T09:59:13.5407091Z Running 1 items in this shard 2025-12-04T09:59:13.5407283Z 2025-12-04T09:59:13.5408438Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_no_shard_cuda I1204 09:47:54.704000 66781 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 66833 2025-12-04T09:59:13.5409937Z I1204 09:47:54.704000 66781 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 66834 2025-12-04T09:59:13.5410925Z I1204 09:47:54.705000 66781 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 66835 2025-12-04T09:59:13.5411921Z I1204 09:47:54.706000 66781 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 66836 2025-12-04T09:59:13.5413345Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.5414470Z return wrapper_cls(module, **kwargs) 2025-12-04T09:59:13.5416557Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5418988Z _warn_cpu_init() 2025-12-04T09:59:13.5420119Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.5421598Z return wrapper_cls(module, **kwargs) 2025-12-04T09:59:13.5422824Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.5424033Z return wrapper_cls(module, **kwargs) 2025-12-04T09:59:13.5426274Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5428508Z _warn_cpu_init() 2025-12-04T09:59:13.5430749Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5433173Z _warn_cpu_init() 2025-12-04T09:59:13.5434189Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.5435409Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T09:59:13.5436882Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.5438038Z return func(*args, **kwargs) 2025-12-04T09:59:13.5439186Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.5440329Z return wrapper_cls(module, **kwargs) 2025-12-04T09:59:13.5441479Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. 
If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.5442772Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T09:59:13.5444070Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.5445394Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T09:59:13.5447852Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5450012Z _warn_cpu_init() 2025-12-04T09:59:13.5451117Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.5452577Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T09:59:13.5453669Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.5454616Z return func(*args, **kwargs) 2025-12-04T09:59:13.5455526Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.5456540Z return func(*args, **kwargs) 2025-12-04T09:59:13.5457663Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.5458670Z return func(*args, **kwargs) 2025-12-04T09:59:13.5459635Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.5460633Z return func(*args, **kwargs) 2025-12-04T09:59:13.5461584Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.5462629Z return func(*args, **kwargs) 2025-12-04T09:59:13.5463581Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.5464576Z return func(*args, **kwargs) 2025-12-04T09:59:13.5465523Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.5466520Z return func(*args, **kwargs) 2025-12-04T09:59:13.5467474Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
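The `_init_utils.py` UserWarning repeated above recommends passing `device_id` so that FSDP runs its sharding initialization on the GPU rather than on the CPU, and so that `sync_module_states=True` has the module where it needs it. A minimal sketch of that call pattern follows; the Linear module is a placeholder and an already-initialized process group is assumed.

```python
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_on_gpu(rank: int) -> FSDP:
    # Assumes torch.distributed.init_process_group(...) has already run in this process;
    # the Linear module is a stand-in for the real model under test.
    model = torch.nn.Linear(8, 8)  # still on CPU at this point
    return FSDP(
        model,
        device_id=torch.device("cuda", rank),  # run sharding init on this GPU
        sync_module_states=True,               # per the warning, requires the module on GPU
    )
```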
2025-12-04T09:59:13.5468471Z return func(*args, **kwargs) 2025-12-04T09:59:13.5469345Z [rank0]:E1204 09:48:03.060000 66833 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5470348Z [rank0]:E1204 09:48:03.060000 66833 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5471826Z [rank0]:E1204 09:48:03.060000 66833 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5473282Z [rank0]:E1204 09:48:03.060000 66833 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5474731Z [rank0]:E1204 09:48:03.060000 66833 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5476113Z [rank0]:E1204 09:48:03.060000 66833 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5477442Z [rank0]:E1204 09:48:03.060000 66833 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5478856Z [rank0]:E1204 09:48:03.060000 66833 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5480257Z [rank0]:E1204 09:48:03.060000 66833 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5481679Z [rank0]:E1204 09:48:03.060000 66833 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5483087Z [rank0]:E1204 09:48:03.060000 66833 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5484458Z [rank0]:E1204 09:48:03.060000 66833 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5485814Z [rank0]:E1204 09:48:03.060000 66833 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5487219Z [rank0]:E1204 09:48:03.060000 66833 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5489269Z [rank0]:E1204 09:48:03.060000 66833 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 108032 on device 0. CUDA driver allocated memory was 716111872 and is now 760152064. 
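The RuntimeError above comes from the CUDA memory-leak check that PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 enables: the harness records per-device memory before the test body and compares the numbers on exit, failing when both the caching-allocator and the driver-level figures have grown. The sketch below only illustrates that idea and is not the actual check in torch/testing/_internal/common_utils.py; it assumes torch.cuda.memory_allocated for the allocator figure and torch.cuda.mem_get_info for the driver figure.

```python
import torch

class SimpleCudaLeakCheck:
    """Illustrative stand-in for the harness's leak check; not the PyTorch implementation."""

    def __enter__(self):
        torch.cuda.synchronize()
        torch.cuda.empty_cache()
        self.before = []
        for dev in range(torch.cuda.device_count()):
            allocated = torch.cuda.memory_allocated(dev)  # caching-allocator bytes
            free, total = torch.cuda.mem_get_info(dev)    # driver-level view of the device
            self.before.append((allocated, total - free))
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            return False  # let the test's own failure propagate untouched
        torch.cuda.synchronize()
        torch.cuda.empty_cache()
        for dev, (alloc_before, driver_before) in enumerate(self.before):
            alloc_after = torch.cuda.memory_allocated(dev)
            free, total = torch.cuda.mem_get_info(dev)
            driver_after = total - free
            if alloc_after > alloc_before and driver_after > driver_before:
                raise RuntimeError(
                    f"possible CUDA leak on device {dev}: allocator {alloc_before} -> "
                    f"{alloc_after}, driver {driver_before} -> {driver_after}"
                )
        return False
```

Wrapping a test body in `with SimpleCudaLeakCheck():` performs the same kind of before/after comparison that produced the numbers quoted in the failure message above.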
2025-12-04T09:59:13.5491227Z [rank0]:E1204 09:48:03.060000 66833 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5492252Z [rank0]:E1204 09:48:03.060000 66833 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5493975Z [rank0]:E1204 09:48:03.060000 66833 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda 2025-12-04T09:59:13.5495448Z [rank0]:E1204 09:48:03.060000 66833 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5496600Z [rank0]:E1204 09:48:03.060000 66833 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5498197Z [rank0]:E1204 09:48:03.060000 66833 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.5499327Z [rank2]:E1204 09:48:03.060000 66835 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5500436Z [rank2]:E1204 09:48:03.060000 66835 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5502098Z [rank2]:E1204 09:48:03.060000 66835 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5503744Z [rank2]:E1204 09:48:03.060000 66835 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5505411Z [rank2]:E1204 09:48:03.060000 66835 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5506923Z [rank2]:E1204 09:48:03.060000 66835 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5508415Z [rank2]:E1204 09:48:03.060000 66835 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5510078Z [rank2]:E1204 09:48:03.060000 66835 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5511609Z [rank2]:E1204 09:48:03.060000 66835 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5513007Z [rank2]:E1204 09:48:03.060000 66835 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5514413Z [rank2]:E1204 09:48:03.060000 66835 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5515770Z [rank2]:E1204 09:48:03.060000 66835 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5517139Z [rank2]:E1204 09:48:03.060000 66835 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5518547Z [rank2]:E1204 09:48:03.060000 66835 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5520624Z [rank2]:E1204 09:48:03.060000 66835 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 108032 on device 2. CUDA driver allocated memory was 604962816 and is now 651100160. 2025-12-04T09:59:13.5523070Z [rank2]:E1204 09:48:03.060000 66835 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5524221Z [rank2]:E1204 09:48:03.060000 66835 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5526178Z [rank2]:E1204 09:48:03.060000 66835 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda 2025-12-04T09:59:13.5527895Z [rank2]:E1204 09:48:03.060000 66835 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5529105Z [rank2]:E1204 09:48:03.060000 66835 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5530486Z [rank2]:E1204 09:48:03.060000 66835 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.5531606Z [rank1]:E1204 09:48:03.062000 66834 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5532719Z [rank1]:E1204 09:48:03.062000 66834 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5534390Z [rank1]:E1204 09:48:03.062000 66834 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5535882Z [rank1]:E1204 09:48:03.062000 66834 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5537652Z [rank1]:E1204 09:48:03.062000 66834 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5539169Z [rank1]:E1204 09:48:03.062000 66834 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5540724Z [rank1]:E1204 09:48:03.062000 66834 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5542306Z [rank1]:E1204 09:48:03.062000 66834 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5543883Z [rank1]:E1204 09:48:03.062000 66834 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5545455Z [rank1]:E1204 09:48:03.062000 66834 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5547031Z [rank1]:E1204 09:48:03.062000 66834 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5548567Z [rank1]:E1204 09:48:03.062000 66834 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5550087Z [rank1]:E1204 09:48:03.062000 66834 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5551491Z [rank1]:E1204 09:48:03.062000 66834 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5553570Z [rank1]:E1204 09:48:03.062000 66834 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 108032 on device 1. CUDA driver allocated memory was 598671360 and is now 651100160. 2025-12-04T09:59:13.5555506Z [rank1]:E1204 09:48:03.062000 66834 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5556540Z [rank1]:E1204 09:48:03.062000 66834 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5558297Z [rank1]:E1204 09:48:03.062000 66834 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda 2025-12-04T09:59:13.5559761Z [rank1]:E1204 09:48:03.062000 66834 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5560828Z [rank1]:E1204 09:48:03.062000 66834 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5562060Z [rank1]:E1204 09:48:03.062000 66834 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.5563057Z [rank3]:E1204 09:48:03.062000 66836 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5564077Z [rank3]:E1204 09:48:03.062000 66836 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5565552Z [rank3]:E1204 09:48:03.062000 66836 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5567002Z [rank3]:E1204 09:48:03.062000 66836 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5568443Z [rank3]:E1204 09:48:03.062000 66836 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5569828Z [rank3]:E1204 09:48:03.062000 66836 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5571157Z [rank3]:E1204 09:48:03.062000 66836 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5572556Z [rank3]:E1204 09:48:03.062000 66836 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5573961Z [rank3]:E1204 09:48:03.062000 66836 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5575355Z [rank3]:E1204 09:48:03.062000 66836 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5577017Z [rank3]:E1204 09:48:03.062000 66836 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5578561Z [rank3]:E1204 09:48:03.062000 66836 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5580141Z [rank3]:E1204 09:48:03.062000 66836 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5581735Z [rank3]:E1204 09:48:03.062000 66836 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5584058Z [rank3]:E1204 09:48:03.062000 66836 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 108032 on device 3. CUDA driver allocated memory was 607059968 and is now 651100160. 
2025-12-04T09:59:13.5586237Z [rank3]:E1204 09:48:03.062000 66836 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5587417Z [rank3]:E1204 09:48:03.062000 66836 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5589524Z [rank3]:E1204 09:48:03.062000 66836 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda 2025-12-04T09:59:13.5590989Z [rank3]:E1204 09:48:03.062000 66836 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5592062Z [rank3]:E1204 09:48:03.062000 66836 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5593291Z [rank3]:E1204 09:48:03.062000 66836 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.5594008Z dist init r=1, world=4 2025-12-04T09:59:13.5594245Z dist init r=3, world=4 2025-12-04T09:59:13.5594475Z dist init r=0, world=4 2025-12-04T09:59:13.5594701Z dist init r=2, world=4 2025-12-04T09:59:13.5595880Z [rank0]:[W1204 09:48:03.083584524 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.5597098Z FAILED [10.0360s] [100%] 2025-12-04T09:59:13.5597258Z 2025-12-04T09:59:13.5597394Z =================================== FAILURES =================================== 2025-12-04T09:59:13.5597959Z _ TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda _ 2025-12-04T09:59:13.5598467Z Traceback (most recent call last): 2025-12-04T09:59:13.5599147Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.5599843Z self._join_processes(fn) 2025-12-04T09:59:13.5600538Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.5601294Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.5602061Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.5602812Z raise RuntimeError(error) 2025-12-04T09:59:13.5603191Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.5603612Z Traceback (most recent call last): 2025-12-04T09:59:13.5604296Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5604981Z getattr(self, test_name)() 2025-12-04T09:59:13.5605636Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5606309Z fn() 2025-12-04T09:59:13.5606878Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5607560Z method(*args, **kwargs) 2025-12-04T09:59:13.5608178Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5608837Z method(*args, **kwargs) 2025-12-04T09:59:13.5609445Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5610098Z with policy(): 2025-12-04T09:59:13.5610695Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5611366Z raise RuntimeError(msg) 2025-12-04T09:59:13.5612664Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 108032 on device 0. CUDA driver allocated memory was 716111872 and is now 760152064. 2025-12-04T09:59:13.5613880Z 2025-12-04T09:59:13.5614068Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5614993Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda 2025-12-04T09:59:13.5615735Z 2025-12-04T09:59:13.5615972Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5616390Z 2025-12-04T09:59:13.5616395Z 2025-12-04T09:59:13.5616599Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.5617372Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.5618607Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-11be05c94e086d26.xml - 2025-12-04T09:59:13.5619704Z =========================== short test summary info ============================ 2025-12-04T09:59:13.5621081Z FAILED [10.0360s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_no_shard_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.5622199Z Traceback (most recent call last): 2025-12-04T09:59:13.5622986Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5623839Z getattr(self, test_name)() 2025-12-04T09:59:13.5624571Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5625323Z fn() 2025-12-04T09:59:13.5625956Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5626697Z method(*args, **kwargs) 2025-12-04T09:59:13.5627390Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5628139Z method(*args, **kwargs) 2025-12-04T09:59:13.5628826Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5629556Z with policy(): 2025-12-04T09:59:13.5630221Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5630972Z raise RuntimeError(msg) 2025-12-04T09:59:13.5632406Z RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 108032 on device 0. CUDA driver allocated memory was 716111872 and is now 760152064. 2025-12-04T09:59:13.5633870Z 2025-12-04T09:59:13.5634062Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5635030Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda 2025-12-04T09:59:13.5635779Z 2025-12-04T09:59:13.5636011Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5636515Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.5636941Z ====================== 1 failed, 26 deselected in 10.25s ======================= 2025-12-04T09:59:13.5637304Z Got exit code 1 2025-12-04T09:59:13.5637530Z Retrying single test... 2025-12-04T09:59:13.5638234Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-16966e8ed8e62900.xml 2025-12-04T09:59:13.5639032Z ============================= test session starts ============================== 2025-12-04T09:59:13.5639632Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.5640150Z cachedir: .pytest_cache 2025-12-04T09:59:13.5640752Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.5641429Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.5641723Z configfile: pytest.ini 2025-12-04T09:59:13.5642358Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.5643137Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.5644146Z stepcurrent: skipping 12 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_no_shard_cuda 2025-12-04T09:59:13.5645083Z Running 1 items in this shard 2025-12-04T09:59:13.5645267Z 2025-12-04T09:59:13.5646221Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_no_shard_cuda I1204 09:48:09.714000 67118 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 67170 2025-12-04T09:59:13.5647709Z I1204 09:48:09.715000 67118 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 67171 2025-12-04T09:59:13.5648702Z I1204 09:48:09.715000 67118 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 67172 2025-12-04T09:59:13.5649687Z I1204 09:48:09.716000 67118 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 67173 2025-12-04T09:59:13.5651150Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.5652232Z return wrapper_cls(module, **kwargs) 2025-12-04T09:59:13.5653306Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. 
If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.5654391Z return wrapper_cls(module, **kwargs) 2025-12-04T09:59:13.5655455Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.5656606Z return wrapper_cls(module, **kwargs) 2025-12-04T09:59:13.5657978Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:485: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.5659196Z return wrapper_cls(module, **kwargs) 2025-12-04T09:59:13.5661514Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5663747Z _warn_cpu_init() 2025-12-04T09:59:13.5665899Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5668126Z _warn_cpu_init() 2025-12-04T09:59:13.5670277Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5672260Z _warn_cpu_init() 2025-12-04T09:59:13.5674181Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5676278Z _warn_cpu_init() 2025-12-04T09:59:13.5677288Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.5678509Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T09:59:13.5679735Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. 
If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.5680979Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T09:59:13.5682198Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.5683409Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T09:59:13.5684632Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:532: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.5686061Z fsdp_model = FSDP(model, auto_wrap_policy=always_wrap_policy, **fsdp_kwargs) 2025-12-04T09:59:13.5687366Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.5688518Z return func(*args, **kwargs) 2025-12-04T09:59:13.5689434Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.5690384Z return func(*args, **kwargs) 2025-12-04T09:59:13.5691324Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.5692268Z return func(*args, **kwargs) 2025-12-04T09:59:13.5693171Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.5694308Z return func(*args, **kwargs) 2025-12-04T09:59:13.5695240Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.5696199Z return func(*args, **kwargs) 2025-12-04T09:59:13.5697375Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.5698410Z return func(*args, **kwargs) 2025-12-04T09:59:13.5699377Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.5700373Z return func(*args, **kwargs) 2025-12-04T09:59:13.5701329Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.5702322Z return func(*args, **kwargs) 2025-12-04T09:59:13.5703279Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
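The FutureWarning repeated throughout these sessions deprecates FSDP's `NO_SHARD` strategy in favor of DistributedDataParallel. A hedged sketch of that suggested replacement, again with a placeholder model and an already-initialized process group assumed:

```python
import torch
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_without_sharding(rank: int) -> DDP:
    # Assumes torch.distributed.init_process_group("nccl", ...) has already run.
    model = torch.nn.Linear(8, 8).to(torch.device("cuda", rank))  # placeholder model
    return DDP(model, device_ids=[rank])  # full replication, the DDP analogue of NO_SHARD
```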
2025-12-04T09:59:13.5704296Z return func(*args, **kwargs) 2025-12-04T09:59:13.5704956Z [rank0]:E1204 09:48:18.008000 67170 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5706076Z [rank0]:E1204 09:48:18.008000 67170 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5707744Z [rank0]:E1204 09:48:18.008000 67170 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5709464Z [rank0]:E1204 09:48:18.008000 67170 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5711200Z [rank0]:E1204 09:48:18.008000 67170 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5712776Z [rank0]:E1204 09:48:18.008000 67170 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5714182Z [rank0]:E1204 09:48:18.008000 67170 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5715673Z [rank0]:E1204 09:48:18.008000 67170 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5717156Z [rank0]:E1204 09:48:18.008000 67170 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5718645Z [rank0]:E1204 09:48:18.008000 67170 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5720135Z [rank0]:E1204 09:48:18.008000 67170 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5721991Z [rank0]:E1204 09:48:18.008000 67170 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5723534Z [rank0]:E1204 09:48:18.008000 67170 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5725121Z [rank0]:E1204 09:48:18.008000 67170 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5727441Z [rank0]:E1204 09:48:18.008000 67170 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 108032 on device 0. CUDA driver allocated memory was 707723264 and is now 760152064. 
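The ProcessGroupNCCL warning emitted after each failed attempt ("destroy_process_group() was not called before program exit, which can leak resources") points at the usual torch.distributed cleanup pattern. A sketch follows, with illustrative rendezvous settings rather than the test harness's own store handling:

```python
import os
import torch
import torch.distributed as dist

def run(rank: int, world_size: int) -> None:
    # Illustrative single-node rendezvous; the FSDP tests use their own store/port setup.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    try:
        torch.cuda.set_device(rank)
        # ... per-rank test body ...
    finally:
        dist.destroy_process_group()  # explicit shutdown avoids the ProcessGroupNCCL warning
```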
2025-12-04T09:59:13.5729658Z [rank0]:E1204 09:48:18.008000 67170 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5730816Z [rank0]:E1204 09:48:18.008000 67170 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5732767Z [rank0]:E1204 09:48:18.008000 67170 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda 2025-12-04T09:59:13.5734523Z [rank0]:E1204 09:48:18.008000 67170 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5735599Z [rank0]:E1204 09:48:18.008000 67170 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5737142Z [rank0]:E1204 09:48:18.008000 67170 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.5738282Z [rank2]:E1204 09:48:18.008000 67172 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5739387Z [rank2]:E1204 09:48:18.008000 67172 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5741053Z [rank2]:E1204 09:48:18.008000 67172 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5742731Z [rank2]:E1204 09:48:18.008000 67172 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5744361Z [rank2]:E1204 09:48:18.008000 67172 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5745875Z [rank2]:E1204 09:48:18.008000 67172 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5747359Z [rank2]:E1204 09:48:18.008000 67172 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5749045Z [rank2]:E1204 09:48:18.008000 67172 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5750454Z [rank2]:E1204 09:48:18.008000 67172 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5751850Z [rank2]:E1204 09:48:18.008000 67172 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5753286Z [rank2]:E1204 09:48:18.008000 67172 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5754647Z [rank2]:E1204 09:48:18.008000 67172 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5756010Z [rank2]:E1204 09:48:18.008000 67172 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5757414Z [rank2]:E1204 09:48:18.008000 67172 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5759497Z [rank2]:E1204 09:48:18.008000 67172 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 108032 on device 2. CUDA driver allocated memory was 607059968 and is now 651100160. 2025-12-04T09:59:13.5761430Z [rank2]:E1204 09:48:18.008000 67172 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5762449Z [rank2]:E1204 09:48:18.008000 67172 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5764190Z [rank2]:E1204 09:48:18.008000 67172 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda 2025-12-04T09:59:13.5765690Z [rank2]:E1204 09:48:18.008000 67172 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5766763Z [rank2]:E1204 09:48:18.008000 67172 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5767999Z [rank2]:E1204 09:48:18.008000 67172 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.5769001Z [rank1]:E1204 09:48:18.009000 67171 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5769991Z [rank1]:E1204 09:48:18.009000 67171 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5771500Z [rank1]:E1204 09:48:18.009000 67171 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5772952Z [rank1]:E1204 09:48:18.009000 67171 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5774388Z [rank1]:E1204 09:48:18.009000 67171 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5775729Z [rank1]:E1204 09:48:18.009000 67171 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5777376Z [rank1]:E1204 09:48:18.009000 67171 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5778959Z [rank1]:E1204 09:48:18.009000 67171 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5780548Z [rank1]:E1204 09:48:18.009000 67171 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5782159Z [rank1]:E1204 09:48:18.009000 67171 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5783743Z [rank1]:E1204 09:48:18.009000 67171 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5785280Z [rank1]:E1204 09:48:18.009000 67171 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5786821Z [rank1]:E1204 09:48:18.009000 67171 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5788403Z [rank1]:E1204 09:48:18.009000 67171 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5790663Z [rank1]:E1204 09:48:18.009000 67171 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 108032 on device 1. CUDA driver allocated memory was 604962816 and is now 651100160. 2025-12-04T09:59:13.5792601Z [rank1]:E1204 09:48:18.009000 67171 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5793630Z [rank1]:E1204 09:48:18.009000 67171 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5795357Z [rank1]:E1204 09:48:18.009000 67171 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda 2025-12-04T09:59:13.5796857Z [rank1]:E1204 09:48:18.009000 67171 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5797926Z [rank1]:E1204 09:48:18.009000 67171 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5799163Z [rank1]:E1204 09:48:18.009000 67171 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.5800193Z [rank3]:E1204 09:48:18.009000 67173 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5801179Z [rank3]:E1204 09:48:18.009000 67173 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5802658Z [rank3]:E1204 09:48:18.009000 67173 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5804112Z [rank3]:E1204 09:48:18.009000 67173 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5805549Z [rank3]:E1204 09:48:18.009000 67173 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5806896Z [rank3]:E1204 09:48:18.009000 67173 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5808223Z [rank3]:E1204 09:48:18.009000 67173 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5809625Z [rank3]:E1204 09:48:18.009000 67173 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5811051Z [rank3]:E1204 09:48:18.009000 67173 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5812448Z [rank3]:E1204 09:48:18.009000 67173 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5813855Z [rank3]:E1204 09:48:18.009000 67173 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5815210Z [rank3]:E1204 09:48:18.009000 67173 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5816850Z [rank3]:E1204 09:48:18.009000 67173 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5818454Z [rank3]:E1204 09:48:18.009000 67173 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5820934Z [rank3]:E1204 09:48:18.009000 67173 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 108032 on device 3. CUDA driver allocated memory was 609157120 and is now 651100160. 
2025-12-04T09:59:13.5823154Z [rank3]:E1204 09:48:18.009000 67173 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5824377Z [rank3]:E1204 09:48:18.009000 67173 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5826322Z [rank3]:E1204 09:48:18.009000 67173 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda 2025-12-04T09:59:13.5827970Z [rank3]:E1204 09:48:18.009000 67173 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5829184Z [rank3]:E1204 09:48:18.009000 67173 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5830617Z [rank3]:E1204 09:48:18.009000 67173 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.5831393Z dist init r=2, world=4 2025-12-04T09:59:13.5831662Z dist init r=3, world=4 2025-12-04T09:59:13.5831924Z dist init r=1, world=4 2025-12-04T09:59:13.5832188Z dist init r=0, world=4 2025-12-04T09:59:13.5833584Z [rank0]:[W1204 09:48:18.075801567 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.5834895Z FAILED [10.0962s] [100%] 2025-12-04T09:59:13.5835065Z 2025-12-04T09:59:13.5835210Z =================================== FAILURES =================================== 2025-12-04T09:59:13.5835787Z _ TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda _ 2025-12-04T09:59:13.5836327Z Traceback (most recent call last): 2025-12-04T09:59:13.5837048Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.5837789Z self._join_processes(fn) 2025-12-04T09:59:13.5838520Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.5839330Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.5840376Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.5841201Z raise RuntimeError(error) 2025-12-04T09:59:13.5841720Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.5842168Z Traceback (most recent call last): 2025-12-04T09:59:13.5842888Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5843628Z getattr(self, test_name)() 2025-12-04T09:59:13.5844324Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5845042Z fn() 2025-12-04T09:59:13.5845639Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5846371Z method(*args, **kwargs) 2025-12-04T09:59:13.5847033Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5847735Z method(*args, **kwargs) 2025-12-04T09:59:13.5848383Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5849259Z with policy(): 2025-12-04T09:59:13.5849905Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5850672Z raise RuntimeError(msg) 2025-12-04T09:59:13.5852272Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 108032 on device 2. CUDA driver allocated memory was 607059968 and is now 651100160. 2025-12-04T09:59:13.5852313Z 2025-12-04T09:59:13.5852518Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5853197Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda 2025-12-04T09:59:13.5853202Z 2025-12-04T09:59:13.5853447Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5853452Z 2025-12-04T09:59:13.5853456Z 2025-12-04T09:59:13.5853673Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.5853962Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.5854725Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-16966e8ed8e62900.xml - 2025-12-04T09:59:13.5854887Z =========================== short test summary info ============================ 2025-12-04T09:59:13.5855889Z FAILED [10.0962s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_no_shard_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.5856014Z Traceback (most recent call last): 2025-12-04T09:59:13.5856630Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5856740Z getattr(self, test_name)() 2025-12-04T09:59:13.5857447Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5857542Z fn() 2025-12-04T09:59:13.5858050Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5858153Z method(*args, **kwargs) 2025-12-04T09:59:13.5858656Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5858763Z method(*args, **kwargs) 2025-12-04T09:59:13.5859299Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5859399Z with policy(): 2025-12-04T09:59:13.5859905Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5860009Z raise RuntimeError(msg) 2025-12-04T09:59:13.5861258Z RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 108032 on device 2. CUDA driver allocated memory was 607059968 and is now 651100160. 2025-12-04T09:59:13.5861269Z 2025-12-04T09:59:13.5861483Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5862230Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_no_shard_cuda 2025-12-04T09:59:13.5862236Z 2025-12-04T09:59:13.5862496Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5862674Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.5862854Z ====================== 1 failed, 26 deselected in 10.31s ======================= 2025-12-04T09:59:13.5862947Z Got exit code 1 2025-12-04T09:59:13.5863589Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_no_shard_cuda 2025-12-04T09:59:13.5863991Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.5864644Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-90420efea6f00dc5.xml 2025-12-04T09:59:13.5864810Z ============================= test session starts ============================== 2025-12-04T09:59:13.5865160Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.5865263Z cachedir: .pytest_cache 2025-12-04T09:59:13.5865784Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.5865902Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.5866013Z configfile: pytest.ini 2025-12-04T09:59:13.5866575Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.5866789Z collecting ... collected 60 items / 13 deselected / 47 selected 2025-12-04T09:59:13.5866939Z stepcurrent: skipping 13 already run items. 2025-12-04T09:59:13.5867047Z Running 14 items in this shard 2025-12-04T09:59:13.5867052Z 2025-12-04T09:59:13.5868123Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_none_cuda I1204 09:48:24.703000 67455 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 67507 2025-12-04T09:59:13.5868620Z I1204 09:48:24.704000 67455 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 67508 2025-12-04T09:59:13.5869221Z I1204 09:48:24.705000 67455 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 67509 2025-12-04T09:59:13.5869823Z I1204 09:48:24.706000 67455 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 67510 2025-12-04T09:59:13.5871796Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5871900Z _warn_cpu_init() 2025-12-04T09:59:13.5874007Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5874130Z _warn_cpu_init() 2025-12-04T09:59:13.5876155Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5876259Z _warn_cpu_init() 2025-12-04T09:59:13.5878202Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5878328Z _warn_cpu_init() 2025-12-04T09:59:13.5879307Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T09:59:13.5879416Z return func(*args, **kwargs) 2025-12-04T09:59:13.5879878Z [rank1]:E1204 09:48:33.740000 67508 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5880395Z [rank1]:E1204 09:48:33.740000 67508 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5881373Z [rank1]:E1204 09:48:33.740000 67508 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5881895Z [rank1]:E1204 09:48:33.740000 67508 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5882859Z [rank1]:E1204 09:48:33.740000 67508 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5883258Z [rank1]:E1204 09:48:33.740000 67508 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5884189Z [rank1]:E1204 09:48:33.740000 67508 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5884664Z [rank1]:E1204 09:48:33.740000 67508 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5885593Z [rank1]:E1204 09:48:33.740000 67508 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5886071Z [rank1]:E1204 09:48:33.740000 67508 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5887018Z [rank1]:E1204 09:48:33.740000 67508 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5887451Z [rank1]:E1204 09:48:33.740000 67508 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5888388Z [rank1]:E1204 09:48:33.740000 67508 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5888863Z [rank1]:E1204 09:48:33.740000 67508 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5890523Z [rank1]:E1204 09:48:33.740000 67508 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 15872 on device 1. CUDA driver allocated memory was 602865664 and is now 625934336. 
2025-12-04T09:59:13.5890876Z [rank1]:E1204 09:48:33.740000 67508 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5891522Z [rank1]:E1204 09:48:33.740000 67508 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5892631Z [rank1]:E1204 09:48:33.740000 67508 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda 2025-12-04T09:59:13.5893432Z [rank1]:E1204 09:48:33.740000 67508 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5894133Z [rank1]:E1204 09:48:33.740000 67508 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5894660Z [rank1]:E1204 09:48:33.740000 67508 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.5895099Z [rank0]:E1204 09:48:33.744000 67507 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5895614Z [rank0]:E1204 09:48:33.744000 67507 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5896706Z [rank0]:E1204 09:48:33.744000 67507 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5897393Z [rank0]:E1204 09:48:33.744000 67507 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5898380Z [rank0]:E1204 09:48:33.744000 67507 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5898778Z [rank0]:E1204 09:48:33.744000 67507 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5899763Z [rank0]:E1204 09:48:33.744000 67507 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5900257Z [rank0]:E1204 09:48:33.744000 67507 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5901220Z [rank0]:E1204 09:48:33.744000 67507 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5901752Z [rank0]:E1204 09:48:33.744000 67507 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5902709Z [rank0]:E1204 09:48:33.744000 67507 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5903152Z [rank0]:E1204 09:48:33.744000 67507 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5904113Z [rank0]:E1204 09:48:33.744000 67507 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5904628Z [rank0]:E1204 09:48:33.744000 67507 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5906306Z [rank0]:E1204 09:48:33.744000 67507 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 19968 on device 0. CUDA driver allocated memory was 709820416 and is now 734986240. 2025-12-04T09:59:13.5906671Z [rank0]:E1204 09:48:33.744000 67507 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5907337Z [rank0]:E1204 09:48:33.744000 67507 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5908515Z [rank0]:E1204 09:48:33.744000 67507 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda 2025-12-04T09:59:13.5908986Z [rank0]:E1204 09:48:33.744000 67507 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5909687Z [rank0]:E1204 09:48:33.744000 67507 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5910210Z [rank0]:E1204 09:48:33.744000 67507 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.5910674Z [rank2]:E1204 09:48:33.744000 67509 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5911182Z [rank2]:E1204 09:48:33.744000 67509 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5912159Z [rank2]:E1204 09:48:33.744000 67509 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5912650Z [rank2]:E1204 09:48:33.744000 67509 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5913609Z [rank2]:E1204 09:48:33.744000 67509 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5913998Z [rank2]:E1204 09:48:33.744000 67509 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5915039Z [rank2]:E1204 09:48:33.744000 67509 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5915508Z [rank2]:E1204 09:48:33.744000 67509 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5916439Z [rank2]:E1204 09:48:33.744000 67509 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5916996Z [rank2]:E1204 09:48:33.744000 67509 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5917849Z [rank2]:E1204 09:48:33.744000 67509 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5918247Z [rank2]:E1204 09:48:33.744000 67509 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5919143Z [rank2]:E1204 09:48:33.744000 67509 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5919583Z [rank2]:E1204 09:48:33.744000 67509 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5921387Z [rank2]:E1204 09:48:33.744000 67509 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 15872 on device 2. CUDA driver allocated memory was 604962816 and is now 625934336. 2025-12-04T09:59:13.5921760Z [rank2]:E1204 09:48:33.744000 67509 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5922487Z [rank2]:E1204 09:48:33.744000 67509 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5923633Z [rank2]:E1204 09:48:33.744000 67509 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda 2025-12-04T09:59:13.5923990Z [rank2]:E1204 09:48:33.744000 67509 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5924715Z [rank2]:E1204 09:48:33.744000 67509 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5925335Z [rank2]:E1204 09:48:33.744000 67509 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.5925790Z [rank3]:E1204 09:48:33.745000 67510 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5926323Z [rank3]:E1204 09:48:33.745000 67510 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5927321Z [rank3]:E1204 09:48:33.745000 67510 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5927838Z [rank3]:E1204 09:48:33.745000 67510 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5928825Z [rank3]:E1204 09:48:33.745000 67510 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5929231Z [rank3]:E1204 09:48:33.745000 67510 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5930240Z [rank3]:E1204 09:48:33.745000 67510 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5930738Z [rank3]:E1204 09:48:33.745000 67510 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5931694Z [rank3]:E1204 09:48:33.745000 67510 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5932183Z [rank3]:E1204 09:48:33.745000 67510 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5933146Z [rank3]:E1204 09:48:33.745000 67510 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5933732Z [rank3]:E1204 09:48:33.745000 67510 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5934597Z [rank3]:E1204 09:48:33.745000 67510 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5935028Z [rank3]:E1204 09:48:33.745000 67510 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5936589Z [rank3]:E1204 09:48:33.745000 67510 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 15872 on device 3. CUDA driver allocated memory was 607059968 and is now 625934336. 
2025-12-04T09:59:13.5937164Z [rank3]:E1204 09:48:33.745000 67510 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5937836Z [rank3]:E1204 09:48:33.745000 67510 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5938980Z [rank3]:E1204 09:48:33.745000 67510 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda 2025-12-04T09:59:13.5939340Z [rank3]:E1204 09:48:33.745000 67510 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5940092Z [rank3]:E1204 09:48:33.745000 67510 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5940641Z [rank3]:E1204 09:48:33.745000 67510 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.5940753Z dist init r=1, world=4 2025-12-04T09:59:13.5940855Z dist init r=2, world=4 2025-12-04T09:59:13.5940950Z dist init r=3, world=4 2025-12-04T09:59:13.5941055Z dist init r=0, world=4 2025-12-04T09:59:13.5942215Z [rank0]:[W1204 09:48:34.770306676 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.5942324Z FAILED [11.0968s] [ 7%] 2025-12-04T09:59:13.5942333Z 2025-12-04T09:59:13.5942481Z =================================== FAILURES =================================== 2025-12-04T09:59:13.5942791Z __ TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda __ 2025-12-04T09:59:13.5942922Z Traceback (most recent call last): 2025-12-04T09:59:13.5943471Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.5943582Z self._join_processes(fn) 2025-12-04T09:59:13.5944211Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.5944352Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.5944963Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.5945077Z raise RuntimeError(error) 2025-12-04T09:59:13.5945313Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.5945438Z Traceback (most recent call last): 2025-12-04T09:59:13.5945978Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5946091Z getattr(self, test_name)() 2025-12-04T09:59:13.5946665Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5946751Z fn() 2025-12-04T09:59:13.5947269Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5947370Z method(*args, **kwargs) 2025-12-04T09:59:13.5947873Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5947987Z method(*args, **kwargs) 2025-12-04T09:59:13.5948489Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5948698Z with policy(): 2025-12-04T09:59:13.5949315Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5949451Z raise RuntimeError(msg) 2025-12-04T09:59:13.5950619Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 15872 on device 1. CUDA driver allocated memory was 602865664 and is now 625934336. 2025-12-04T09:59:13.5950625Z 2025-12-04T09:59:13.5950824Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5951488Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda 2025-12-04T09:59:13.5951520Z 2025-12-04T09:59:13.5951768Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5951773Z 2025-12-04T09:59:13.5951778Z 2025-12-04T09:59:13.5951980Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.5952235Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.5952994Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-90420efea6f00dc5.xml - 2025-12-04T09:59:13.5953168Z =========================== short test summary info ============================ 2025-12-04T09:59:13.5954154Z FAILED [11.0968s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_none_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.5954272Z Traceback (most recent call last): 2025-12-04T09:59:13.5954810Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5954920Z getattr(self, test_name)() 2025-12-04T09:59:13.5955449Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5955535Z fn() 2025-12-04T09:59:13.5956023Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5956131Z method(*args, **kwargs) 2025-12-04T09:59:13.5956650Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5956749Z method(*args, **kwargs) 2025-12-04T09:59:13.5957243Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5957336Z with policy(): 2025-12-04T09:59:13.5957838Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5957940Z raise RuntimeError(msg) 2025-12-04T09:59:13.5959180Z RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 15872 on device 1. CUDA driver allocated memory was 602865664 and is now 625934336. 2025-12-04T09:59:13.5959189Z 2025-12-04T09:59:13.5959407Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5960100Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda 2025-12-04T09:59:13.5960106Z 2025-12-04T09:59:13.5960367Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5960540Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.5960714Z ====================== 1 failed, 13 deselected in 11.31s ======================= 2025-12-04T09:59:13.5960812Z Got exit code 1 2025-12-04T09:59:13.5960912Z Retrying single test... 2025-12-04T09:59:13.5961557Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6c9f36ab2b8b15ae.xml 2025-12-04T09:59:13.5961711Z ============================= test session starts ============================== 2025-12-04T09:59:13.5962049Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.5962160Z cachedir: .pytest_cache 2025-12-04T09:59:13.5962657Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.5962770Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.5962880Z configfile: pytest.ini 2025-12-04T09:59:13.5963397Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.5963638Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.5964393Z stepcurrent: skipping 13 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_none_cuda 2025-12-04T09:59:13.5964503Z Running 1 items in this shard 2025-12-04T09:59:13.5964511Z 2025-12-04T09:59:13.5965536Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_none_cuda I1204 09:48:40.504000 67792 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 67844 2025-12-04T09:59:13.5966017Z I1204 09:48:40.505000 67792 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 67845 2025-12-04T09:59:13.5966510Z I1204 09:48:40.505000 67792 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 67846 2025-12-04T09:59:13.5966989Z I1204 09:48:40.506000 67792 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 67847 2025-12-04T09:59:13.5969103Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.5969204Z _warn_cpu_init() 2025-12-04T09:59:13.5971102Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5971199Z _warn_cpu_init() 2025-12-04T09:59:13.5972352Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.5972473Z return func(*args, **kwargs) 2025-12-04T09:59:13.5974418Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.5974526Z _warn_cpu_init() 2025-12-04T09:59:13.5976556Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.5976889Z _warn_cpu_init() 2025-12-04T09:59:13.5977354Z [rank0]:E1204 09:48:49.505000 67844 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5977889Z [rank0]:E1204 09:48:49.505000 67844 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5978933Z [rank0]:E1204 09:48:49.505000 67844 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5979444Z [rank0]:E1204 09:48:49.505000 67844 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5980450Z [rank0]:E1204 09:48:49.505000 67844 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5980854Z [rank0]:E1204 09:48:49.505000 67844 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5981831Z [rank0]:E1204 09:48:49.505000 67844 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5982318Z [rank0]:E1204 09:48:49.505000 67844 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5983280Z [rank0]:E1204 09:48:49.505000 67844 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5983808Z [rank0]:E1204 09:48:49.505000 67844 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5984769Z [rank0]:E1204 09:48:49.505000 67844 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.5985229Z [rank0]:E1204 09:48:49.505000 67844 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.5986197Z [rank0]:E1204 09:48:49.505000 67844 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.5986697Z [rank0]:E1204 09:48:49.505000 67844 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.5988478Z [rank0]:E1204 09:48:49.505000 67844 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 19968 on device 0. CUDA driver allocated memory was 716111872 and is now 734986240. 
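The _warn_cpu_init UserWarnings above fire because the test hands FSDP a CPU-resident module, so sharding initialization runs on CPU and sync_module_states=True cannot use GPU collectives. The warning's own suggestion is to pass device_id; a minimal sketch of that, with every other FSDP kwarg (auto_wrap_policy, cpu_offload, ...) omitted and the wrap_on_gpu name purely illustrative:

import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_on_gpu(module: torch.nn.Module, rank: int) -> FSDP:
    # device_id tells FSDP to move the CPU module to this GPU before running
    # sharding initialization and the sync_module_states collectives.
    return FSDP(module, device_id=torch.device("cuda", rank))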
2025-12-04T09:59:13.5988948Z [rank0]:E1204 09:48:49.505000 67844 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5989575Z [rank0]:E1204 09:48:49.505000 67844 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.5990655Z [rank0]:E1204 09:48:49.505000 67844 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda 2025-12-04T09:59:13.5991039Z [rank0]:E1204 09:48:49.505000 67844 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.5991716Z [rank0]:E1204 09:48:49.505000 67844 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.5992238Z [rank0]:E1204 09:48:49.505000 67844 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.5992659Z [rank1]:E1204 09:48:49.506000 67845 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.5993189Z [rank1]:E1204 09:48:49.506000 67845 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.5994137Z [rank1]:E1204 09:48:49.506000 67845 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.5994616Z [rank1]:E1204 09:48:49.506000 67845 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.5995553Z [rank1]:E1204 09:48:49.506000 67845 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.5995926Z [rank1]:E1204 09:48:49.506000 67845 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.5996836Z [rank1]:E1204 09:48:49.506000 67845 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5997297Z [rank1]:E1204 09:48:49.506000 67845 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5998232Z [rank1]:E1204 09:48:49.506000 67845 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.5998697Z [rank1]:E1204 09:48:49.506000 67845 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.5999603Z [rank1]:E1204 09:48:49.506000 67845 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6000028Z [rank1]:E1204 09:48:49.506000 67845 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6000932Z [rank1]:E1204 09:48:49.506000 67845 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6001429Z [rank1]:E1204 09:48:49.506000 67845 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6003010Z [rank1]:E1204 09:48:49.506000 67845 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 19968 on device 1. CUDA driver allocated memory was 611254272 and is now 625934336. 2025-12-04T09:59:13.6003360Z [rank1]:E1204 09:48:49.506000 67845 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6003974Z [rank1]:E1204 09:48:49.506000 67845 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6005087Z [rank1]:E1204 09:48:49.506000 67845 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda 2025-12-04T09:59:13.6005433Z [rank1]:E1204 09:48:49.506000 67845 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6006108Z [rank1]:E1204 09:48:49.506000 67845 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6006631Z [rank1]:E1204 09:48:49.506000 67845 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.6007081Z [rank3]:E1204 09:48:49.508000 67847 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6007583Z [rank3]:E1204 09:48:49.508000 67847 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6008735Z [rank3]:E1204 09:48:49.508000 67847 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6009225Z [rank3]:E1204 09:48:49.508000 67847 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6010184Z [rank3]:E1204 09:48:49.508000 67847 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6010574Z [rank3]:E1204 09:48:49.508000 67847 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6011514Z [rank3]:E1204 09:48:49.508000 67847 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6012021Z [rank3]:E1204 09:48:49.508000 67847 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6012957Z [rank3]:E1204 09:48:49.508000 67847 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6013442Z [rank3]:E1204 09:48:49.508000 67847 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6014367Z [rank3]:E1204 09:48:49.508000 67847 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6014810Z [rank3]:E1204 09:48:49.508000 67847 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6015774Z [rank3]:E1204 09:48:49.508000 67847 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6016256Z [rank3]:E1204 09:48:49.508000 67847 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6018147Z [rank3]:E1204 09:48:49.508000 67847 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 19968 on device 3. CUDA driver allocated memory was 607059968 and is now 625934336. 2025-12-04T09:59:13.6018566Z [rank3]:E1204 09:48:49.508000 67847 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6019223Z [rank3]:E1204 09:48:49.508000 67847 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6020368Z [rank3]:E1204 09:48:49.508000 67847 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda 2025-12-04T09:59:13.6020925Z [rank3]:E1204 09:48:49.508000 67847 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6021740Z [rank3]:E1204 09:48:49.508000 67847 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6022295Z [rank3]:E1204 09:48:49.508000 67847 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.6022748Z [rank2]:E1204 09:48:49.508000 67846 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6023282Z [rank2]:E1204 09:48:49.508000 67846 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6024298Z [rank2]:E1204 09:48:49.508000 67846 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6024805Z [rank2]:E1204 09:48:49.508000 67846 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6025809Z [rank2]:E1204 09:48:49.508000 67846 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6026209Z [rank2]:E1204 09:48:49.508000 67846 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6027220Z [rank2]:E1204 09:48:49.508000 67846 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6027707Z [rank2]:E1204 09:48:49.508000 67846 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6028661Z [rank2]:E1204 09:48:49.508000 67846 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6029160Z [rank2]:E1204 09:48:49.508000 67846 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6030170Z [rank2]:E1204 09:48:49.508000 67846 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6030626Z [rank2]:E1204 09:48:49.508000 67846 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6031598Z [rank2]:E1204 09:48:49.508000 67846 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6032096Z [rank2]:E1204 09:48:49.508000 67846 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6033828Z [rank2]:E1204 09:48:49.508000 67846 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 19968 on device 2. CUDA driver allocated memory was 604962816 and is now 625934336. 
2025-12-04T09:59:13.6034226Z [rank2]:E1204 09:48:49.508000 67846 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6034844Z [rank2]:E1204 09:48:49.508000 67846 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6035920Z [rank2]:E1204 09:48:49.508000 67846 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda 2025-12-04T09:59:13.6036297Z [rank2]:E1204 09:48:49.508000 67846 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6036969Z [rank2]:E1204 09:48:49.508000 67846 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6037494Z [rank2]:E1204 09:48:49.508000 67846 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.6037593Z dist init r=1, world=4 2025-12-04T09:59:13.6037685Z dist init r=3, world=4 2025-12-04T09:59:13.6037786Z dist init r=0, world=4 2025-12-04T09:59:13.6037879Z dist init r=2, world=4 2025-12-04T09:59:13.6038972Z [rank0]:[W1204 09:48:49.524393723 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.6039068Z FAILED [11.1211s] [100%] 2025-12-04T09:59:13.6039074Z 2025-12-04T09:59:13.6039209Z =================================== FAILURES =================================== 2025-12-04T09:59:13.6039512Z __ TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda __ 2025-12-04T09:59:13.6039624Z Traceback (most recent call last): 2025-12-04T09:59:13.6040140Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.6040280Z self._join_processes(fn) 2025-12-04T09:59:13.6041010Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.6041160Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.6041748Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.6041862Z raise RuntimeError(error) 2025-12-04T09:59:13.6042093Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.6042208Z Traceback (most recent call last): 2025-12-04T09:59:13.6042744Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6042853Z getattr(self, test_name)() 2025-12-04T09:59:13.6043400Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6043496Z fn() 2025-12-04T09:59:13.6044013Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6044115Z method(*args, **kwargs) 2025-12-04T09:59:13.6044611Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6044713Z method(*args, **kwargs) 2025-12-04T09:59:13.6045204Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6045296Z with policy(): 2025-12-04T09:59:13.6045820Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6045934Z raise RuntimeError(msg) 2025-12-04T09:59:13.6047216Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 19968 on device 3. CUDA driver allocated memory was 607059968 and is now 625934336. 2025-12-04T09:59:13.6047224Z 2025-12-04T09:59:13.6047434Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6048104Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda 2025-12-04T09:59:13.6048138Z 2025-12-04T09:59:13.6048386Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6048392Z 2025-12-04T09:59:13.6048407Z 2025-12-04T09:59:13.6048610Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.6048857Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.6049621Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6c9f36ab2b8b15ae.xml - 2025-12-04T09:59:13.6049779Z =========================== short test summary info ============================ 2025-12-04T09:59:13.6050578Z FAILED [11.1211s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_none_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.6050700Z Traceback (most recent call last): 2025-12-04T09:59:13.6051219Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6051329Z getattr(self, test_name)() 2025-12-04T09:59:13.6051835Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6051921Z fn() 2025-12-04T09:59:13.6052434Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6052535Z method(*args, **kwargs) 2025-12-04T09:59:13.6053021Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6053121Z method(*args, **kwargs) 2025-12-04T09:59:13.6053598Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6053700Z with policy(): 2025-12-04T09:59:13.6054177Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6054278Z raise RuntimeError(msg) 2025-12-04T09:59:13.6055469Z RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 19968 on device 3. CUDA driver allocated memory was 607059968 and is now 625934336. 2025-12-04T09:59:13.6055475Z 2025-12-04T09:59:13.6055674Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6056426Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda 2025-12-04T09:59:13.6056433Z 2025-12-04T09:59:13.6056856Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6057058Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.6057236Z ====================== 1 failed, 26 deselected in 11.34s ======================= 2025-12-04T09:59:13.6057371Z Got exit code 1 2025-12-04T09:59:13.6057489Z Retrying single test... 2025-12-04T09:59:13.6058120Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0d4c1fd96adc2be7.xml 2025-12-04T09:59:13.6058279Z ============================= test session starts ============================== 2025-12-04T09:59:13.6058637Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.6058741Z cachedir: .pytest_cache 2025-12-04T09:59:13.6059260Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.6059380Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.6059514Z configfile: pytest.ini 2025-12-04T09:59:13.6060054Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.6060266Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.6061048Z stepcurrent: skipping 13 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_none_cuda 2025-12-04T09:59:13.6061169Z Running 1 items in this shard 2025-12-04T09:59:13.6061177Z 2025-12-04T09:59:13.6062229Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_none_cuda I1204 09:48:56.294000 68129 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 68181 2025-12-04T09:59:13.6062734Z I1204 09:48:56.295000 68129 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 68182 2025-12-04T09:59:13.6063231Z I1204 09:48:56.295000 68129 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 68183 2025-12-04T09:59:13.6063730Z I1204 09:48:56.296000 68129 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 68184 2025-12-04T09:59:13.6065802Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.6065908Z _warn_cpu_init() 2025-12-04T09:59:13.6067958Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6068062Z _warn_cpu_init() 2025-12-04T09:59:13.6070141Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6070232Z _warn_cpu_init() 2025-12-04T09:59:13.6072135Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6072253Z _warn_cpu_init() 2025-12-04T09:59:13.6073198Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
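The c10d_logger UserWarning just above points at the same fix on the process-group side: pass `device_id` to `init_process_group` so collectives such as `barrier()` know which CUDA device to use. A minimal sketch, assuming MASTER_ADDR/MASTER_PORT are set in the environment and one GPU per rank:

import torch
import torch.distributed as dist

def init_distributed(rank: int, world_size: int) -> None:
    torch.cuda.set_device(rank)
    dist.init_process_group(
        backend="nccl",
        rank=rank,
        world_size=world_size,
        device_id=torch.device("cuda", rank),  # mutes the barrier() device warning
    )
    dist.barrier()  # now runs on the declared device instead of guessing from context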
2025-12-04T09:59:13.6073302Z return func(*args, **kwargs) 2025-12-04T09:59:13.6073734Z [rank0]:E1204 09:49:05.162000 68181 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6074239Z [rank0]:E1204 09:49:05.162000 68181 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6075203Z [rank0]:E1204 09:49:05.162000 68181 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6075693Z [rank0]:E1204 09:49:05.162000 68181 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6076623Z [rank0]:E1204 09:49:05.162000 68181 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6077004Z [rank0]:E1204 09:49:05.162000 68181 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6077907Z [rank0]:E1204 09:49:05.162000 68181 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6078365Z [rank0]:E1204 09:49:05.162000 68181 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6079467Z [rank0]:E1204 09:49:05.162000 68181 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6079986Z [rank0]:E1204 09:49:05.162000 68181 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6080923Z [rank0]:E1204 09:49:05.162000 68181 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6081356Z [rank0]:E1204 09:49:05.162000 68181 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6082318Z [rank0]:E1204 09:49:05.162000 68181 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6082824Z [rank0]:E1204 09:49:05.162000 68181 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6084466Z [rank0]:E1204 09:49:05.162000 68181 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 19968 on device 0. CUDA driver allocated memory was 720306176 and is now 734986240. 
2025-12-04T09:59:13.6084826Z [rank0]:E1204 09:49:05.162000 68181 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6085464Z [rank0]:E1204 09:49:05.162000 68181 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6086591Z [rank0]:E1204 09:49:05.162000 68181 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda 2025-12-04T09:59:13.6086974Z [rank0]:E1204 09:49:05.162000 68181 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6087679Z [rank0]:E1204 09:49:05.162000 68181 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6088207Z [rank0]:E1204 09:49:05.162000 68181 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.6088671Z [rank2]:E1204 09:49:05.165000 68183 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6089194Z [rank2]:E1204 09:49:05.165000 68183 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6090277Z [rank2]:E1204 09:49:05.165000 68183 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6090763Z [rank2]:E1204 09:49:05.165000 68183 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6091694Z [rank2]:E1204 09:49:05.165000 68183 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6092081Z [rank2]:E1204 09:49:05.165000 68183 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6092988Z [rank2]:E1204 09:49:05.165000 68183 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6093452Z [rank2]:E1204 09:49:05.165000 68183 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6094395Z [rank2]:E1204 09:49:05.165000 68183 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6094852Z [rank2]:E1204 09:49:05.165000 68183 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6095760Z [rank2]:E1204 09:49:05.165000 68183 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6096181Z [rank2]:E1204 09:49:05.165000 68183 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6097406Z [rank2]:E1204 09:49:05.165000 68183 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6097907Z [rank2]:E1204 09:49:05.165000 68183 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6099603Z [rank2]:E1204 09:49:05.165000 68183 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 19968 on device 2. CUDA driver allocated memory was 609157120 and is now 625934336. 2025-12-04T09:59:13.6099969Z [rank2]:E1204 09:49:05.165000 68183 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6100657Z [rank2]:E1204 09:49:05.165000 68183 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6101816Z [rank2]:E1204 09:49:05.165000 68183 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda 2025-12-04T09:59:13.6102179Z [rank2]:E1204 09:49:05.165000 68183 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6102909Z [rank2]:E1204 09:49:05.165000 68183 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6103480Z [rank2]:E1204 09:49:05.165000 68183 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.6103952Z [rank1]:E1204 09:49:05.166000 68182 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6104486Z [rank1]:E1204 09:49:05.166000 68182 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6105493Z [rank1]:E1204 09:49:05.166000 68182 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6106014Z [rank1]:E1204 09:49:05.166000 68182 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6106998Z [rank1]:E1204 09:49:05.166000 68182 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6107413Z [rank1]:E1204 09:49:05.166000 68182 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6108379Z [rank1]:E1204 09:49:05.166000 68182 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6109008Z [rank1]:E1204 09:49:05.166000 68182 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6109922Z [rank1]:E1204 09:49:05.166000 68182 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6110381Z [rank1]:E1204 09:49:05.166000 68182 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6111296Z [rank1]:E1204 09:49:05.166000 68182 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6111738Z [rank1]:E1204 09:49:05.166000 68182 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6112675Z [rank1]:E1204 09:49:05.166000 68182 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6113142Z [rank1]:E1204 09:49:05.166000 68182 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6114726Z [rank1]:E1204 09:49:05.166000 68182 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 19968 on device 1. CUDA driver allocated memory was 607059968 and is now 625934336. 2025-12-04T09:59:13.6115098Z [rank1]:E1204 09:49:05.166000 68182 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6115724Z [rank1]:E1204 09:49:05.166000 68182 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6116811Z [rank1]:E1204 09:49:05.166000 68182 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda 2025-12-04T09:59:13.6117155Z [rank1]:E1204 09:49:05.166000 68182 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6117869Z [rank1]:E1204 09:49:05.166000 68182 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6118385Z [rank1]:E1204 09:49:05.166000 68182 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.6118821Z [rank3]:E1204 09:49:05.166000 68184 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6119324Z [rank3]:E1204 09:49:05.166000 68184 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6120258Z [rank3]:E1204 09:49:05.166000 68184 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6120877Z [rank3]:E1204 09:49:05.166000 68184 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6122041Z [rank3]:E1204 09:49:05.166000 68184 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6122454Z [rank3]:E1204 09:49:05.166000 68184 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6123481Z [rank3]:E1204 09:49:05.166000 68184 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6123983Z [rank3]:E1204 09:49:05.166000 68184 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6124943Z [rank3]:E1204 09:49:05.166000 68184 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6125428Z [rank3]:E1204 09:49:05.166000 68184 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6126442Z [rank3]:E1204 09:49:05.166000 68184 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6126892Z [rank3]:E1204 09:49:05.166000 68184 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6127858Z [rank3]:E1204 09:49:05.166000 68184 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6128345Z [rank3]:E1204 09:49:05.166000 68184 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6130043Z [rank3]:E1204 09:49:05.166000 68184 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 15872 on device 3. CUDA driver allocated memory was 604962816 and is now 625934336. 
2025-12-04T09:59:13.6130447Z [rank3]:E1204 09:49:05.166000 68184 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6131107Z [rank3]:E1204 09:49:05.166000 68184 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6132262Z [rank3]:E1204 09:49:05.166000 68184 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda 2025-12-04T09:59:13.6132677Z [rank3]:E1204 09:49:05.166000 68184 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6133403Z [rank3]:E1204 09:49:05.166000 68184 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6134048Z [rank3]:E1204 09:49:05.166000 68184 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.6134151Z dist init r=0, world=4 2025-12-04T09:59:13.6134245Z dist init r=1, world=4 2025-12-04T09:59:13.6134334Z dist init r=3, world=4 2025-12-04T09:59:13.6134434Z dist init r=2, world=4 2025-12-04T09:59:13.6135522Z [rank0]:[W1204 09:49:05.176575241 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.6135621Z FAILED [11.2195s] [100%] 2025-12-04T09:59:13.6135638Z 2025-12-04T09:59:13.6135780Z =================================== FAILURES =================================== 2025-12-04T09:59:13.6136071Z __ TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda __ 2025-12-04T09:59:13.6136193Z Traceback (most recent call last): 2025-12-04T09:59:13.6136992Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.6137110Z self._join_processes(fn) 2025-12-04T09:59:13.6137708Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.6137849Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.6138462Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.6138575Z raise RuntimeError(error) 2025-12-04T09:59:13.6138807Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.6138935Z Traceback (most recent call last): 2025-12-04T09:59:13.6139508Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6139617Z getattr(self, test_name)() 2025-12-04T09:59:13.6140164Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6140253Z fn() 2025-12-04T09:59:13.6140765Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6140869Z method(*args, **kwargs) 2025-12-04T09:59:13.6141373Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6141491Z method(*args, **kwargs) 2025-12-04T09:59:13.6141989Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6142115Z with policy(): 2025-12-04T09:59:13.6142635Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6142752Z raise RuntimeError(msg) 2025-12-04T09:59:13.6143977Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 19968 on device 0. CUDA driver allocated memory was 720306176 and is now 734986240. 2025-12-04T09:59:13.6143983Z 2025-12-04T09:59:13.6144199Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6144935Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda 2025-12-04T09:59:13.6144940Z 2025-12-04T09:59:13.6145205Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6145212Z 2025-12-04T09:59:13.6145380Z Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.6145510Z Traceback (most recent call last): 2025-12-04T09:59:13.6146062Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6146172Z getattr(self, test_name)() 2025-12-04T09:59:13.6146712Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6146799Z fn() 2025-12-04T09:59:13.6147317Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6147424Z method(*args, **kwargs) 2025-12-04T09:59:13.6147930Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6148041Z method(*args, **kwargs) 2025-12-04T09:59:13.6148543Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6148648Z with policy(): 2025-12-04T09:59:13.6149290Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6149392Z raise RuntimeError(msg) 2025-12-04T09:59:13.6150551Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 19968 on device 1. CUDA driver allocated memory was 607059968 and is now 625934336. 
2025-12-04T09:59:13.6150558Z 2025-12-04T09:59:13.6150760Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6151426Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda 2025-12-04T09:59:13.6151433Z 2025-12-04T09:59:13.6151681Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6151685Z 2025-12-04T09:59:13.6151717Z 2025-12-04T09:59:13.6151927Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.6152178Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.6152935Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0d4c1fd96adc2be7.xml - 2025-12-04T09:59:13.6153105Z =========================== short test summary info ============================ 2025-12-04T09:59:13.6153906Z FAILED [11.2195s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.6154048Z Traceback (most recent call last): 2025-12-04T09:59:13.6154568Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6154673Z getattr(self, test_name)() 2025-12-04T09:59:13.6155192Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6155277Z fn() 2025-12-04T09:59:13.6155758Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6155866Z method(*args, **kwargs) 2025-12-04T09:59:13.6156343Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6156481Z method(*args, **kwargs) 2025-12-04T09:59:13.6156963Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6157055Z with policy(): 2025-12-04T09:59:13.6157544Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6157650Z raise RuntimeError(msg) 2025-12-04T09:59:13.6158805Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 19968 on device 0. CUDA driver allocated memory was 720306176 and is now 734986240. 
2025-12-04T09:59:13.6158819Z 2025-12-04T09:59:13.6159020Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6159667Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda 2025-12-04T09:59:13.6159674Z 2025-12-04T09:59:13.6159935Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6159941Z 2025-12-04T09:59:13.6160090Z Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.6160213Z Traceback (most recent call last): 2025-12-04T09:59:13.6160758Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6160863Z getattr(self, test_name)() 2025-12-04T09:59:13.6161375Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6161462Z fn() 2025-12-04T09:59:13.6161945Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6162056Z method(*args, **kwargs) 2025-12-04T09:59:13.6162533Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6162641Z method(*args, **kwargs) 2025-12-04T09:59:13.6163114Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6163205Z with policy(): 2025-12-04T09:59:13.6163719Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6163826Z raise RuntimeError(msg) 2025-12-04T09:59:13.6164963Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 19968 on device 1. CUDA driver allocated memory was 607059968 and is now 625934336. 2025-12-04T09:59:13.6164979Z 2025-12-04T09:59:13.6165180Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6165828Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_always_wrap_model_offload_true_none_cuda 2025-12-04T09:59:13.6165860Z 2025-12-04T09:59:13.6166119Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6166288Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
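The ProcessGroupNCCL warning earlier in this run ("destroy_process_group() was not called before program exit") is advisory, but acting on it avoids leaked NCCL resources in longer-lived jobs. A minimal teardown sketch, with `run_workload` as an illustrative callable:

import torch.distributed as dist

def main(run_workload) -> None:
    try:
        run_workload()
    finally:
        # Tear the process group down explicitly instead of relying on interpreter exit,
        # which is what the NCCL warning in this log is about.
        if dist.is_available() and dist.is_initialized():
            dist.destroy_process_group()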
2025-12-04T09:59:13.6166467Z ====================== 1 failed, 26 deselected in 11.44s ======================= 2025-12-04T09:59:13.6166556Z Got exit code 1 2025-12-04T09:59:13.6167130Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_none_cuda 2025-12-04T09:59:13.6167519Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.6168100Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-500277f28031837e.xml 2025-12-04T09:59:13.6168290Z ============================= test session starts ============================== 2025-12-04T09:59:13.6168621Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.6168721Z cachedir: .pytest_cache 2025-12-04T09:59:13.6169211Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.6169326Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.6169425Z configfile: pytest.ini 2025-12-04T09:59:13.6169933Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.6170137Z collecting ... collected 60 items / 14 deselected / 46 selected 2025-12-04T09:59:13.6170277Z stepcurrent: skipping 14 already run items. 2025-12-04T09:59:13.6170447Z Running 13 items in this shard 2025-12-04T09:59:13.6170452Z 2025-12-04T09:59:13.6171507Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_no_shard_cuda I1204 09:49:12.004000 68466 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 68518 2025-12-04T09:59:13.6181043Z I1204 09:49:12.005000 68466 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 68519 2025-12-04T09:59:13.6181737Z I1204 09:49:12.005000 68466 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 68520 2025-12-04T09:59:13.6182245Z I1204 09:49:12.006000 68466 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 68521 2025-12-04T09:59:13.6183254Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6183399Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.6184383Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6184518Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.6185547Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6185679Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.6186673Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. 
If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6186805Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.6188851Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6189176Z _warn_cpu_init() 2025-12-04T09:59:13.6191078Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6191204Z _warn_cpu_init() 2025-12-04T09:59:13.6193093Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6193192Z _warn_cpu_init() 2025-12-04T09:59:13.6195078Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6195180Z _warn_cpu_init() 2025-12-04T09:59:13.6196117Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6196334Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.6197296Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6197505Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.6198444Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
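The FutureWarning above says the `NO_SHARD` sharding strategy is deprecated and recommends DistributedDataParallel for the unsharded case. A minimal sketch of that substitution, with illustrative names and assuming the process group is already initialized:

import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_unsharded(model: nn.Module, local_rank: int) -> DDP:
    # DDP replicates parameters on every rank, which is what FSDP's NO_SHARD strategy provided.
    model = model.cuda(local_rank)
    return DDP(model, device_ids=[local_rank])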
2025-12-04T09:59:13.6198650Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.6199627Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6199834Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.6204089Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.6204489Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.6208715Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.6209121Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.6213371Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. 
If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.6213748Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.6218404Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.6218807Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.6219589Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.6219743Z return func(*args, **kwargs) 2025-12-04T09:59:13.6220522Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.6220645Z return func(*args, **kwargs) 2025-12-04T09:59:13.6221620Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.6221733Z return func(*args, **kwargs) 2025-12-04T09:59:13.6222505Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.6222702Z return func(*args, **kwargs) 2025-12-04T09:59:13.6223463Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T09:59:13.6223573Z return func(*args, **kwargs) 2025-12-04T09:59:13.6224332Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T09:59:13.6224447Z return func(*args, **kwargs) 2025-12-04T09:59:13.6225198Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 
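The AccumulateGrad stream-mismatch warnings above state that the message can be silenced when the mismatch is intentional. A minimal sketch of that call, assuming a PyTorch build recent enough to expose it (as this log's build does); it only affects the warning, not any synchronization:

    import torch

    # Silences the AccumulateGrad stream-mismatch warning described above.
    # It does not change stream synchronization behavior.
    torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False)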
2025-12-04T09:59:13.6225312Z return func(*args, **kwargs) 2025-12-04T09:59:13.6226067Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T09:59:13.6226175Z return func(*args, **kwargs) 2025-12-04T09:59:13.6227188Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.6227300Z return func(*args, **kwargs) 2025-12-04T09:59:13.6227814Z [rank0]:E1204 09:49:19.709000 68518 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6228354Z [rank0]:E1204 09:49:19.709000 68518 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6229358Z [rank0]:E1204 09:49:19.709000 68518 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6229880Z [rank0]:E1204 09:49:19.709000 68518 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6230905Z [rank0]:E1204 09:49:19.709000 68518 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6231318Z [rank0]:E1204 09:49:19.709000 68518 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6232282Z [rank0]:E1204 09:49:19.709000 68518 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6232788Z [rank0]:E1204 09:49:19.709000 68518 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6233819Z [rank0]:E1204 09:49:19.709000 68518 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6234322Z [rank0]:E1204 09:49:19.709000 68518 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6235237Z [rank0]:E1204 09:49:19.709000 68518 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6235657Z [rank0]:E1204 09:49:19.709000 68518 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6236573Z [rank0]:E1204 09:49:19.709000 68518 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6237061Z [rank0]:E1204 09:49:19.709000 68518 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6238663Z [rank0]:E1204 09:49:19.709000 68518 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a 
leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 0. CUDA driver allocated memory was 720306176 and is now 783220736. 2025-12-04T09:59:13.6239006Z [rank0]:E1204 09:49:19.709000 68518 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6239623Z [rank0]:E1204 09:49:19.709000 68518 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6240719Z [rank0]:E1204 09:49:19.709000 68518 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T09:59:13.6241062Z [rank0]:E1204 09:49:19.709000 68518 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6241776Z [rank0]:E1204 09:49:19.709000 68518 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6242290Z [rank0]:E1204 09:49:19.709000 68518 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.6242724Z [rank2]:E1204 09:49:19.710000 68520 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6243220Z [rank2]:E1204 09:49:19.710000 68520 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6244165Z [rank2]:E1204 09:49:19.710000 68520 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6244680Z [rank2]:E1204 09:49:19.710000 68520 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6245800Z [rank2]:E1204 09:49:19.710000 68520 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6246198Z [rank2]:E1204 09:49:19.710000 68520 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6247123Z [rank2]:E1204 09:49:19.710000 68520 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6247600Z [rank2]:E1204 09:49:19.710000 68520 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6248590Z [rank2]:E1204 09:49:19.710000 68520 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6249061Z [rank2]:E1204 09:49:19.710000 68520 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6249992Z [rank2]:E1204 09:49:19.710000 68520 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 
2025-12-04T09:59:13.6250424Z [rank2]:E1204 09:49:19.710000 68520 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6251399Z [rank2]:E1204 09:49:19.710000 68520 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6251878Z [rank2]:E1204 09:49:19.710000 68520 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6253524Z [rank2]:E1204 09:49:19.710000 68520 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 152064 on device 2. CUDA driver allocated memory was 609157120 and is now 674168832. 2025-12-04T09:59:13.6253877Z [rank2]:E1204 09:49:19.710000 68520 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6254519Z [rank2]:E1204 09:49:19.710000 68520 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6255647Z [rank2]:E1204 09:49:19.710000 68520 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T09:59:13.6256029Z [rank2]:E1204 09:49:19.710000 68520 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6256983Z [rank2]:E1204 09:49:19.710000 68520 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6257534Z [rank2]:E1204 09:49:19.710000 68520 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.6258000Z [rank3]:E1204 09:49:19.711000 68521 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6258529Z [rank3]:E1204 09:49:19.711000 68521 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6259968Z [rank3]:E1204 09:49:19.711000 68521 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6260490Z [rank3]:E1204 09:49:19.711000 68521 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6261476Z [rank3]:E1204 09:49:19.711000 68521 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6261888Z [rank3]:E1204 09:49:19.711000 68521 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6262854Z [rank3]:E1204 09:49:19.711000 68521 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6263374Z [rank3]:E1204 09:49:19.711000 68521 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6264342Z [rank3]:E1204 09:49:19.711000 68521 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6264832Z [rank3]:E1204 09:49:19.711000 68521 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6265800Z [rank3]:E1204 09:49:19.711000 68521 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6266278Z [rank3]:E1204 09:49:19.711000 68521 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6267260Z [rank3]:E1204 09:49:19.711000 68521 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6267754Z [rank3]:E1204 09:49:19.711000 68521 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6269623Z [rank3]:E1204 09:49:19.711000 68521 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 152064 on device 3. CUDA driver allocated memory was 607059968 and is now 674168832. 2025-12-04T09:59:13.6269951Z [rank3]:E1204 09:49:19.711000 68521 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6270534Z [rank3]:E1204 09:49:19.711000 68521 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6271592Z [rank3]:E1204 09:49:19.711000 68521 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T09:59:13.6271916Z [rank3]:E1204 09:49:19.711000 68521 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6272558Z [rank3]:E1204 09:49:19.711000 68521 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6273042Z [rank3]:E1204 09:49:19.711000 68521 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.6273453Z [rank1]:E1204 09:49:19.712000 68519 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6273951Z [rank1]:E1204 09:49:19.712000 68519 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6274850Z [rank1]:E1204 09:49:19.712000 68519 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6275307Z [rank1]:E1204 09:49:19.712000 68519 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, 
test_name)() 2025-12-04T09:59:13.6276178Z [rank1]:E1204 09:49:19.712000 68519 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6276541Z [rank1]:E1204 09:49:19.712000 68519 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6277434Z [rank1]:E1204 09:49:19.712000 68519 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6277867Z [rank1]:E1204 09:49:19.712000 68519 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6278728Z [rank1]:E1204 09:49:19.712000 68519 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6279189Z [rank1]:E1204 09:49:19.712000 68519 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6280054Z [rank1]:E1204 09:49:19.712000 68519 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6280460Z [rank1]:E1204 09:49:19.712000 68519 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6281329Z [rank1]:E1204 09:49:19.712000 68519 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6281764Z [rank1]:E1204 09:49:19.712000 68519 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6283274Z [rank1]:E1204 09:49:19.712000 68519 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 152064 on device 1. CUDA driver allocated memory was 604962816 and is now 674168832. 
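Each rank fails the post-test CUDA memory leak check: the caching-allocator and driver-level byte counts are higher after the test than before it. A rough, illustrative sketch of that kind of before/after comparison follows; the real check lives in torch.testing._internal.common_utils and is more involved, so the names and structure below are placeholders, not the actual implementation:

    import torch

    def snapshot(device: int):
        # Bytes currently held by the caching allocator on this device.
        alloc = torch.cuda.memory_allocated(device)
        # Driver-level view: total minus free memory on the device.
        free, total = torch.cuda.mem_get_info(device)
        return alloc, total - free

    before_alloc, before_driver = snapshot(0)
    # ... run the suspect test body here ...
    torch.cuda.synchronize(0)
    after_alloc, after_driver = snapshot(0)
    if after_alloc > before_alloc or after_driver > before_driver:
        print(f"possible leak: allocator {before_alloc} -> {after_alloc} bytes, "
              f"driver {before_driver} -> {after_driver} bytes")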
2025-12-04T09:59:13.6283599Z [rank1]:E1204 09:49:19.712000 68519 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6284210Z [rank1]:E1204 09:49:19.712000 68519 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6285235Z [rank1]:E1204 09:49:19.712000 68519 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T09:59:13.6285559Z [rank1]:E1204 09:49:19.712000 68519 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6286204Z [rank1]:E1204 09:49:19.712000 68519 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6286690Z [rank1]:E1204 09:49:19.712000 68519 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.6286810Z dist init r=0, world=4 2025-12-04T09:59:13.6286912Z dist init r=3, world=4 2025-12-04T09:59:13.6287000Z dist init r=1, world=4 2025-12-04T09:59:13.6287094Z dist init r=2, world=4 2025-12-04T09:59:13.6288121Z [rank0]:[W1204 09:49:20.726194359 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.6288210Z FAILED [9.6776s] [ 7%] 2025-12-04T09:59:13.6288219Z 2025-12-04T09:59:13.6288363Z =================================== FAILURES =================================== 2025-12-04T09:59:13.6288643Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda __ 2025-12-04T09:59:13.6288786Z Traceback (most recent call last): 2025-12-04T09:59:13.6289277Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.6289376Z self._join_processes(fn) 2025-12-04T09:59:13.6289910Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.6290038Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.6290577Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.6290685Z raise RuntimeError(error) 2025-12-04T09:59:13.6290893Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.6291037Z Traceback (most recent call last): 2025-12-04T09:59:13.6291524Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6291626Z getattr(self, test_name)() 2025-12-04T09:59:13.6292113Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6292199Z fn() 2025-12-04T09:59:13.6292649Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6292745Z method(*args, **kwargs) 2025-12-04T09:59:13.6293200Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6293287Z method(*args, **kwargs) 2025-12-04T09:59:13.6293743Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6293830Z with policy(): 2025-12-04T09:59:13.6294279Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6294385Z raise RuntimeError(msg) 2025-12-04T09:59:13.6295506Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 0. CUDA driver allocated memory was 720306176 and is now 783220736. 2025-12-04T09:59:13.6295514Z 2025-12-04T09:59:13.6295716Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6296416Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T09:59:13.6296424Z 2025-12-04T09:59:13.6296662Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6296675Z 2025-12-04T09:59:13.6296847Z 2025-12-04T09:59:13.6297076Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.6297343Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.6298187Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-500277f28031837e.xml - 2025-12-04T09:59:13.6298361Z =========================== short test summary info ============================ 2025-12-04T09:59:13.6299261Z FAILED [9.6776s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_no_shard_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.6299382Z Traceback (most recent call last): 2025-12-04T09:59:13.6299935Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6300056Z getattr(self, test_name)() 2025-12-04T09:59:13.6300590Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6300709Z fn() 2025-12-04T09:59:13.6301222Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6301330Z method(*args, **kwargs) 2025-12-04T09:59:13.6301848Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6301953Z method(*args, **kwargs) 2025-12-04T09:59:13.6302458Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6302564Z with policy(): 2025-12-04T09:59:13.6303070Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6303210Z raise RuntimeError(msg) 2025-12-04T09:59:13.6304445Z RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 0. CUDA driver allocated memory was 720306176 and is now 783220736. 2025-12-04T09:59:13.6304456Z 2025-12-04T09:59:13.6304670Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6305376Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T09:59:13.6305382Z 2025-12-04T09:59:13.6305647Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6305834Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.6306010Z ======================= 1 failed, 14 deselected in 9.89s ======================= 2025-12-04T09:59:13.6306108Z Got exit code 1 2025-12-04T09:59:13.6306222Z Retrying single test... 2025-12-04T09:59:13.6306844Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-942d56c07e16c88d.xml 2025-12-04T09:59:13.6307004Z ============================= test session starts ============================== 2025-12-04T09:59:13.6307389Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.6307498Z cachedir: .pytest_cache 2025-12-04T09:59:13.6308025Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.6308142Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.6308247Z configfile: pytest.ini 2025-12-04T09:59:13.6308894Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.6309217Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.6309921Z stepcurrent: skipping 14 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T09:59:13.6310020Z Running 1 items in this shard 2025-12-04T09:59:13.6310054Z 2025-12-04T09:59:13.6310997Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_no_shard_cuda I1204 09:49:26.494000 68803 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 68855 2025-12-04T09:59:13.6311446Z I1204 09:49:26.495000 68803 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 68856 2025-12-04T09:59:13.6311882Z I1204 09:49:26.495000 68803 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 68857 2025-12-04T09:59:13.6312324Z I1204 09:49:26.496000 68803 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 68858 2025-12-04T09:59:13.6313432Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6313592Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.6314535Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. 
If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6314658Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.6315595Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6315745Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.6316668Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6316802Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.6318717Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6318816Z _warn_cpu_init() 2025-12-04T09:59:13.6320709Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6320982Z _warn_cpu_init() 2025-12-04T09:59:13.6323229Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6323349Z _warn_cpu_init() 2025-12-04T09:59:13.6325405Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6325514Z _warn_cpu_init() 2025-12-04T09:59:13.6326510Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6326730Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.6327731Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. 
If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6327991Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.6328999Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6329220Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.6330207Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6330446Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.6335164Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.6335561Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.6340413Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.6340815Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.6345578Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. 
This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.6346003Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.6350556Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.6350979Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.6351733Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.6351851Z return func(*args, **kwargs) 2025-12-04T09:59:13.6352596Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.6352707Z return func(*args, **kwargs) 2025-12-04T09:59:13.6353794Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.6353899Z return func(*args, **kwargs) 2025-12-04T09:59:13.6354591Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.6354716Z return func(*args, **kwargs) 2025-12-04T09:59:13.6355386Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T09:59:13.6355488Z return func(*args, **kwargs) 2025-12-04T09:59:13.6356162Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 
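The _warn_cpu_init UserWarnings above recommend passing device_id so FSDP moves the CPU module to the GPU before sharding, which also satisfies the GPU requirement of sync_module_states=True. A minimal hedged sketch of that call, assuming a process group is already initialized and one GPU per rank:

    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    rank = dist.get_rank()
    torch.cuda.set_device(rank)

    cpu_module = nn.Linear(8, 8)  # starts on CPU, as in the warning above
    # device_id moves the module to the GPU for sharding initialization and
    # satisfies the GPU requirement of sync_module_states=True.
    model = FSDP(cpu_module, device_id=torch.cuda.current_device(),
                 sync_module_states=True)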
2025-12-04T09:59:13.6356266Z return func(*args, **kwargs) 2025-12-04T09:59:13.6356937Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T09:59:13.6357036Z return func(*args, **kwargs) 2025-12-04T09:59:13.6357745Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict willbe returned. 2025-12-04T09:59:13.6357843Z return func(*args, **kwargs) 2025-12-04T09:59:13.6358737Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.6358830Z return func(*args, **kwargs) 2025-12-04T09:59:13.6359241Z [rank1]:E1204 09:49:34.128000 68856 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6359731Z [rank1]:E1204 09:49:34.128000 68856 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6360658Z [rank1]:E1204 09:49:34.128000 68856 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6361127Z [rank1]:E1204 09:49:34.128000 68856 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6362004Z [rank1]:E1204 09:49:34.128000 68856 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6362359Z [rank1]:E1204 09:49:34.128000 68856 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6363255Z [rank1]:E1204 09:49:34.128000 68856 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6363690Z [rank1]:E1204 09:49:34.128000 68856 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6364554Z [rank1]:E1204 09:49:34.128000 68856 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6364984Z [rank1]:E1204 09:49:34.128000 68856 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6365844Z [rank1]:E1204 09:49:34.128000 68856 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6366243Z [rank1]:E1204 09:49:34.128000 68856 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6367101Z [rank1]:E1204 09:49:34.128000 68856 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6367568Z [rank1]:E1204 
09:49:34.128000 68856 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6369299Z [rank1]:E1204 09:49:34.128000 68856 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 152064 on device 1. CUDA driver allocated memory was 611254272 and is now 674168832. 2025-12-04T09:59:13.6369660Z [rank1]:E1204 09:49:34.128000 68856 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6370278Z [rank1]:E1204 09:49:34.128000 68856 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6371405Z [rank1]:E1204 09:49:34.128000 68856 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T09:59:13.6371748Z [rank1]:E1204 09:49:34.128000 68856 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6372419Z [rank1]:E1204 09:49:34.128000 68856 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6372937Z [rank1]:E1204 09:49:34.128000 68856 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.6373364Z [rank0]:E1204 09:49:34.128000 68855 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6373906Z [rank0]:E1204 09:49:34.128000 68855 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6374854Z [rank0]:E1204 09:49:34.128000 68855 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6375340Z [rank0]:E1204 09:49:34.128000 68855 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6376535Z [rank0]:E1204 09:49:34.128000 68855 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6377146Z [rank0]:E1204 09:49:34.128000 68855 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6378120Z [rank0]:E1204 09:49:34.128000 68855 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6378612Z [rank0]:E1204 09:49:34.128000 68855 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6379583Z [rank0]:E1204 09:49:34.128000 68855 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6380075Z [rank0]:E1204 09:49:34.128000 68855 
site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6381046Z [rank0]:E1204 09:49:34.128000 68855 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6381504Z [rank0]:E1204 09:49:34.128000 68855 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6382515Z [rank0]:E1204 09:49:34.128000 68855 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6383015Z [rank0]:E1204 09:49:34.128000 68855 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6384687Z [rank0]:E1204 09:49:34.128000 68855 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 152064 on device 0. CUDA driver allocated memory was 718209024 and is now 783220736. 2025-12-04T09:59:13.6385064Z [rank0]:E1204 09:49:34.128000 68855 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6385751Z [rank0]:E1204 09:49:34.128000 68855 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6386912Z [rank0]:E1204 09:49:34.128000 68855 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T09:59:13.6387277Z [rank0]:E1204 09:49:34.128000 68855 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6387990Z [rank0]:E1204 09:49:34.128000 68855 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6388580Z [rank0]:E1204 09:49:34.128000 68855 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.6389133Z [rank2]:E1204 09:49:34.128000 68857 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6389619Z [rank2]:E1204 09:49:34.128000 68857 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6390505Z [rank2]:E1204 09:49:34.128000 68857 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6390962Z [rank2]:E1204 09:49:34.128000 68857 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6391868Z [rank2]:E1204 09:49:34.128000 68857 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6392223Z [rank2]:E1204 09:49:34.128000 68857 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 
2025-12-04T09:59:13.6393315Z [rank2]:E1204 09:49:34.128000 68857 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6393780Z [rank2]:E1204 09:49:34.128000 68857 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6394885Z [rank2]:E1204 09:49:34.128000 68857 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6395367Z [rank2]:E1204 09:49:34.128000 68857 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6396295Z [rank2]:E1204 09:49:34.128000 68857 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6396768Z [rank2]:E1204 09:49:34.128000 68857 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6397729Z [rank2]:E1204 09:49:34.128000 68857 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6398212Z [rank2]:E1204 09:49:34.128000 68857 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6399868Z [rank2]:E1204 09:49:34.128000 68857 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 152064 on device 2. CUDA driver allocated memory was 604962816 and is now 674168832. 
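The repeated FutureWarnings in this run note that the NO_SHARD sharding strategy is deprecated in favor of DistributedDataParallel. A minimal sketch of the suggested replacement, assuming an initialized process group and one GPU per rank:

    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP

    rank = dist.get_rank()
    device = torch.device("cuda", rank)

    model = nn.Linear(8, 8).to(device)
    # Replaces FSDP(..., sharding_strategy=ShardingStrategy.NO_SHARD).
    ddp_model = DDP(model, device_ids=[rank])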
2025-12-04T09:59:13.6400224Z [rank2]:E1204 09:49:34.128000 68857 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6400868Z [rank2]:E1204 09:49:34.128000 68857 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6401978Z [rank2]:E1204 09:49:34.128000 68857 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T09:59:13.6402340Z [rank2]:E1204 09:49:34.128000 68857 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6403064Z [rank2]:E1204 09:49:34.128000 68857 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6403601Z [rank2]:E1204 09:49:34.128000 68857 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.6404039Z [rank3]:E1204 09:49:34.130000 68858 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6404549Z [rank3]:E1204 09:49:34.130000 68858 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6405529Z [rank3]:E1204 09:49:34.130000 68858 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6406054Z [rank3]:E1204 09:49:34.130000 68858 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6407026Z [rank3]:E1204 09:49:34.130000 68858 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6407413Z [rank3]:E1204 09:49:34.130000 68858 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6408453Z [rank3]:E1204 09:49:34.130000 68858 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6408926Z [rank3]:E1204 09:49:34.130000 68858 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6409830Z [rank3]:E1204 09:49:34.130000 68858 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6410407Z [rank3]:E1204 09:49:34.130000 68858 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6411289Z [rank3]:E1204 09:49:34.130000 68858 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6411695Z [rank3]:E1204 09:49:34.130000 68858 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6412545Z [rank3]:E1204 09:49:34.130000 68858 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6412981Z [rank3]:E1204 09:49:34.130000 68858 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6414571Z [rank3]:E1204 09:49:34.130000 68858 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 3. CUDA driver allocated memory was 607059968 and is now 674168832. 2025-12-04T09:59:13.6414893Z [rank3]:E1204 09:49:34.130000 68858 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6415689Z [rank3]:E1204 09:49:34.130000 68858 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6417046Z [rank3]:E1204 09:49:34.130000 68858 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T09:59:13.6417619Z [rank3]:E1204 09:49:34.130000 68858 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6418347Z [rank3]:E1204 09:49:34.130000 68858 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6418900Z [rank3]:E1204 09:49:34.130000 68858 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.6419004Z dist init r=0, world=4 2025-12-04T09:59:13.6419103Z dist init r=2, world=4 2025-12-04T09:59:13.6419207Z dist init r=3, world=4 2025-12-04T09:59:13.6419338Z dist init r=1, world=4 2025-12-04T09:59:13.6420490Z [rank0]:[W1204 09:49:34.147198041 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.6420600Z FAILED [9.3682s] [100%] 2025-12-04T09:59:13.6420606Z 2025-12-04T09:59:13.6420953Z =================================== FAILURES =================================== 2025-12-04T09:59:13.6421314Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda __ 2025-12-04T09:59:13.6421441Z Traceback (most recent call last): 2025-12-04T09:59:13.6421991Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.6422111Z self._join_processes(fn) 2025-12-04T09:59:13.6422698Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.6422840Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.6423463Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.6423573Z raise RuntimeError(error) 2025-12-04T09:59:13.6423814Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.6423933Z Traceback (most recent call last): 2025-12-04T09:59:13.6424539Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6424662Z getattr(self, test_name)() 2025-12-04T09:59:13.6425196Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6425292Z fn() 2025-12-04T09:59:13.6425797Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6425900Z method(*args, **kwargs) 2025-12-04T09:59:13.6426433Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6426539Z method(*args, **kwargs) 2025-12-04T09:59:13.6427080Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6427187Z with policy(): 2025-12-04T09:59:13.6427697Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6427812Z raise RuntimeError(msg) 2025-12-04T09:59:13.6429039Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 152064 on device 2. CUDA driver allocated memory was 604962816 and is now 674168832. 2025-12-04T09:59:13.6429048Z 2025-12-04T09:59:13.6429260Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6429973Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T09:59:13.6430016Z 2025-12-04T09:59:13.6430282Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6430288Z 2025-12-04T09:59:13.6430293Z 2025-12-04T09:59:13.6430519Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.6430778Z Process 2 terminated with exit code 10, terminating remaining processes. 
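The ProcessGroupNCCL warning above notes that destroy_process_group() was never called before the worker processes exited. As a point of reference only (this is not the test harness's actual code), a minimal sketch of the shutdown pattern that warning asks for, assuming a torchrun-style launcher sets RANK/WORLD_SIZE/LOCAL_RANK:

import os
import torch
import torch.distributed as dist

def run_worker() -> None:
    # Illustrative worker body; the real failing test lives in test_fsdp_core.py.
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")
    try:
        dist.barrier()  # placeholder for the actual test work
    finally:
        # Explicit teardown avoids the "destroy_process_group() was not called
        # before program exit" warning seen in the log above.
        dist.destroy_process_group()

if __name__ == "__main__":
    run_worker()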
2025-12-04T09:59:13.6431591Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-942d56c07e16c88d.xml - 2025-12-04T09:59:13.6431764Z =========================== short test summary info ============================ 2025-12-04T09:59:13.6432785Z FAILED [9.3682s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_no_shard_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.6433029Z Traceback (most recent call last): 2025-12-04T09:59:13.6433548Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6433654Z getattr(self, test_name)() 2025-12-04T09:59:13.6434282Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6434366Z fn() 2025-12-04T09:59:13.6434820Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6434914Z method(*args, **kwargs) 2025-12-04T09:59:13.6435363Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6435464Z method(*args, **kwargs) 2025-12-04T09:59:13.6435909Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6436004Z with policy(): 2025-12-04T09:59:13.6436458Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6436552Z raise RuntimeError(msg) 2025-12-04T09:59:13.6437677Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 152064 on device 2. CUDA driver allocated memory was 604962816 and is now 674168832. 2025-12-04T09:59:13.6437683Z 2025-12-04T09:59:13.6437876Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6438504Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T09:59:13.6438509Z 2025-12-04T09:59:13.6438744Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6438901Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.6439094Z ======================= 1 failed, 26 deselected in 9.58s ======================= 2025-12-04T09:59:13.6439181Z Got exit code 1 2025-12-04T09:59:13.6439274Z Retrying single test... 
2025-12-04T09:59:13.6439835Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-55fdf9ad8e0a27f0.xml 2025-12-04T09:59:13.6439975Z ============================= test session starts ============================== 2025-12-04T09:59:13.6440288Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.6440381Z cachedir: .pytest_cache 2025-12-04T09:59:13.6440844Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.6440984Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.6441076Z configfile: pytest.ini 2025-12-04T09:59:13.6441742Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.6441951Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.6442681Z stepcurrent: skipping 14 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T09:59:13.6442788Z Running 1 items in this shard 2025-12-04T09:59:13.6442793Z 2025-12-04T09:59:13.6443796Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_no_shard_cuda I1204 09:49:40.814000 69140 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 69192 2025-12-04T09:59:13.6444310Z I1204 09:49:40.815000 69140 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 69193 2025-12-04T09:59:13.6444954Z I1204 09:49:40.815000 69140 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 69194 2025-12-04T09:59:13.6445436Z I1204 09:49:40.816000 69140 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 69195 2025-12-04T09:59:13.6446414Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6446544Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.6448508Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6448605Z _warn_cpu_init() 2025-12-04T09:59:13.6449618Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6449831Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.6450791Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.6450929Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.6451881Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6452013Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.6452986Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6453112Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.6455137Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6455267Z _warn_cpu_init() 2025-12-04T09:59:13.6457640Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6457743Z _warn_cpu_init() 2025-12-04T09:59:13.6459780Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6459914Z _warn_cpu_init() 2025-12-04T09:59:13.6460942Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6461165Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.6462152Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6462375Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.6463375Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.6463600Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.6468128Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.6468557Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.6469431Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.6469548Z return func(*args, **kwargs) 2025-12-04T09:59:13.6473971Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.6474384Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.6478802Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. 
If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.6479219Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.6479975Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.6480081Z return func(*args, **kwargs) 2025-12-04T09:59:13.6484450Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.6484832Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.6485611Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.6485719Z return func(*args, **kwargs) 2025-12-04T09:59:13.6486456Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.6486571Z return func(*args, **kwargs) 2025-12-04T09:59:13.6487301Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.6487404Z return func(*args, **kwargs) 2025-12-04T09:59:13.6488190Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.6488305Z return func(*args, **kwargs) 2025-12-04T09:59:13.6489048Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.6489154Z return func(*args, **kwargs) 2025-12-04T09:59:13.6489885Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
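The FutureWarning repeated above deprecates the NO_SHARD sharding strategy and points to DistributedDataParallel as the replacement. A minimal sketch of that substitution on a hypothetical toy model (launcher environment variables are assumed to come from torchrun); it is illustrative only, not the test's actual model setup:

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main() -> None:
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")

    # Hypothetical toy model; wrapping with DDP replaces the deprecated
    # FSDP(..., sharding_strategy=ShardingStrategy.NO_SHARD) pattern.
    model = torch.nn.Linear(8, 8).cuda(local_rank)
    ddp_model = DDP(model, device_ids=[local_rank])

    loss = ddp_model(torch.randn(4, 8, device=local_rank)).sum()
    loss.backward()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()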
2025-12-04T09:59:13.6490021Z return func(*args, **kwargs) 2025-12-04T09:59:13.6490993Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.6491114Z return func(*args, **kwargs) 2025-12-04T09:59:13.6491560Z [rank0]:E1204 09:49:48.465000 69192 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6492080Z [rank0]:E1204 09:49:48.465000 69192 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6493059Z [rank0]:E1204 09:49:48.465000 69192 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6493554Z [rank0]:E1204 09:49:48.465000 69192 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6494522Z [rank0]:E1204 09:49:48.465000 69192 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6494909Z [rank0]:E1204 09:49:48.465000 69192 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6495871Z [rank0]:E1204 09:49:48.465000 69192 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6496410Z [rank0]:E1204 09:49:48.465000 69192 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6497541Z [rank0]:E1204 09:49:48.465000 69192 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6498041Z [rank0]:E1204 09:49:48.465000 69192 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6499038Z [rank0]:E1204 09:49:48.465000 69192 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6499497Z [rank0]:E1204 09:49:48.465000 69192 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6500457Z [rank0]:E1204 09:49:48.465000 69192 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6500954Z [rank0]:E1204 09:49:48.465000 69192 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6502642Z [rank0]:E1204 09:49:48.465000 69192 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 152064 on device 0. CUDA driver allocated memory was 720306176 and is now 783220736. 
2025-12-04T09:59:13.6503048Z [rank0]:E1204 09:49:48.465000 69192 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6503736Z [rank0]:E1204 09:49:48.465000 69192 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6504891Z [rank0]:E1204 09:49:48.465000 69192 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T09:59:13.6505293Z [rank0]:E1204 09:49:48.465000 69192 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6506006Z [rank0]:E1204 09:49:48.465000 69192 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6506559Z [rank0]:E1204 09:49:48.465000 69192 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.6507009Z [rank1]:E1204 09:49:48.467000 69193 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6507537Z [rank1]:E1204 09:49:48.467000 69193 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6508658Z [rank1]:E1204 09:49:48.467000 69193 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6509141Z [rank1]:E1204 09:49:48.467000 69193 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6510082Z [rank1]:E1204 09:49:48.467000 69193 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6510482Z [rank1]:E1204 09:49:48.467000 69193 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6511395Z [rank1]:E1204 09:49:48.467000 69193 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6511849Z [rank1]:E1204 09:49:48.467000 69193 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6512752Z [rank1]:E1204 09:49:48.467000 69193 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6513429Z [rank1]:E1204 09:49:48.467000 69193 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6514362Z [rank1]:E1204 09:49:48.467000 69193 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6514804Z [rank1]:E1204 09:49:48.467000 69193 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6515742Z [rank1]:E1204 09:49:48.467000 69193 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6516229Z [rank1]:E1204 09:49:48.467000 69193 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6517906Z [rank1]:E1204 09:49:48.467000 69193 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 152064 on device 1. CUDA driver allocated memory was 607059968 and is now 674168832. 2025-12-04T09:59:13.6518264Z [rank1]:E1204 09:49:48.467000 69193 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6518903Z [rank1]:E1204 09:49:48.467000 69193 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6520049Z [rank1]:E1204 09:49:48.467000 69193 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T09:59:13.6520407Z [rank1]:E1204 09:49:48.467000 69193 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6521419Z [rank1]:E1204 09:49:48.467000 69193 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6521978Z [rank1]:E1204 09:49:48.467000 69193 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.6522428Z [rank2]:E1204 09:49:48.468000 69194 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6522956Z [rank2]:E1204 09:49:48.468000 69194 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6523959Z [rank2]:E1204 09:49:48.468000 69194 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6524466Z [rank2]:E1204 09:49:48.468000 69194 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6525525Z [rank2]:E1204 09:49:48.468000 69194 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6525923Z [rank2]:E1204 09:49:48.468000 69194 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6526889Z [rank2]:E1204 09:49:48.468000 69194 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6527376Z [rank2]:E1204 09:49:48.468000 69194 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6528374Z [rank2]:E1204 09:49:48.468000 69194 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6528872Z [rank2]:E1204 09:49:48.468000 69194 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6529828Z [rank2]:E1204 09:49:48.468000 69194 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6530282Z [rank2]:E1204 09:49:48.468000 69194 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6531242Z [rank2]:E1204 09:49:48.468000 69194 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6531781Z [rank2]:E1204 09:49:48.468000 69194 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6533586Z [rank2]:E1204 09:49:48.468000 69194 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 2. CUDA driver allocated memory was 609157120 and is now 674168832. 2025-12-04T09:59:13.6533943Z [rank2]:E1204 09:49:48.468000 69194 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6534645Z [rank2]:E1204 09:49:48.468000 69194 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6535763Z [rank2]:E1204 09:49:48.468000 69194 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T09:59:13.6536123Z [rank2]:E1204 09:49:48.468000 69194 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6537062Z [rank2]:E1204 09:49:48.468000 69194 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6537620Z [rank2]:E1204 09:49:48.468000 69194 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.6538074Z [rank3]:E1204 09:49:48.469000 69195 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6538598Z [rank3]:E1204 09:49:48.469000 69195 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6539702Z [rank3]:E1204 09:49:48.469000 69195 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6540214Z [rank3]:E1204 09:49:48.469000 69195 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6541211Z [rank3]:E1204 09:49:48.469000 69195 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6541608Z [rank3]:E1204 09:49:48.469000 69195 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6542570Z [rank3]:E1204 09:49:48.469000 69195 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6543097Z [rank3]:E1204 09:49:48.469000 69195 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6544061Z [rank3]:E1204 09:49:48.469000 69195 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6544553Z [rank3]:E1204 09:49:48.469000 69195 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6545510Z [rank3]:E1204 09:49:48.469000 69195 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6545961Z [rank3]:E1204 09:49:48.469000 69195 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6546957Z [rank3]:E1204 09:49:48.469000 69195 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6547456Z [rank3]:E1204 09:49:48.469000 69195 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6549241Z [rank3]:E1204 09:49:48.469000 69195 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 152064 on device 3. CUDA driver allocated memory was 604962816 and is now 674168832. 
2025-12-04T09:59:13.6549628Z [rank3]:E1204 09:49:48.469000 69195 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6550278Z [rank3]:E1204 09:49:48.469000 69195 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6551394Z [rank3]:E1204 09:49:48.469000 69195 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T09:59:13.6551752Z [rank3]:E1204 09:49:48.469000 69195 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6552440Z [rank3]:E1204 09:49:48.469000 69195 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6552974Z [rank3]:E1204 09:49:48.469000 69195 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.6553074Z dist init r=1, world=4 2025-12-04T09:59:13.6553168Z dist init r=0, world=4 2025-12-04T09:59:13.6553269Z dist init r=2, world=4 2025-12-04T09:59:13.6553362Z dist init r=3, world=4 2025-12-04T09:59:13.6554534Z [rank0]:[W1204 09:49:48.479500936 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.6554640Z FAILED [9.6858s] [100%] 2025-12-04T09:59:13.6554646Z 2025-12-04T09:59:13.6554789Z =================================== FAILURES =================================== 2025-12-04T09:59:13.6555097Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda __ 2025-12-04T09:59:13.6555213Z Traceback (most recent call last): 2025-12-04T09:59:13.6555745Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.6555864Z self._join_processes(fn) 2025-12-04T09:59:13.6556430Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.6556600Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.6557191Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.6557298Z raise RuntimeError(error) 2025-12-04T09:59:13.6557527Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.6557639Z Traceback (most recent call last): 2025-12-04T09:59:13.6558165Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6558283Z getattr(self, test_name)() 2025-12-04T09:59:13.6558797Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6558915Z fn() 2025-12-04T09:59:13.6559407Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6559511Z method(*args, **kwargs) 2025-12-04T09:59:13.6560004Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6560105Z method(*args, **kwargs) 2025-12-04T09:59:13.6560590Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6560692Z with policy(): 2025-12-04T09:59:13.6561183Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6561325Z raise RuntimeError(msg) 2025-12-04T09:59:13.6562520Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 152064 on device 0. CUDA driver allocated memory was 720306176 and is now 783220736. 2025-12-04T09:59:13.6562531Z 2025-12-04T09:59:13.6562744Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6563426Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T09:59:13.6563432Z 2025-12-04T09:59:13.6563688Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6563693Z 2025-12-04T09:59:13.6563697Z 2025-12-04T09:59:13.6563915Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.6564168Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.6564953Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-55fdf9ad8e0a27f0.xml - 2025-12-04T09:59:13.6565118Z =========================== short test summary info ============================ 2025-12-04T09:59:13.6565982Z FAILED [9.6858s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_no_shard_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.6566110Z Traceback (most recent call last): 2025-12-04T09:59:13.6566646Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6566814Z getattr(self, test_name)() 2025-12-04T09:59:13.6567333Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6567418Z fn() 2025-12-04T09:59:13.6567913Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6568016Z method(*args, **kwargs) 2025-12-04T09:59:13.6568532Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6568637Z method(*args, **kwargs) 2025-12-04T09:59:13.6569130Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6569228Z with policy(): 2025-12-04T09:59:13.6569719Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6569821Z raise RuntimeError(msg) 2025-12-04T09:59:13.6571020Z RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 152064 on device 0. CUDA driver allocated memory was 720306176 and is now 783220736. 2025-12-04T09:59:13.6571053Z 2025-12-04T09:59:13.6571259Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6571947Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T09:59:13.6571953Z 2025-12-04T09:59:13.6572208Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6572383Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.6572563Z ======================= 1 failed, 26 deselected in 9.91s ======================= 2025-12-04T09:59:13.6572657Z Got exit code 1 2025-12-04T09:59:13.6573287Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_no_shard_cuda 2025-12-04T09:59:13.6573678Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.6574281Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6e1cdaa245647d1a.xml 2025-12-04T09:59:13.6574450Z ============================= test session starts ============================== 2025-12-04T09:59:13.6574787Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.6574899Z cachedir: .pytest_cache 2025-12-04T09:59:13.6575394Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.6575511Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.6575621Z configfile: pytest.ini 2025-12-04T09:59:13.6576145Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.6576427Z collecting ... collected 60 items / 15 deselected / 45 selected 2025-12-04T09:59:13.6576575Z stepcurrent: skipping 15 already run items. 2025-12-04T09:59:13.6576853Z Running 12 items in this shard 2025-12-04T09:59:13.6576860Z 2025-12-04T09:59:13.6577958Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_none_cuda I1204 09:49:55.183000 69477 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 69529 2025-12-04T09:59:13.6578456Z I1204 09:49:55.184000 69477 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 69530 2025-12-04T09:59:13.6578945Z I1204 09:49:55.185000 69477 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 69531 2025-12-04T09:59:13.6579443Z I1204 09:49:55.186000 69477 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 69532 2025-12-04T09:59:13.6581512Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6581624Z _warn_cpu_init() 2025-12-04T09:59:13.6583635Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6583747Z _warn_cpu_init() 2025-12-04T09:59:13.6585759Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6585894Z _warn_cpu_init() 2025-12-04T09:59:13.6587908Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6588042Z _warn_cpu_init() 2025-12-04T09:59:13.6589151Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
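Both UserWarnings in this run ask for an explicit device: the FSDP CPU-init warning recommends the device_id argument to FSDP, and the barrier() warning recommends device_id in init_process_group. A sketch combining the two suggestions on a hypothetical toy module (again assuming a torchrun-style launcher), not the harness's real initialization:

import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main() -> None:
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    device = torch.device(f"cuda:{local_rank}")
    torch.cuda.set_device(device)

    # Binding the process group to one GPU silences the "barrier(): using the
    # device under current context" warning.
    dist.init_process_group(backend="nccl", device_id=device)

    # Hypothetical CPU-resident module; device_id lets FSDP move it to the GPU
    # for sharding initialization instead of warning about CPU init, and it is
    # also what sync_module_states=True needs for its GPU communication.
    module = torch.nn.Linear(8, 8)
    model = FSDP(module, device_id=device, sync_module_states=True)

    model(torch.randn(4, 8, device=device)).sum().backward()
    dist.barrier()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()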
2025-12-04T09:59:13.6589265Z return func(*args, **kwargs) 2025-12-04T09:59:13.6589721Z [rank2]:E1204 09:50:03.410000 69531 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6590243Z [rank2]:E1204 09:50:03.410000 69531 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6591243Z [rank2]:E1204 09:50:03.410000 69531 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6591739Z [rank2]:E1204 09:50:03.410000 69531 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6592708Z [rank2]:E1204 09:50:03.410000 69531 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6593147Z [rank2]:E1204 09:50:03.410000 69531 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6594087Z [rank2]:E1204 09:50:03.410000 69531 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6594565Z [rank2]:E1204 09:50:03.410000 69531 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6595499Z [rank2]:E1204 09:50:03.410000 69531 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6596007Z [rank2]:E1204 09:50:03.410000 69531 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6596934Z [rank2]:E1204 09:50:03.410000 69531 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6597365Z [rank2]:E1204 09:50:03.410000 69531 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6598318Z [rank2]:E1204 09:50:03.410000 69531 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6598795Z [rank2]:E1204 09:50:03.410000 69531 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6600456Z [rank2]:E1204 09:50:03.410000 69531 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 2. CUDA driver allocated memory was 586088448 and is now 649003008. 
2025-12-04T09:59:13.6600815Z [rank2]:E1204 09:50:03.410000 69531 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6601471Z [rank2]:E1204 09:50:03.410000 69531 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6602601Z [rank2]:E1204 09:50:03.410000 69531 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda 2025-12-04T09:59:13.6602955Z [rank2]:E1204 09:50:03.410000 69531 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6603668Z [rank2]:E1204 09:50:03.410000 69531 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6604198Z [rank2]:E1204 09:50:03.410000 69531 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.6604646Z [rank0]:E1204 09:50:03.412000 69529 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6605185Z [rank0]:E1204 09:50:03.412000 69529 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6606168Z [rank0]:E1204 09:50:03.412000 69529 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6606664Z [rank0]:E1204 09:50:03.412000 69529 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6607654Z [rank0]:E1204 09:50:03.412000 69529 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6608054Z [rank0]:E1204 09:50:03.412000 69529 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6608987Z [rank0]:E1204 09:50:03.412000 69529 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6609477Z [rank0]:E1204 09:50:03.412000 69529 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6610441Z [rank0]:E1204 09:50:03.412000 69529 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6610922Z [rank0]:E1204 09:50:03.412000 69529 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6611867Z [rank0]:E1204 09:50:03.412000 69529 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6612302Z [rank0]:E1204 09:50:03.412000 69529 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6613249Z [rank0]:E1204 09:50:03.412000 69529 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6613760Z [rank0]:E1204 09:50:03.412000 69529 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6615389Z [rank0]:E1204 09:50:03.412000 69529 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 716111872 and is now 758054912. 2025-12-04T09:59:13.6615747Z [rank0]:E1204 09:50:03.412000 69529 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6616511Z [rank0]:E1204 09:50:03.412000 69529 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6617830Z [rank0]:E1204 09:50:03.412000 69529 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda 2025-12-04T09:59:13.6618201Z [rank0]:E1204 09:50:03.412000 69529 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6618926Z [rank0]:E1204 09:50:03.412000 69529 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6619472Z [rank0]:E1204 09:50:03.412000 69529 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.6619936Z [rank3]:E1204 09:50:03.413000 69532 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6620468Z [rank3]:E1204 09:50:03.413000 69532 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6622015Z [rank3]:E1204 09:50:03.413000 69532 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6622905Z [rank3]:E1204 09:50:03.413000 69532 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6623905Z [rank3]:E1204 09:50:03.413000 69532 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6624320Z [rank3]:E1204 09:50:03.413000 69532 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6625283Z [rank3]:E1204 09:50:03.413000 69532 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6625837Z [rank3]:E1204 09:50:03.413000 69532 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6626802Z [rank3]:E1204 09:50:03.413000 69532 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6627294Z [rank3]:E1204 09:50:03.413000 69532 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6628264Z [rank3]:E1204 09:50:03.413000 69532 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6628712Z [rank3]:E1204 09:50:03.413000 69532 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6629738Z [rank3]:E1204 09:50:03.413000 69532 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6630229Z [rank3]:E1204 09:50:03.413000 69532 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6631903Z [rank3]:E1204 09:50:03.413000 69532 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 604962816 and is now 649003008. 2025-12-04T09:59:13.6632304Z [rank3]:E1204 09:50:03.413000 69532 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6632976Z [rank3]:E1204 09:50:03.413000 69532 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6634214Z [rank3]:E1204 09:50:03.413000 69532 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda 2025-12-04T09:59:13.6634569Z [rank3]:E1204 09:50:03.413000 69532 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6635278Z [rank3]:E1204 09:50:03.413000 69532 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6635811Z [rank3]:E1204 09:50:03.413000 69532 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.6636254Z [rank1]:E1204 09:50:03.414000 69530 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6636773Z [rank1]:E1204 09:50:03.414000 69530 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6637842Z [rank1]:E1204 09:50:03.414000 69530 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6638347Z [rank1]:E1204 09:50:03.414000 69530 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6639304Z [rank1]:E1204 09:50:03.414000 69530 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6639702Z [rank1]:E1204 09:50:03.414000 69530 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6640664Z [rank1]:E1204 09:50:03.414000 69530 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6641152Z [rank1]:E1204 09:50:03.414000 69530 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6642082Z [rank1]:E1204 09:50:03.414000 69530 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6642558Z [rank1]:E1204 09:50:03.414000 69530 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6643506Z [rank1]:E1204 09:50:03.414000 69530 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6643972Z [rank1]:E1204 09:50:03.414000 69530 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6644919Z [rank1]:E1204 09:50:03.414000 69530 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6645393Z [rank1]:E1204 09:50:03.414000 69530 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6647017Z [rank1]:E1204 09:50:03.414000 69530 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 1. CUDA driver allocated memory was 604962816 and is now 649003008. 
2025-12-04T09:59:13.6647415Z [rank1]:E1204 09:50:03.414000 69530 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6648068Z [rank1]:E1204 09:50:03.414000 69530 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6649168Z [rank1]:E1204 09:50:03.414000 69530 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda 2025-12-04T09:59:13.6649519Z [rank1]:E1204 09:50:03.414000 69530 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6650226Z [rank1]:E1204 09:50:03.414000 69530 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6650755Z [rank1]:E1204 09:50:03.414000 69530 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.6650866Z dist init r=3, world=4 2025-12-04T09:59:13.6650963Z dist init r=2, world=4 2025-12-04T09:59:13.6651086Z dist init r=1, world=4 2025-12-04T09:59:13.6651191Z dist init r=0, world=4 2025-12-04T09:59:13.6652313Z [rank0]:[W1204 09:50:03.455570687 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.6652413Z FAILED [9.8998s] [ 8%] 2025-12-04T09:59:13.6652429Z 2025-12-04T09:59:13.6652573Z =================================== FAILURES =================================== 2025-12-04T09:59:13.6652873Z ___ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda ____ 2025-12-04T09:59:13.6653002Z Traceback (most recent call last): 2025-12-04T09:59:13.6653534Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.6653673Z self._join_processes(fn) 2025-12-04T09:59:13.6654258Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.6654395Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.6654996Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.6655107Z raise RuntimeError(error) 2025-12-04T09:59:13.6655336Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.6655463Z Traceback (most recent call last): 2025-12-04T09:59:13.6655992Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6656131Z getattr(self, test_name)() 2025-12-04T09:59:13.6656925Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6657023Z fn() 2025-12-04T09:59:13.6657548Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6657657Z method(*args, **kwargs) 2025-12-04T09:59:13.6658169Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3329, in wrapper 2025-12-04T09:59:13.6658287Z method(*args, **kwargs) 2025-12-04T09:59:13.6658796Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6658934Z with policy(): 2025-12-04T09:59:13.6659455Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6659570Z raise RuntimeError(msg) 2025-12-04T09:59:13.6660805Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 2. CUDA driver allocated memory was 586088448 and is now 649003008. 2025-12-04T09:59:13.6660812Z 2025-12-04T09:59:13.6661031Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6661715Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda 2025-12-04T09:59:13.6661730Z 2025-12-04T09:59:13.6661997Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6662004Z 2025-12-04T09:59:13.6662008Z 2025-12-04T09:59:13.6662227Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.6662502Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.6663310Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6e1cdaa245647d1a.xml - 2025-12-04T09:59:13.6663523Z =========================== short test summary info ============================ 2025-12-04T09:59:13.6664366Z FAILED [9.8998s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_none_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.6664490Z Traceback (most recent call last): 2025-12-04T09:59:13.6665054Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6665172Z getattr(self, test_name)() 2025-12-04T09:59:13.6665721Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6665813Z fn() 2025-12-04T09:59:13.6666325Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6666469Z method(*args, **kwargs) 2025-12-04T09:59:13.6666980Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6667082Z method(*args, **kwargs) 2025-12-04T09:59:13.6667595Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6667692Z with policy(): 2025-12-04T09:59:13.6668215Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6668329Z raise RuntimeError(msg) 2025-12-04T09:59:13.6669626Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! 
Caching allocator allocated memory was 512 and is now reported as 53760 on device 2. CUDA driver allocated memory was 586088448 and is now 649003008. 2025-12-04T09:59:13.6669663Z 2025-12-04T09:59:13.6669888Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6670555Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda 2025-12-04T09:59:13.6670561Z 2025-12-04T09:59:13.6670828Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6671003Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.6671202Z ====================== 1 failed, 15 deselected in 10.12s ======================= 2025-12-04T09:59:13.6671307Z Got exit code 1 2025-12-04T09:59:13.6671410Z Retrying single test... 2025-12-04T09:59:13.6672028Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a996648fbbff19f5.xml 2025-12-04T09:59:13.6672188Z ============================= test session starts ============================== 2025-12-04T09:59:13.6672535Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.6672650Z cachedir: .pytest_cache 2025-12-04T09:59:13.6673149Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.6673268Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.6673380Z configfile: pytest.ini 2025-12-04T09:59:13.6673896Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.6674121Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.6674862Z stepcurrent: skipping 15 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_none_cuda 2025-12-04T09:59:13.6674973Z Running 1 items in this shard 2025-12-04T09:59:13.6674978Z 2025-12-04T09:59:13.6676058Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_none_cuda I1204 09:50:09.934000 69814 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 69866 2025-12-04T09:59:13.6676540Z I1204 09:50:09.935000 69814 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 69867 2025-12-04T09:59:13.6677026Z I1204 09:50:09.936000 69814 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 69868 2025-12-04T09:59:13.6677510Z I1204 09:50:09.936000 69814 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 69869 2025-12-04T09:59:13.6679547Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.6679648Z _warn_cpu_init() 2025-12-04T09:59:13.6681595Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6681707Z _warn_cpu_init() 2025-12-04T09:59:13.6683693Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6683794Z _warn_cpu_init() 2025-12-04T09:59:13.6685737Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6685878Z _warn_cpu_init() 2025-12-04T09:59:13.6686850Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
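The barrier() warning at the end of this block suggests binding the process group to an explicit device at init time. A minimal sketch, assuming RANK and WORLD_SIZE come from the launcher environment and MASTER_ADDR/MASTER_PORT are already set:

```python
import os
import torch
import torch.distributed as dist

rank = int(os.environ["RANK"])
world_size = int(os.environ["WORLD_SIZE"])

dist.init_process_group(
    backend="nccl",
    rank=rank,
    world_size=world_size,
    device_id=torch.device("cuda", rank),  # bind collectives to an explicit CUDA device
)
dist.barrier()  # no "using the device under current context" warning
```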
2025-12-04T09:59:13.6686967Z return func(*args, **kwargs) 2025-12-04T09:59:13.6687416Z [rank0]:E1204 09:50:18.090000 69866 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6687940Z [rank0]:E1204 09:50:18.090000 69866 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6688914Z [rank0]:E1204 09:50:18.090000 69866 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6689416Z [rank0]:E1204 09:50:18.090000 69866 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6690525Z [rank0]:E1204 09:50:18.090000 69866 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6690906Z [rank0]:E1204 09:50:18.090000 69866 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6691812Z [rank0]:E1204 09:50:18.090000 69866 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6692281Z [rank0]:E1204 09:50:18.090000 69866 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6693190Z [rank0]:E1204 09:50:18.090000 69866 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6693687Z [rank0]:E1204 09:50:18.090000 69866 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6694594Z [rank0]:E1204 09:50:18.090000 69866 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6695024Z [rank0]:E1204 09:50:18.090000 69866 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6695934Z [rank0]:E1204 09:50:18.090000 69866 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6696477Z [rank0]:E1204 09:50:18.090000 69866 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6698380Z [rank0]:E1204 09:50:18.090000 69866 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 716111872 and is now 758054912. 
2025-12-04T09:59:13.6698748Z [rank0]:E1204 09:50:18.090000 69866 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6699422Z [rank0]:E1204 09:50:18.090000 69866 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6700593Z [rank0]:E1204 09:50:18.090000 69866 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda 2025-12-04T09:59:13.6701002Z [rank0]:E1204 09:50:18.090000 69866 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6701722Z [rank0]:E1204 09:50:18.090000 69866 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6702278Z [rank0]:E1204 09:50:18.090000 69866 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.6702735Z [rank3]:E1204 09:50:18.090000 69869 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6703271Z [rank3]:E1204 09:50:18.090000 69869 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6704287Z [rank3]:E1204 09:50:18.090000 69869 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6704804Z [rank3]:E1204 09:50:18.090000 69869 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6705836Z [rank3]:E1204 09:50:18.090000 69869 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6706236Z [rank3]:E1204 09:50:18.090000 69869 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6707192Z [rank3]:E1204 09:50:18.090000 69869 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6707697Z [rank3]:E1204 09:50:18.090000 69869 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6708805Z [rank3]:E1204 09:50:18.090000 69869 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6709401Z [rank3]:E1204 09:50:18.090000 69869 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6710303Z [rank3]:E1204 09:50:18.090000 69869 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6710740Z [rank3]:E1204 09:50:18.090000 69869 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6711649Z [rank3]:E1204 09:50:18.090000 69869 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6712146Z [rank3]:E1204 09:50:18.090000 69869 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6713736Z [rank3]:E1204 09:50:18.090000 69869 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 604962816 and is now 649003008. 2025-12-04T09:59:13.6714081Z [rank3]:E1204 09:50:18.090000 69869 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6714750Z [rank3]:E1204 09:50:18.090000 69869 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6715841Z [rank3]:E1204 09:50:18.090000 69869 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda 2025-12-04T09:59:13.6716201Z [rank3]:E1204 09:50:18.090000 69869 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6716880Z [rank3]:E1204 09:50:18.090000 69869 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6717405Z [rank3]:E1204 09:50:18.090000 69869 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.6717834Z [rank1]:E1204 09:50:18.090000 69867 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6718337Z [rank1]:E1204 09:50:18.090000 69867 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6719323Z [rank1]:E1204 09:50:18.090000 69867 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6719803Z [rank1]:E1204 09:50:18.090000 69867 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6720882Z [rank1]:E1204 09:50:18.090000 69867 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6721455Z [rank1]:E1204 09:50:18.090000 69867 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6722418Z [rank1]:E1204 09:50:18.090000 69867 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6722984Z [rank1]:E1204 09:50:18.090000 69867 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6723954Z [rank1]:E1204 09:50:18.090000 69867 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6724454Z [rank1]:E1204 09:50:18.090000 69867 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6725425Z [rank1]:E1204 09:50:18.090000 69867 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6725950Z [rank1]:E1204 09:50:18.090000 69867 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6726915Z [rank1]:E1204 09:50:18.090000 69867 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6727410Z [rank1]:E1204 09:50:18.090000 69867 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6729087Z [rank1]:E1204 09:50:18.090000 69867 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 1. CUDA driver allocated memory was 609157120 and is now 649003008. 2025-12-04T09:59:13.6729491Z [rank1]:E1204 09:50:18.090000 69867 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6730169Z [rank1]:E1204 09:50:18.090000 69867 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6731303Z [rank1]:E1204 09:50:18.090000 69867 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda 2025-12-04T09:59:13.6731684Z [rank1]:E1204 09:50:18.090000 69867 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6732400Z [rank1]:E1204 09:50:18.090000 69867 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6732944Z [rank1]:E1204 09:50:18.090000 69867 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.6733412Z [rank2]:E1204 09:50:18.091000 69868 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6734034Z [rank2]:E1204 09:50:18.091000 69868 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6735024Z [rank2]:E1204 09:50:18.091000 69868 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6735503Z [rank2]:E1204 09:50:18.091000 69868 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6736519Z [rank2]:E1204 09:50:18.091000 69868 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6737090Z [rank2]:E1204 09:50:18.091000 69868 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6738098Z [rank2]:E1204 09:50:18.091000 69868 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6738599Z [rank2]:E1204 09:50:18.091000 69868 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6739565Z [rank2]:E1204 09:50:18.091000 69868 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6740072Z [rank2]:E1204 09:50:18.091000 69868 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6741032Z [rank2]:E1204 09:50:18.091000 69868 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6741523Z [rank2]:E1204 09:50:18.091000 69868 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6742488Z [rank2]:E1204 09:50:18.091000 69868 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6742983Z [rank2]:E1204 09:50:18.091000 69868 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6744682Z [rank2]:E1204 09:50:18.091000 69868 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 2. CUDA driver allocated memory was 611254272 and is now 649003008. 
2025-12-04T09:59:13.6745052Z [rank2]:E1204 09:50:18.091000 69868 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6745730Z [rank2]:E1204 09:50:18.091000 69868 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6746866Z [rank2]:E1204 09:50:18.091000 69868 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda 2025-12-04T09:59:13.6747237Z [rank2]:E1204 09:50:18.091000 69868 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6747951Z [rank2]:E1204 09:50:18.091000 69868 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6748499Z [rank2]:E1204 09:50:18.091000 69868 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.6748730Z dist init r=0, world=4 2025-12-04T09:59:13.6748861Z dist init r=1, world=4 2025-12-04T09:59:13.6748970Z dist init r=2, world=4 2025-12-04T09:59:13.6749178Z dist init r=3, world=4 2025-12-04T09:59:13.6750270Z [rank0]:[W1204 09:50:18.109269972 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.6750381Z FAILED [10.1350s] [100%] 2025-12-04T09:59:13.6750386Z 2025-12-04T09:59:13.6750527Z =================================== FAILURES =================================== 2025-12-04T09:59:13.6750836Z ___ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda ____ 2025-12-04T09:59:13.6750951Z Traceback (most recent call last): 2025-12-04T09:59:13.6751515Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.6751629Z self._join_processes(fn) 2025-12-04T09:59:13.6752188Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.6752322Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.6753024Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.6753132Z raise RuntimeError(error) 2025-12-04T09:59:13.6753355Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.6753460Z Traceback (most recent call last): 2025-12-04T09:59:13.6753939Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6754082Z getattr(self, test_name)() 2025-12-04T09:59:13.6754556Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6754636Z fn() 2025-12-04T09:59:13.6755094Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6755189Z method(*args, **kwargs) 2025-12-04T09:59:13.6755643Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6755733Z method(*args, **kwargs) 2025-12-04T09:59:13.6756210Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6756308Z with policy(): 2025-12-04T09:59:13.6756762Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6756861Z raise RuntimeError(msg) 2025-12-04T09:59:13.6757953Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 604962816 and is now 649003008. 2025-12-04T09:59:13.6757958Z 2025-12-04T09:59:13.6758150Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6758768Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda 2025-12-04T09:59:13.6758775Z 2025-12-04T09:59:13.6759011Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6759016Z 2025-12-04T09:59:13.6759020Z 2025-12-04T09:59:13.6759227Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.6759465Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.6760207Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a996648fbbff19f5.xml - 2025-12-04T09:59:13.6760374Z =========================== short test summary info ============================ 2025-12-04T09:59:13.6761122Z FAILED [10.1350s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_none_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.6761245Z Traceback (most recent call last): 2025-12-04T09:59:13.6761738Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6761836Z getattr(self, test_name)() 2025-12-04T09:59:13.6762319Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6762401Z fn() 2025-12-04T09:59:13.6762888Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6762986Z method(*args, **kwargs) 2025-12-04T09:59:13.6763438Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6763540Z method(*args, **kwargs) 2025-12-04T09:59:13.6764014Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6764100Z with policy(): 2025-12-04T09:59:13.6764565Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6764665Z raise RuntimeError(msg) 2025-12-04T09:59:13.6765752Z RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 604962816 and is now 649003008. 2025-12-04T09:59:13.6765784Z 2025-12-04T09:59:13.6765977Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6766583Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda 2025-12-04T09:59:13.6766596Z 2025-12-04T09:59:13.6766828Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6766987Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.6767186Z ====================== 1 failed, 26 deselected in 10.35s ======================= 2025-12-04T09:59:13.6767274Z Got exit code 1 2025-12-04T09:59:13.6767369Z Retrying single test... 2025-12-04T09:59:13.6768124Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cc1573489c80017b.xml 2025-12-04T09:59:13.6768279Z ============================= test session starts ============================== 2025-12-04T09:59:13.6768610Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.6768712Z cachedir: .pytest_cache 2025-12-04T09:59:13.6769198Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.6769317Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.6769421Z configfile: pytest.ini 2025-12-04T09:59:13.6769930Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.6770141Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.6770859Z stepcurrent: skipping 15 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_none_cuda 2025-12-04T09:59:13.6770976Z Running 1 items in this shard 2025-12-04T09:59:13.6770983Z 2025-12-04T09:59:13.6771995Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_none_cuda I1204 09:50:24.834000 70151 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 70203 2025-12-04T09:59:13.6772465Z I1204 09:50:24.835000 70151 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 70204 2025-12-04T09:59:13.6772939Z I1204 09:50:24.836000 70151 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 70205 2025-12-04T09:59:13.6773401Z I1204 09:50:24.836000 70151 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 70206 2025-12-04T09:59:13.6775349Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.6775444Z _warn_cpu_init() 2025-12-04T09:59:13.6777657Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6777802Z _warn_cpu_init() 2025-12-04T09:59:13.6779821Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6779922Z _warn_cpu_init() 2025-12-04T09:59:13.6781945Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6782076Z _warn_cpu_init() 2025-12-04T09:59:13.6783088Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T09:59:13.6783210Z return func(*args, **kwargs) 2025-12-04T09:59:13.6783678Z [rank0]:E1204 09:50:32.932000 70203 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6784228Z [rank0]:E1204 09:50:32.932000 70203 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6785234Z [rank0]:E1204 09:50:32.932000 70203 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6785743Z [rank0]:E1204 09:50:32.932000 70203 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6786775Z [rank0]:E1204 09:50:32.932000 70203 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6787179Z [rank0]:E1204 09:50:32.932000 70203 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6788154Z [rank0]:E1204 09:50:32.932000 70203 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6788763Z [rank0]:E1204 09:50:32.932000 70203 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6789760Z [rank0]:E1204 09:50:32.932000 70203 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6790223Z [rank0]:E1204 09:50:32.932000 70203 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6791070Z [rank0]:E1204 09:50:32.932000 70203 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6791475Z [rank0]:E1204 09:50:32.932000 70203 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6792331Z [rank0]:E1204 09:50:32.932000 70203 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6792802Z [rank0]:E1204 09:50:32.932000 70203 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6794286Z [rank0]:E1204 09:50:32.932000 70203 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 718209024 and is now 758054912. 
2025-12-04T09:59:13.6794617Z [rank0]:E1204 09:50:32.932000 70203 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6795204Z [rank0]:E1204 09:50:32.932000 70203 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6796249Z [rank0]:E1204 09:50:32.932000 70203 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda 2025-12-04T09:59:13.6796575Z [rank0]:E1204 09:50:32.932000 70203 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6797211Z [rank0]:E1204 09:50:32.932000 70203 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6797706Z [rank0]:E1204 09:50:32.932000 70203 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.6798130Z [rank1]:E1204 09:50:32.934000 70204 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6798614Z [rank1]:E1204 09:50:32.934000 70204 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6799503Z [rank1]:E1204 09:50:32.934000 70204 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6799997Z [rank1]:E1204 09:50:32.934000 70204 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6800886Z [rank1]:E1204 09:50:32.934000 70204 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6801240Z [rank1]:E1204 09:50:32.934000 70204 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6802101Z [rank1]:E1204 09:50:32.934000 70204 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6802540Z [rank1]:E1204 09:50:32.934000 70204 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6803607Z [rank1]:E1204 09:50:32.934000 70204 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6804070Z [rank1]:E1204 09:50:32.934000 70204 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6804966Z [rank1]:E1204 09:50:32.934000 70204 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6805398Z [rank1]:E1204 09:50:32.934000 70204 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6806311Z [rank1]:E1204 09:50:32.934000 70204 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6806815Z [rank1]:E1204 09:50:32.934000 70204 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6808387Z [rank1]:E1204 09:50:32.934000 70204 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 1. CUDA driver allocated memory was 611254272 and is now 649003008. 2025-12-04T09:59:13.6808778Z [rank1]:E1204 09:50:32.934000 70204 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6809397Z [rank1]:E1204 09:50:32.934000 70204 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6810481Z [rank1]:E1204 09:50:32.934000 70204 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda 2025-12-04T09:59:13.6810825Z [rank1]:E1204 09:50:32.934000 70204 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6811503Z [rank1]:E1204 09:50:32.934000 70204 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6812027Z [rank1]:E1204 09:50:32.934000 70204 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.6812451Z [rank2]:E1204 09:50:32.935000 70205 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6812964Z [rank2]:E1204 09:50:32.935000 70205 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6813936Z [rank2]:E1204 09:50:32.935000 70205 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6814418Z [rank2]:E1204 09:50:32.935000 70205 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6815359Z [rank2]:E1204 09:50:32.935000 70205 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6815736Z [rank2]:E1204 09:50:32.935000 70205 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6816899Z [rank2]:E1204 09:50:32.935000 70205 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6817434Z [rank2]:E1204 09:50:32.935000 70205 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6818410Z [rank2]:E1204 09:50:32.935000 70205 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6818897Z [rank2]:E1204 09:50:32.935000 70205 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6819861Z [rank2]:E1204 09:50:32.935000 70205 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6820359Z [rank2]:E1204 09:50:32.935000 70205 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6821545Z [rank2]:E1204 09:50:32.935000 70205 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6822057Z [rank2]:E1204 09:50:32.935000 70205 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6823732Z [rank2]:E1204 09:50:32.935000 70205 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 2. CUDA driver allocated memory was 604962816 and is now 649003008. 2025-12-04T09:59:13.6824188Z [rank2]:E1204 09:50:32.935000 70205 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6824852Z [rank2]:E1204 09:50:32.935000 70205 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6825987Z [rank2]:E1204 09:50:32.935000 70205 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda 2025-12-04T09:59:13.6826364Z [rank2]:E1204 09:50:32.935000 70205 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6827078Z [rank2]:E1204 09:50:32.935000 70205 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6827646Z [rank2]:E1204 09:50:32.935000 70205 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.6828101Z [rank3]:E1204 09:50:32.935000 70206 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6828687Z [rank3]:E1204 09:50:32.935000 70206 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6829687Z [rank3]:E1204 09:50:32.935000 70206 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6830195Z [rank3]:E1204 09:50:32.935000 70206 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6831195Z [rank3]:E1204 09:50:32.935000 70206 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6831597Z [rank3]:E1204 09:50:32.935000 70206 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6832728Z [rank3]:E1204 09:50:32.935000 70206 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6833316Z [rank3]:E1204 09:50:32.935000 70206 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6834237Z [rank3]:E1204 09:50:32.935000 70206 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6834701Z [rank3]:E1204 09:50:32.935000 70206 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6835653Z [rank3]:E1204 09:50:32.935000 70206 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6836086Z [rank3]:E1204 09:50:32.935000 70206 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6836993Z [rank3]:E1204 09:50:32.935000 70206 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6837493Z [rank3]:E1204 09:50:32.935000 70206 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6839277Z [rank3]:E1204 09:50:32.935000 70206 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 3. CUDA driver allocated memory was 607059968 and is now 649003008. 
2025-12-04T09:59:13.6839657Z [rank3]:E1204 09:50:32.935000 70206 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6840300Z [rank3]:E1204 09:50:32.935000 70206 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6841399Z [rank3]:E1204 09:50:32.935000 70206 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda 2025-12-04T09:59:13.6841766Z [rank3]:E1204 09:50:32.935000 70206 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6842459Z [rank3]:E1204 09:50:32.935000 70206 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6842999Z [rank3]:E1204 09:50:32.935000 70206 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.6843126Z dist init r=0, world=4 2025-12-04T09:59:13.6843226Z dist init r=3, world=4 2025-12-04T09:59:13.6843330Z dist init r=1, world=4 2025-12-04T09:59:13.6843424Z dist init r=2, world=4 2025-12-04T09:59:13.6844564Z [rank0]:[W1204 09:50:33.943332770 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.6844666Z FAILED [10.2022s] [100%] 2025-12-04T09:59:13.6844672Z 2025-12-04T09:59:13.6844813Z =================================== FAILURES =================================== 2025-12-04T09:59:13.6845128Z ___ TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda ____ 2025-12-04T09:59:13.6845248Z Traceback (most recent call last): 2025-12-04T09:59:13.6845816Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.6845930Z self._join_processes(fn) 2025-12-04T09:59:13.6846500Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.6846647Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.6847235Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.6847350Z raise RuntimeError(error) 2025-12-04T09:59:13.6847592Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.6847716Z Traceback (most recent call last): 2025-12-04T09:59:13.6848254Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6848390Z getattr(self, test_name)() 2025-12-04T09:59:13.6848911Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6849010Z fn() 2025-12-04T09:59:13.6849502Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6849607Z method(*args, **kwargs) 2025-12-04T09:59:13.6850107Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6850433Z method(*args, **kwargs) 2025-12-04T09:59:13.6851031Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6851120Z with policy(): 2025-12-04T09:59:13.6851576Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6851689Z raise RuntimeError(msg) 2025-12-04T09:59:13.6852766Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 718209024 and is now 758054912. 2025-12-04T09:59:13.6852772Z 2025-12-04T09:59:13.6852974Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6853578Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda 2025-12-04T09:59:13.6853585Z 2025-12-04T09:59:13.6853822Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6853837Z 2025-12-04T09:59:13.6853841Z 2025-12-04T09:59:13.6854041Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.6854277Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.6855027Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cc1573489c80017b.xml - 2025-12-04T09:59:13.6855181Z =========================== short test summary info ============================ 2025-12-04T09:59:13.6855942Z FAILED [10.2022s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.6856051Z Traceback (most recent call last): 2025-12-04T09:59:13.6856621Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6856917Z getattr(self, test_name)() 2025-12-04T09:59:13.6857463Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6857555Z fn() 2025-12-04T09:59:13.6858121Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6858233Z method(*args, **kwargs) 2025-12-04T09:59:13.6858761Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6858870Z method(*args, **kwargs) 2025-12-04T09:59:13.6859379Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6859493Z with policy(): 2025-12-04T09:59:13.6860009Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6860122Z raise RuntimeError(msg) 2025-12-04T09:59:13.6861378Z RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 718209024 and is now 758054912. 2025-12-04T09:59:13.6861386Z 2025-12-04T09:59:13.6861604Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6862300Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_offload_false_none_cuda 2025-12-04T09:59:13.6862307Z 2025-12-04T09:59:13.6862572Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6862802Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.6862983Z ====================== 1 failed, 26 deselected in 10.42s ======================= 2025-12-04T09:59:13.6863084Z Got exit code 1 2025-12-04T09:59:13.6863703Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_none_cuda 2025-12-04T09:59:13.6864134Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.6864828Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4d2b72d464b1c339.xml 2025-12-04T09:59:13.6865001Z ============================= test session starts ============================== 2025-12-04T09:59:13.6865357Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.6865480Z cachedir: .pytest_cache 2025-12-04T09:59:13.6865998Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.6866121Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.6866241Z configfile: pytest.ini 2025-12-04T09:59:13.6866776Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.6867006Z collecting ... collected 60 items / 16 deselected / 44 selected 2025-12-04T09:59:13.6867176Z stepcurrent: skipping 16 already run items. 2025-12-04T09:59:13.6867291Z Running 11 items in this shard 2025-12-04T09:59:13.6867296Z 2025-12-04T09:59:13.6868525Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda I1204 09:50:39.784000 70488 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 70540 2025-12-04T09:59:13.6869123Z I1204 09:50:39.785000 70488 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 70541 2025-12-04T09:59:13.6869580Z I1204 09:50:39.786000 70488 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 70542 2025-12-04T09:59:13.6870199Z I1204 09:50:39.786000 70488 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 70543 2025-12-04T09:59:13.6871176Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.6871316Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.6873444Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6873591Z _warn_cpu_init() 2025-12-04T09:59:13.6874558Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6874702Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.6876661Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6876844Z _warn_cpu_init() 2025-12-04T09:59:13.6877831Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6878064Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.6879045Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6879260Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.6880222Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6880353Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.6882360Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6882462Z _warn_cpu_init() 2025-12-04T09:59:13.6883431Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.6883558Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.6884525Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6884771Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.6886816Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6886925Z _warn_cpu_init() 2025-12-04T09:59:13.6887858Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6888082Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.6888816Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.6888950Z return func(*args, **kwargs) 2025-12-04T09:59:13.6889794Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.6889898Z return func(*args, **kwargs) 2025-12-04T09:59:13.6890581Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.6890681Z return func(*args, **kwargs) 2025-12-04T09:59:13.6891383Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.6891489Z return func(*args, **kwargs) 2025-12-04T09:59:13.6892165Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.6892280Z return func(*args, **kwargs) 2025-12-04T09:59:13.6892953Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.6893052Z return func(*args, **kwargs) 2025-12-04T09:59:13.6893735Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.6893837Z return func(*args, **kwargs) 2025-12-04T09:59:13.6894524Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.6894623Z return func(*args, **kwargs) 2025-12-04T09:59:13.6895533Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.6895645Z return func(*args, **kwargs) 2025-12-04T09:59:13.6896054Z [rank0]:E1204 09:50:47.258000 70540 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6896631Z [rank0]:E1204 09:50:47.258000 70540 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6897805Z [rank0]:E1204 09:50:47.258000 70540 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6898322Z [rank0]:E1204 09:50:47.258000 70540 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6899367Z [rank0]:E1204 09:50:47.258000 70540 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6899766Z [rank0]:E1204 09:50:47.258000 70540 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6900737Z [rank0]:E1204 09:50:47.258000 70540 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6901232Z [rank0]:E1204 09:50:47.258000 70540 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6902206Z [rank0]:E1204 09:50:47.258000 70540 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6902744Z [rank0]:E1204 09:50:47.258000 70540 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6903704Z [rank0]:E1204 09:50:47.258000 70540 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6904164Z [rank0]:E1204 09:50:47.258000 70540 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6905129Z [rank0]:E1204 09:50:47.258000 70540 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6905665Z [rank0]:E1204 09:50:47.258000 70540 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6907520Z [rank0]:E1204 09:50:47.258000 70540 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 154112 on device 0. 
CUDA driver allocated memory was 711917568 and is now 785317888. 2025-12-04T09:59:13.6907902Z [rank0]:E1204 09:50:47.258000 70540 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6908678Z [rank0]:E1204 09:50:47.258000 70540 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6910000Z [rank0]:E1204 09:50:47.258000 70540 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.6910345Z [rank0]:E1204 09:50:47.258000 70540 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6911028Z [rank0]:E1204 09:50:47.258000 70540 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6911534Z [rank0]:E1204 09:50:47.258000 70540 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.6911939Z [rank1]:E1204 09:50:47.260000 70541 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6912431Z [rank1]:E1204 09:50:47.260000 70541 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6913354Z [rank1]:E1204 09:50:47.260000 70541 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6913811Z [rank1]:E1204 09:50:47.260000 70541 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6914708Z [rank1]:E1204 09:50:47.260000 70541 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6915065Z [rank1]:E1204 09:50:47.260000 70541 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6915937Z [rank1]:E1204 09:50:47.260000 70541 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6916397Z [rank1]:E1204 09:50:47.260000 70541 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6917259Z [rank1]:E1204 09:50:47.260000 70541 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6917693Z [rank1]:E1204 09:50:47.260000 70541 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6918545Z [rank1]:E1204 09:50:47.260000 70541 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6918989Z [rank1]:E1204 09:50:47.260000 70541 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T09:59:13.6919852Z [rank1]:E1204 09:50:47.260000 70541 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6920309Z [rank1]:E1204 09:50:47.260000 70541 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6922386Z [rank1]:E1204 09:50:47.260000 70541 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 154112 on device 1. CUDA driver allocated memory was 604962816 and is now 676265984. 2025-12-04T09:59:13.6922770Z [rank1]:E1204 09:50:47.260000 70541 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6923429Z [rank1]:E1204 09:50:47.260000 70541 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6924809Z [rank1]:E1204 09:50:47.260000 70541 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.6925188Z [rank1]:E1204 09:50:47.260000 70541 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6925912Z [rank1]:E1204 09:50:47.260000 70541 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6926474Z [rank1]:E1204 09:50:47.260000 70541 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.6926930Z [rank3]:E1204 09:50:47.261000 70543 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6927512Z [rank3]:E1204 09:50:47.261000 70543 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6928525Z [rank3]:E1204 09:50:47.261000 70543 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6929035Z [rank3]:E1204 09:50:47.261000 70543 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.6930038Z [rank3]:E1204 09:50:47.261000 70543 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6930439Z [rank3]:E1204 09:50:47.261000 70543 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6931456Z [rank3]:E1204 09:50:47.261000 70543 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6931952Z [rank3]:E1204 09:50:47.261000 70543 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T09:59:13.6932924Z [rank3]:E1204 09:50:47.261000 70543 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6933411Z [rank3]:E1204 09:50:47.261000 70543 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6934472Z [rank3]:E1204 09:50:47.261000 70543 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6934914Z [rank3]:E1204 09:50:47.261000 70543 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6936013Z [rank3]:E1204 09:50:47.261000 70543 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6936588Z [rank3]:E1204 09:50:47.261000 70543 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6938597Z [rank3]:E1204 09:50:47.261000 70543 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 154112 on device 3. CUDA driver allocated memory was 607059968 and is now 676265984. 2025-12-04T09:59:13.6938980Z [rank3]:E1204 09:50:47.261000 70543 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6939676Z [rank3]:E1204 09:50:47.261000 70543 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6940990Z [rank3]:E1204 09:50:47.261000 70543 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.6941365Z [rank3]:E1204 09:50:47.261000 70543 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6942087Z [rank3]:E1204 09:50:47.261000 70543 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6942682Z [rank3]:E1204 09:50:47.261000 70543 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.6943134Z [rank2]:E1204 09:50:47.261000 70542 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.6943674Z [rank2]:E1204 09:50:47.261000 70542 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.6944679Z [rank2]:E1204 09:50:47.261000 70542 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6945193Z [rank2]:E1204 09:50:47.261000 70542 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 
2025-12-04T09:59:13.6946195Z [rank2]:E1204 09:50:47.261000 70542 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6946628Z [rank2]:E1204 09:50:47.261000 70542 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.6947594Z [rank2]:E1204 09:50:47.261000 70542 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6948078Z [rank2]:E1204 09:50:47.261000 70542 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6949279Z [rank2]:E1204 09:50:47.261000 70542 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6949719Z [rank2]:E1204 09:50:47.261000 70542 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.6950573Z [rank2]:E1204 09:50:47.261000 70542 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6950979Z [rank2]:E1204 09:50:47.261000 70542 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.6951833Z [rank2]:E1204 09:50:47.261000 70542 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6952285Z [rank2]:E1204 09:50:47.261000 70542 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.6953962Z [rank2]:E1204 09:50:47.261000 70542 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 154112 on device 2. CUDA driver allocated memory was 609157120 and is now 676265984. 
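The first failing run above also ended with a ProcessGroupNCCL warning that destroy_process_group() was not called before program exit, which can itself leak resources. As a minimal single-process illustration only (gloo backend and a hypothetical local TCP rendezvous; the failing tests themselves use NCCL across 4 GPUs), the requested cleanup looks like:

import torch.distributed as dist

def main() -> None:
    dist.init_process_group(
        backend="gloo",                       # the real tests use nccl on 4 GPUs
        init_method="tcp://127.0.0.1:29500",  # hypothetical local rendezvous address
        rank=0,
        world_size=1,
    )
    try:
        dist.barrier()  # stand-in for the distributed test body
    finally:
        dist.destroy_process_group()  # avoids the resource-leak warning at exit

if __name__ == "__main__":
    main()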
2025-12-04T09:59:13.6954301Z [rank2]:E1204 09:50:47.261000 70542 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6954884Z [rank2]:E1204 09:50:47.261000 70542 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6956061Z [rank2]:E1204 09:50:47.261000 70542 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.6956390Z [rank2]:E1204 09:50:47.261000 70542 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.6957282Z [rank2]:E1204 09:50:47.261000 70542 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6957808Z [rank2]:E1204 09:50:47.261000 70542 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.6957907Z dist init r=0, world=4 2025-12-04T09:59:13.6958017Z dist init r=1, world=4 2025-12-04T09:59:13.6958111Z dist init r=2, world=4 2025-12-04T09:59:13.6958207Z dist init r=3, world=4 2025-12-04T09:59:13.6959310Z [rank0]:[W1204 09:50:47.274602847 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.6959435Z FAILED [9.8094s] [ 9%] 2025-12-04T09:59:13.6959441Z 2025-12-04T09:59:13.6959584Z =================================== FAILURES =================================== 2025-12-04T09:59:13.6960047Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda _ 2025-12-04T09:59:13.6960163Z Traceback (most recent call last): 2025-12-04T09:59:13.6960688Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.6960794Z self._join_processes(fn) 2025-12-04T09:59:13.6961349Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.6961521Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.6962093Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.6962215Z raise RuntimeError(error) 2025-12-04T09:59:13.6962438Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.6962556Z Traceback (most recent call last): 2025-12-04T09:59:13.6963077Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6963183Z getattr(self, test_name)() 2025-12-04T09:59:13.6963687Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6963785Z fn() 2025-12-04T09:59:13.6964263Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6964380Z method(*args, **kwargs) 2025-12-04T09:59:13.6964859Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6964962Z method(*args, **kwargs) 2025-12-04T09:59:13.6965450Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6965546Z with policy(): 2025-12-04T09:59:13.6966057Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6966173Z raise RuntimeError(msg) 2025-12-04T09:59:13.6967493Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 154112 on device 1. CUDA driver allocated memory was 604962816 and is now 676265984. 2025-12-04T09:59:13.6967501Z 2025-12-04T09:59:13.6967720Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6968534Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.6968541Z 2025-12-04T09:59:13.6968833Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6968838Z 2025-12-04T09:59:13.6968845Z 2025-12-04T09:59:13.6969050Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.6969295Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.6970072Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4d2b72d464b1c339.xml - 2025-12-04T09:59:13.6970235Z =========================== short test summary info ============================ 2025-12-04T09:59:13.6971211Z FAILED [9.8094s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.6971354Z Traceback (most recent call last): 2025-12-04T09:59:13.6971972Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.6972089Z getattr(self, test_name)() 2025-12-04T09:59:13.6972563Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.6972659Z fn() 2025-12-04T09:59:13.6973107Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6973201Z method(*args, **kwargs) 2025-12-04T09:59:13.6974169Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.6974265Z method(*args, **kwargs) 2025-12-04T09:59:13.6974713Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.6974817Z with policy(): 2025-12-04T09:59:13.6975274Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.6975391Z raise RuntimeError(msg) 2025-12-04T09:59:13.6976900Z 
RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 154112 on device 1. CUDA driver allocated memory was 604962816 and is now 676265984. 2025-12-04T09:59:13.6976912Z 2025-12-04T09:59:13.6977135Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.6978012Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.6978020Z 2025-12-04T09:59:13.6978289Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.6978489Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.6978710Z ====================== 1 failed, 16 deselected in 10.03s ======================= 2025-12-04T09:59:13.6978808Z Got exit code 1 2025-12-04T09:59:13.6978929Z Retrying single test... 2025-12-04T09:59:13.6979553Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-65dbafa4918c0ef1.xml 2025-12-04T09:59:13.6979727Z ============================= test session starts ============================== 2025-12-04T09:59:13.6980079Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.6980189Z cachedir: .pytest_cache 2025-12-04T09:59:13.6980714Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.6980841Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.6980948Z configfile: pytest.ini 2025-12-04T09:59:13.6981533Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.6981758Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.6982722Z stepcurrent: skipping 16 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.6982837Z Running 1 items in this shard 2025-12-04T09:59:13.6982845Z 2025-12-04T09:59:13.6984072Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda I1204 09:50:54.283000 70825 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 70877 2025-12-04T09:59:13.6984616Z I1204 09:50:54.284000 70825 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 70878 2025-12-04T09:59:13.6985115Z I1204 09:50:54.285000 70825 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 70879 2025-12-04T09:59:13.6985621Z I1204 09:50:54.286000 70825 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 70880 2025-12-04T09:59:13.6986628Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.6986810Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.6988954Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6989169Z _warn_cpu_init() 2025-12-04T09:59:13.6990124Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6990333Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.6991281Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6991411Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.6993358Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6993455Z _warn_cpu_init() 2025-12-04T09:59:13.6994574Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6994722Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.6996705Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.6996817Z _warn_cpu_init() 2025-12-04T09:59:13.6997788Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.6998015Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.6998974Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.6999133Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.7001088Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7001185Z _warn_cpu_init() 2025-12-04T09:59:13.7002157Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7002399Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.7003375Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7003588Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.7004336Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7004456Z return func(*args, **kwargs) 2025-12-04T09:59:13.7005203Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7005328Z return func(*args, **kwargs) 2025-12-04T09:59:13.7006072Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7006182Z return func(*args, **kwargs) 2025-12-04T09:59:13.7006979Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7007088Z return func(*args, **kwargs) 2025-12-04T09:59:13.7007833Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7007938Z return func(*args, **kwargs) 2025-12-04T09:59:13.7008671Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7008791Z return func(*args, **kwargs) 2025-12-04T09:59:13.7009524Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7009647Z return func(*args, **kwargs) 2025-12-04T09:59:13.7010518Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.7010735Z return func(*args, **kwargs) 2025-12-04T09:59:13.7011633Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.7011731Z return func(*args, **kwargs) 2025-12-04T09:59:13.7012156Z [rank1]:E1204 09:51:01.753000 70878 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7012831Z [rank1]:E1204 09:51:01.753000 70878 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7013811Z [rank1]:E1204 09:51:01.753000 70878 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7014301Z [rank1]:E1204 09:51:01.753000 70878 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7015232Z [rank1]:E1204 09:51:01.753000 70878 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7015639Z [rank1]:E1204 09:51:01.753000 70878 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7016626Z [rank1]:E1204 09:51:01.753000 70878 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7017296Z [rank1]:E1204 09:51:01.753000 70878 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7018270Z [rank1]:E1204 09:51:01.753000 70878 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7018759Z [rank1]:E1204 09:51:01.753000 70878 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7019727Z [rank1]:E1204 09:51:01.753000 70878 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7020174Z [rank1]:E1204 09:51:01.753000 70878 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7021351Z [rank1]:E1204 09:51:01.753000 70878 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7021920Z [rank1]:E1204 09:51:01.753000 70878 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7023778Z [rank1]:E1204 09:51:01.753000 70878 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 147968 on device 1. 
CUDA driver allocated memory was 609157120 and is now 676265984. 2025-12-04T09:59:13.7024149Z [rank1]:E1204 09:51:01.753000 70878 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7024850Z [rank1]:E1204 09:51:01.753000 70878 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7026177Z [rank1]:E1204 09:51:01.753000 70878 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.7026544Z [rank1]:E1204 09:51:01.753000 70878 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7027275Z [rank1]:E1204 09:51:01.753000 70878 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7027821Z [rank1]:E1204 09:51:01.753000 70878 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.7028326Z [rank0]:E1204 09:51:01.754000 70877 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7028866Z [rank0]:E1204 09:51:01.754000 70877 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7029877Z [rank0]:E1204 09:51:01.754000 70877 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7030398Z [rank0]:E1204 09:51:01.754000 70877 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7031426Z [rank0]:E1204 09:51:01.754000 70877 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7031835Z [rank0]:E1204 09:51:01.754000 70877 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7032797Z [rank0]:E1204 09:51:01.754000 70877 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7033351Z [rank0]:E1204 09:51:01.754000 70877 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7034218Z [rank0]:E1204 09:51:01.754000 70877 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7034651Z [rank0]:E1204 09:51:01.754000 70877 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7035517Z [rank0]:E1204 09:51:01.754000 70877 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7035947Z [rank0]:E1204 09:51:01.754000 70877 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T09:59:13.7036817Z [rank0]:E1204 09:51:01.754000 70877 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7037253Z [rank0]:E1204 09:51:01.754000 70877 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7038896Z [rank0]:E1204 09:51:01.754000 70877 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 152064 on device 0. CUDA driver allocated memory was 714014720 and is now 785317888. 2025-12-04T09:59:13.7039252Z [rank0]:E1204 09:51:01.754000 70877 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7039838Z [rank0]:E1204 09:51:01.754000 70877 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7041005Z [rank0]:E1204 09:51:01.754000 70877 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.7041329Z [rank0]:E1204 09:51:01.754000 70877 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7042001Z [rank0]:E1204 09:51:01.754000 70877 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7042486Z [rank0]:E1204 09:51:01.754000 70877 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.7042895Z [rank2]:E1204 09:51:01.755000 70879 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7043363Z [rank2]:E1204 09:51:01.755000 70879 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7044250Z [rank2]:E1204 09:51:01.755000 70879 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7044740Z [rank2]:E1204 09:51:01.755000 70879 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7045620Z [rank2]:E1204 09:51:01.755000 70879 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7045986Z [rank2]:E1204 09:51:01.755000 70879 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7046841Z [rank2]:E1204 09:51:01.755000 70879 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7047281Z [rank2]:E1204 09:51:01.755000 70879 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T09:59:13.7048133Z [rank2]:E1204 09:51:01.755000 70879 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7048570Z [rank2]:E1204 09:51:01.755000 70879 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7049453Z [rank2]:E1204 09:51:01.755000 70879 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7049853Z [rank2]:E1204 09:51:01.755000 70879 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7050721Z [rank2]:E1204 09:51:01.755000 70879 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7051160Z [rank2]:E1204 09:51:01.755000 70879 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7052839Z [rank2]:E1204 09:51:01.755000 70879 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 150016 on device 2. CUDA driver allocated memory was 604962816 and is now 676265984. 2025-12-04T09:59:13.7053161Z [rank2]:E1204 09:51:01.755000 70879 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7053747Z [rank2]:E1204 09:51:01.755000 70879 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7054921Z [rank2]:E1204 09:51:01.755000 70879 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.7055274Z [rank2]:E1204 09:51:01.755000 70879 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7055923Z [rank2]:E1204 09:51:01.755000 70879 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7056478Z [rank2]:E1204 09:51:01.755000 70879 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.7057087Z [rank3]:E1204 09:51:01.756000 70880 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7057675Z [rank3]:E1204 09:51:01.756000 70880 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7058675Z [rank3]:E1204 09:51:01.756000 70880 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7059197Z [rank3]:E1204 09:51:01.756000 70880 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 
2025-12-04T09:59:13.7060184Z [rank3]:E1204 09:51:01.756000 70880 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7060589Z [rank3]:E1204 09:51:01.756000 70880 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7061556Z [rank3]:E1204 09:51:01.756000 70880 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7062057Z [rank3]:E1204 09:51:01.756000 70880 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7063044Z [rank3]:E1204 09:51:01.756000 70880 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7063532Z [rank3]:E1204 09:51:01.756000 70880 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7064505Z [rank3]:E1204 09:51:01.756000 70880 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7064956Z [rank3]:E1204 09:51:01.756000 70880 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7065925Z [rank3]:E1204 09:51:01.756000 70880 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7066452Z [rank3]:E1204 09:51:01.756000 70880 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7068314Z [rank3]:E1204 09:51:01.756000 70880 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 150016 on device 3. CUDA driver allocated memory was 611254272 and is now 676265984. 
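For reference, the two figures in each leak-check RuntimeError above are a before/after comparison per device: the first pair is the caching-allocator allocation count, the second the driver-level allocation. A hand-rolled way to sample comparable numbers is sketched below; this is only an illustration (the helper name cuda_memory_snapshot is made up here, and the actual checker in common_utils.py may derive its driver-side figure differently).

    import torch

    def cuda_memory_snapshot(device: int) -> tuple[int, int]:
        """Return (caching-allocator bytes, driver-level bytes in use) for one GPU."""
        torch.cuda.synchronize(device)
        allocator_bytes = torch.cuda.memory_allocated(device)      # caching-allocator figure
        free_bytes, total_bytes = torch.cuda.mem_get_info(device)  # driver-level free/total
        driver_bytes = total_bytes - free_bytes
        return allocator_bytes, driver_bytes

Diffing one snapshot taken before the test body against one taken after is roughly what running the printed repro command with PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 automates.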
2025-12-04T09:59:13.7068679Z [rank3]:E1204 09:51:01.756000 70880 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7069402Z [rank3]:E1204 09:51:01.756000 70880 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7070678Z [rank3]:E1204 09:51:01.756000 70880 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.7071003Z [rank3]:E1204 09:51:01.756000 70880 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7071651Z [rank3]:E1204 09:51:01.756000 70880 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7072163Z [rank3]:E1204 09:51:01.756000 70880 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.7072267Z dist init r=0, world=4 2025-12-04T09:59:13.7072356Z dist init r=2, world=4 2025-12-04T09:59:13.7072448Z dist init r=1, world=4 2025-12-04T09:59:13.7072551Z dist init r=3, world=4 2025-12-04T09:59:13.7073585Z [rank0]:[W1204 09:51:02.772088461 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.7073674Z FAILED [9.7993s] [100%] 2025-12-04T09:59:13.7073679Z 2025-12-04T09:59:13.7073820Z =================================== FAILURES =================================== 2025-12-04T09:59:13.7074243Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda _ 2025-12-04T09:59:13.7074362Z Traceback (most recent call last): 2025-12-04T09:59:13.7074847Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.7074947Z self._join_processes(fn) 2025-12-04T09:59:13.7075474Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.7075605Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.7076175Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.7076279Z raise RuntimeError(error) 2025-12-04T09:59:13.7076488Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.7076605Z Traceback (most recent call last): 2025-12-04T09:59:13.7077084Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7077183Z getattr(self, test_name)() 2025-12-04T09:59:13.7077665Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7077745Z fn() 2025-12-04T09:59:13.7078204Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7078296Z method(*args, **kwargs) 2025-12-04T09:59:13.7078773Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7078877Z method(*args, **kwargs) 2025-12-04T09:59:13.7079323Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7079408Z with policy(): 2025-12-04T09:59:13.7079871Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7079968Z raise RuntimeError(msg) 2025-12-04T09:59:13.7081206Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 150016 on device 2. CUDA driver allocated memory was 604962816 and is now 676265984. 2025-12-04T09:59:13.7081238Z 2025-12-04T09:59:13.7081436Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7082205Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.7082210Z 2025-12-04T09:59:13.7082446Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7082451Z 2025-12-04T09:59:13.7082455Z 2025-12-04T09:59:13.7082651Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.7082915Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.7083630Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-65dbafa4918c0ef1.xml - 2025-12-04T09:59:13.7083793Z =========================== short test summary info ============================ 2025-12-04T09:59:13.7084700Z FAILED [9.7993s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.7084807Z Traceback (most recent call last): 2025-12-04T09:59:13.7085306Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7085406Z getattr(self, test_name)() 2025-12-04T09:59:13.7085899Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7085979Z fn() 2025-12-04T09:59:13.7086435Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7086541Z method(*args, **kwargs) 2025-12-04T09:59:13.7086991Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7087109Z method(*args, **kwargs) 2025-12-04T09:59:13.7087568Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7087655Z with policy(): 2025-12-04T09:59:13.7088114Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7088210Z raise RuntimeError(msg) 2025-12-04T09:59:13.7089454Z 
RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 150016 on device 2. CUDA driver allocated memory was 604962816 and is now 676265984. 2025-12-04T09:59:13.7089473Z 2025-12-04T09:59:13.7089666Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7090448Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.7090453Z 2025-12-04T09:59:13.7090697Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7090857Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.7091020Z ====================== 1 failed, 26 deselected in 10.02s ======================= 2025-12-04T09:59:13.7091119Z Got exit code 1 2025-12-04T09:59:13.7091215Z Retrying single test... 2025-12-04T09:59:13.7091780Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5b8e1f7dea233320.xml 2025-12-04T09:59:13.7091949Z ============================= test session starts ============================== 2025-12-04T09:59:13.7092264Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.7092372Z cachedir: .pytest_cache 2025-12-04T09:59:13.7092828Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.7092936Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.7093041Z configfile: pytest.ini 2025-12-04T09:59:13.7093517Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.7093747Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.7094575Z stepcurrent: skipping 16 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.7094680Z Running 1 items in this shard 2025-12-04T09:59:13.7094685Z 2025-12-04T09:59:13.7095780Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda I1204 09:51:08.724000 71162 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 71214 2025-12-04T09:59:13.7096222Z I1204 09:51:08.725000 71162 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 71215 2025-12-04T09:59:13.7096932Z I1204 09:51:08.726000 71162 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 71216 2025-12-04T09:59:13.7097434Z I1204 09:51:08.727000 71162 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 71217 2025-12-04T09:59:13.7098450Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.7098589Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.7100658Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7100768Z _warn_cpu_init() 2025-12-04T09:59:13.7101765Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7101910Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.7103948Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7104062Z _warn_cpu_init() 2025-12-04T09:59:13.7105055Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7105279Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.7106327Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7106549Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.7107651Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7107979Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.7110153Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7110379Z _warn_cpu_init() 2025-12-04T09:59:13.7111297Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.7111454Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.7113321Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7113487Z _warn_cpu_init() 2025-12-04T09:59:13.7114467Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7114736Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.7115700Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7115930Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.7116635Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7124012Z return func(*args, **kwargs) 2025-12-04T09:59:13.7124917Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7125165Z return func(*args, **kwargs) 2025-12-04T09:59:13.7125943Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7126055Z return func(*args, **kwargs) 2025-12-04T09:59:13.7126833Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7126948Z return func(*args, **kwargs) 2025-12-04T09:59:13.7127728Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7127885Z return func(*args, **kwargs) 2025-12-04T09:59:13.7128645Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7128764Z return func(*args, **kwargs) 2025-12-04T09:59:13.7129521Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7129632Z return func(*args, **kwargs) 2025-12-04T09:59:13.7130398Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T09:59:13.7130556Z return func(*args, **kwargs) 2025-12-04T09:59:13.7131563Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.7131674Z return func(*args, **kwargs) 2025-12-04T09:59:13.7132140Z [rank0]:E1204 09:51:16.221000 71214 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7132693Z [rank0]:E1204 09:51:16.221000 71214 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7133786Z [rank0]:E1204 09:51:16.221000 71214 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7134261Z [rank0]:E1204 09:51:16.221000 71214 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7135320Z [rank0]:E1204 09:51:16.221000 71214 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7135715Z [rank0]:E1204 09:51:16.221000 71214 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7136820Z [rank0]:E1204 09:51:16.221000 71214 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7137567Z [rank0]:E1204 09:51:16.221000 71214 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7138538Z [rank0]:E1204 09:51:16.221000 71214 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7139024Z [rank0]:E1204 09:51:16.221000 71214 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7140025Z [rank0]:E1204 09:51:16.221000 71214 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7140476Z [rank0]:E1204 09:51:16.221000 71214 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7141453Z [rank0]:E1204 09:51:16.221000 71214 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7141944Z [rank0]:E1204 09:51:16.221000 71214 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7143798Z [rank0]:E1204 09:51:16.221000 71214 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 150016 on device 0. 
CUDA driver allocated memory was 714014720 and is now 785317888. 2025-12-04T09:59:13.7144211Z [rank0]:E1204 09:51:16.221000 71214 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7144874Z [rank0]:E1204 09:51:16.221000 71214 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7146193Z [rank0]:E1204 09:51:16.221000 71214 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.7146594Z [rank0]:E1204 09:51:16.221000 71214 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7147325Z [rank0]:E1204 09:51:16.221000 71214 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7147874Z [rank0]:E1204 09:51:16.221000 71214 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.7148328Z [rank1]:E1204 09:51:16.221000 71215 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7148979Z [rank1]:E1204 09:51:16.221000 71215 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7150281Z [rank1]:E1204 09:51:16.221000 71215 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7150751Z [rank1]:E1204 09:51:16.221000 71215 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7151659Z [rank1]:E1204 09:51:16.221000 71215 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7152027Z [rank1]:E1204 09:51:16.221000 71215 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7152878Z [rank1]:E1204 09:51:16.221000 71215 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7153313Z [rank1]:E1204 09:51:16.221000 71215 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7154178Z [rank1]:E1204 09:51:16.221000 71215 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7154638Z [rank1]:E1204 09:51:16.221000 71215 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7155498Z [rank1]:E1204 09:51:16.221000 71215 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7155894Z [rank1]:E1204 09:51:16.221000 71215 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 
2025-12-04T09:59:13.7156759Z [rank1]:E1204 09:51:16.221000 71215 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7157293Z [rank1]:E1204 09:51:16.221000 71215 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7158941Z [rank1]:E1204 09:51:16.221000 71215 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 150016 on device 1. CUDA driver allocated memory was 607059968 and is now 676265984. 2025-12-04T09:59:13.7159265Z [rank1]:E1204 09:51:16.221000 71215 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7159856Z [rank1]:E1204 09:51:16.221000 71215 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7161052Z [rank1]:E1204 09:51:16.221000 71215 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.7161384Z [rank1]:E1204 09:51:16.221000 71215 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7162032Z [rank1]:E1204 09:51:16.221000 71215 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7162516Z [rank1]:E1204 09:51:16.221000 71215 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.7162924Z [rank2]:E1204 09:51:16.221000 71216 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7163397Z [rank2]:E1204 09:51:16.221000 71216 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7164299Z [rank2]:E1204 09:51:16.221000 71216 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7164780Z [rank2]:E1204 09:51:16.221000 71216 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7165659Z [rank2]:E1204 09:51:16.221000 71216 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7166016Z [rank2]:E1204 09:51:16.221000 71216 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7166870Z [rank2]:E1204 09:51:16.221000 71216 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7167302Z [rank2]:E1204 09:51:16.221000 71216 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T09:59:13.7168190Z [rank2]:E1204 09:51:16.221000 71216 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7168623Z [rank2]:E1204 09:51:16.221000 71216 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7169491Z [rank2]:E1204 09:51:16.221000 71216 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7169888Z [rank2]:E1204 09:51:16.221000 71216 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7170777Z [rank2]:E1204 09:51:16.221000 71216 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7171216Z [rank2]:E1204 09:51:16.221000 71216 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7172855Z [rank2]:E1204 09:51:16.221000 71216 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 150016 on device 2. CUDA driver allocated memory was 611254272 and is now 676265984. 2025-12-04T09:59:13.7173205Z [rank2]:E1204 09:51:16.221000 71216 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7173795Z [rank2]:E1204 09:51:16.221000 71216 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7174970Z [rank2]:E1204 09:51:16.221000 71216 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.7175297Z [rank2]:E1204 09:51:16.221000 71216 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7175941Z [rank2]:E1204 09:51:16.221000 71216 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7176506Z [rank2]:E1204 09:51:16.221000 71216 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.7177123Z [rank3]:E1204 09:51:16.223000 71217 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7177664Z [rank3]:E1204 09:51:16.223000 71217 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7178704Z [rank3]:E1204 09:51:16.223000 71217 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7179228Z [rank3]:E1204 09:51:16.223000 71217 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 
2025-12-04T09:59:13.7180215Z [rank3]:E1204 09:51:16.223000 71217 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7180628Z [rank3]:E1204 09:51:16.223000 71217 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7181612Z [rank3]:E1204 09:51:16.223000 71217 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7182099Z [rank3]:E1204 09:51:16.223000 71217 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7183070Z [rank3]:E1204 09:51:16.223000 71217 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7183557Z [rank3]:E1204 09:51:16.223000 71217 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7184524Z [rank3]:E1204 09:51:16.223000 71217 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7184999Z [rank3]:E1204 09:51:16.223000 71217 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7185976Z [rank3]:E1204 09:51:16.223000 71217 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7186463Z [rank3]:E1204 09:51:16.223000 71217 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7188321Z [rank3]:E1204 09:51:16.223000 71217 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 150016 on device 3. CUDA driver allocated memory was 609157120 and is now 676265984. 
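Separately from the leak itself, both runs repeatedly emit the `_warn_cpu_init()` UserWarning because the module is wrapped while it still lives on CPU. The remedy the warning recommends, passing `device_id` to FSDP, is sketched below under the assumptions that an NCCL process group is already initialized and that the wrapper function name is purely illustrative.

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_on_gpu(module: nn.Module, rank: int) -> FSDP:
        # `device_id` lets FSDP move the module to this GPU before running sharding
        # initialization, which also satisfies the `sync_module_states=True` requirement.
        # (The FutureWarning above separately suggests plain DistributedDataParallel
        # when NO_SHARD is the intended strategy.)
        return FSDP(
            module,
            device_id=torch.device("cuda", rank),
            sync_module_states=True,
        )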
2025-12-04T09:59:13.7188734Z [rank3]:E1204 09:51:16.223000 71217 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7189450Z [rank3]:E1204 09:51:16.223000 71217 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7190620Z [rank3]:E1204 09:51:16.223000 71217 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.7190944Z [rank3]:E1204 09:51:16.223000 71217 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7191586Z [rank3]:E1204 09:51:16.223000 71217 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7192073Z [rank3]:E1204 09:51:16.223000 71217 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.7192206Z dist init r=0, world=4 2025-12-04T09:59:13.7192297Z dist init r=1, world=4 2025-12-04T09:59:13.7192382Z dist init r=2, world=4 2025-12-04T09:59:13.7192481Z dist init r=3, world=4 2025-12-04T09:59:13.7193515Z [rank0]:[W1204 09:51:16.226995320 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.7193609Z FAILED [10.2026s] [100%] 2025-12-04T09:59:13.7193616Z 2025-12-04T09:59:13.7193757Z =================================== FAILURES =================================== 2025-12-04T09:59:13.7194181Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda _ 2025-12-04T09:59:13.7194304Z Traceback (most recent call last): 2025-12-04T09:59:13.7194816Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.7194919Z self._join_processes(fn) 2025-12-04T09:59:13.7195450Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.7195578Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.7196128Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.7196230Z raise RuntimeError(error) 2025-12-04T09:59:13.7196439Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.7196553Z Traceback (most recent call last): 2025-12-04T09:59:13.7197063Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7197163Z getattr(self, test_name)() 2025-12-04T09:59:13.7197652Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7197735Z fn() 2025-12-04T09:59:13.7198188Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7198281Z method(*args, **kwargs) 2025-12-04T09:59:13.7198729Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7198860Z method(*args, **kwargs) 2025-12-04T09:59:13.7199309Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7199398Z with policy(): 2025-12-04T09:59:13.7199866Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7199961Z raise RuntimeError(msg) 2025-12-04T09:59:13.7201211Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 150016 on device 0. CUDA driver allocated memory was 714014720 and is now 785317888. 2025-12-04T09:59:13.7201217Z 2025-12-04T09:59:13.7201413Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7202178Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.7202196Z 2025-12-04T09:59:13.7202439Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7202445Z 2025-12-04T09:59:13.7202595Z Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.7202714Z Traceback (most recent call last): 2025-12-04T09:59:13.7203236Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7203340Z getattr(self, test_name)() 2025-12-04T09:59:13.7203827Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7203907Z fn() 2025-12-04T09:59:13.7204368Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7204466Z method(*args, **kwargs) 2025-12-04T09:59:13.7204913Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7205012Z method(*args, **kwargs) 2025-12-04T09:59:13.7205459Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7205545Z with policy(): 2025-12-04T09:59:13.7206030Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7206128Z raise RuntimeError(msg) 2025-12-04T09:59:13.7207371Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 150016 on device 1. CUDA driver allocated memory was 607059968 and is now 676265984. 
2025-12-04T09:59:13.7207378Z 2025-12-04T09:59:13.7207572Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7208341Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.7208373Z 2025-12-04T09:59:13.7208614Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7208619Z 2025-12-04T09:59:13.7208767Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.7208884Z Traceback (most recent call last): 2025-12-04T09:59:13.7209365Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7209473Z getattr(self, test_name)() 2025-12-04T09:59:13.7209942Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7210046Z fn() 2025-12-04T09:59:13.7210505Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7210597Z method(*args, **kwargs) 2025-12-04T09:59:13.7211052Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7211156Z method(*args, **kwargs) 2025-12-04T09:59:13.7211606Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7211699Z with policy(): 2025-12-04T09:59:13.7212147Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7212246Z raise RuntimeError(msg) 2025-12-04T09:59:13.7213476Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 150016 on device 2. CUDA driver allocated memory was 611254272 and is now 676265984. 2025-12-04T09:59:13.7213485Z 2025-12-04T09:59:13.7213674Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7214437Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.7214466Z 2025-12-04T09:59:13.7214703Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7214708Z 2025-12-04T09:59:13.7214711Z 2025-12-04T09:59:13.7214920Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.7215151Z Process 0 terminated with exit code 10, terminating remaining processes. 
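The failures above all come from PyTorch's CUDA memory-leak check (the same check the printed repro enables via PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1): it compares caching-allocator and driver memory usage before and after each test and raises if usage grew, e.g. 512 before vs. 150016 after on device 0. A rough sketch of that kind of before/after comparison, assuming a single visible CUDA device; the helper name is illustrative only and is not the actual leak-check policy in common_utils.py:

    import torch

    def run_with_cuda_leak_check(test_fn, device=0):
        # Illustrative only: snapshot caching-allocator usage before the test.
        torch.cuda.synchronize(device)
        before = torch.cuda.memory_allocated(device)
        test_fn()
        # Drop cached blocks so only genuinely live allocations are counted.
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        after = torch.cuda.memory_allocated(device)
        if after > before:
            raise RuntimeError(
                f"possible CUDA leak: caching allocator went from {before} "
                f"to {after} on device {device}"
            )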
2025-12-04T09:59:13.7215863Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5b8e1f7dea233320.xml - 2025-12-04T09:59:13.7216031Z =========================== short test summary info ============================ 2025-12-04T09:59:13.7217272Z FAILED [10.2026s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.7217408Z Traceback (most recent call last): 2025-12-04T09:59:13.7217966Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7218079Z getattr(self, test_name)() 2025-12-04T09:59:13.7218626Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7218713Z fn() 2025-12-04T09:59:13.7219230Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7219336Z method(*args, **kwargs) 2025-12-04T09:59:13.7219840Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7219981Z method(*args, **kwargs) 2025-12-04T09:59:13.7220483Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7220583Z with policy(): 2025-12-04T09:59:13.7221340Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7221455Z raise RuntimeError(msg) 2025-12-04T09:59:13.7222868Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 150016 on device 0. CUDA driver allocated memory was 714014720 and is now 785317888. 
2025-12-04T09:59:13.7222947Z 2025-12-04T09:59:13.7223165Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7224036Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.7224042Z 2025-12-04T09:59:13.7224309Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7224314Z 2025-12-04T09:59:13.7224478Z Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.7224608Z Traceback (most recent call last): 2025-12-04T09:59:13.7225156Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7225277Z getattr(self, test_name)() 2025-12-04T09:59:13.7225820Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7225910Z fn() 2025-12-04T09:59:13.7226420Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7226528Z method(*args, **kwargs) 2025-12-04T09:59:13.7227036Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7227185Z method(*args, **kwargs) 2025-12-04T09:59:13.7227696Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7227797Z with policy(): 2025-12-04T09:59:13.7228302Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7228413Z raise RuntimeError(msg) 2025-12-04T09:59:13.7229813Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 150016 on device 1. CUDA driver allocated memory was 607059968 and is now 676265984. 
2025-12-04T09:59:13.7229821Z 2025-12-04T09:59:13.7230037Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7230941Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.7230947Z 2025-12-04T09:59:13.7231214Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7231220Z 2025-12-04T09:59:13.7231390Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.7231522Z Traceback (most recent call last): 2025-12-04T09:59:13.7232068Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7232189Z getattr(self, test_name)() 2025-12-04T09:59:13.7232722Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7232983Z fn() 2025-12-04T09:59:13.7233483Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7233585Z method(*args, **kwargs) 2025-12-04T09:59:13.7234062Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7234169Z method(*args, **kwargs) 2025-12-04T09:59:13.7234641Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7234741Z with policy(): 2025-12-04T09:59:13.7235253Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7235355Z raise RuntimeError(msg) 2025-12-04T09:59:13.7236676Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 150016 on device 2. CUDA driver allocated memory was 611254272 and is now 676265984. 2025-12-04T09:59:13.7236685Z 2025-12-04T09:59:13.7236889Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7237703Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda 2025-12-04T09:59:13.7237709Z 2025-12-04T09:59:13.7237960Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7238141Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
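Each failing run above also ends with a ProcessGroupNCCL warning that destroy_process_group() was never called before the worker exited. A minimal sketch of the teardown that warning asks for, assuming a standard env:// launch (e.g. via torchrun); main() and the body are placeholders, not code from this test suite:

    import torch.distributed as dist

    def main():
        # Rendezvous via the environment variables set by the launcher.
        dist.init_process_group(backend="nccl")
        try:
            ...  # test or training body goes here
        finally:
            # Explicit teardown avoids the resource-leak warning seen in the log.
            dist.destroy_process_group()

    if __name__ == "__main__":
        main()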
2025-12-04T09:59:13.7238314Z ====================== 1 failed, 26 deselected in 10.42s =======================
2025-12-04T09:59:13.7238409Z Got exit code 1
2025-12-04T09:59:13.7239153Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda
2025-12-04T09:59:13.7239569Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T09:59:13.7240154Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9d13641fc6f0b57c.xml
2025-12-04T09:59:13.7240316Z ============================= test session starts ==============================
2025-12-04T09:59:13.7240645Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T09:59:13.7240758Z cachedir: .pytest_cache
2025-12-04T09:59:13.7241244Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T09:59:13.7241359Z rootdir: /var/lib/jenkins/workspace
2025-12-04T09:59:13.7241473Z configfile: pytest.ini
2025-12-04T09:59:13.7242009Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T09:59:13.7242223Z collecting ... collected 60 items / 17 deselected / 43 selected
2025-12-04T09:59:13.7242352Z stepcurrent: skipping 17 already run items.
2025-12-04T09:59:13.7242456Z Running 10 items in this shard
2025-12-04T09:59:13.7242461Z 
2025-12-04T09:59:13.7243613Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda I1204 09:51:23.183000 71499 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 71551
2025-12-04T09:59:13.7244083Z I1204 09:51:23.184000 71499 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 71552
2025-12-04T09:59:13.7244663Z I1204 09:51:23.185000 71499 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 71553
2025-12-04T09:59:13.7245135Z I1204 09:51:23.186000 71499 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 71554
2025-12-04T09:59:13.7246938Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T09:59:13.7247067Z _warn_cpu_init()
2025-12-04T09:59:13.7248854Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T09:59:13.7248954Z _warn_cpu_init() 2025-12-04T09:59:13.7250727Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7250825Z _warn_cpu_init() 2025-12-04T09:59:13.7252630Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7252727Z _warn_cpu_init() 2025-12-04T09:59:13.7253616Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.7253713Z return func(*args, **kwargs) 2025-12-04T09:59:13.7254132Z [rank1]:E1204 09:51:30.892000 71552 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7254610Z [rank1]:E1204 09:51:30.892000 71552 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7255535Z [rank1]:E1204 09:51:30.892000 71552 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7255988Z [rank1]:E1204 09:51:30.892000 71552 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7257156Z [rank1]:E1204 09:51:30.892000 71552 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7257558Z [rank1]:E1204 09:51:30.892000 71552 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7258525Z [rank1]:E1204 09:51:30.892000 71552 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7259061Z [rank1]:E1204 09:51:30.892000 71552 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7260026Z [rank1]:E1204 09:51:30.892000 71552 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7260526Z [rank1]:E1204 09:51:30.892000 71552 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7261489Z [rank1]:E1204 09:51:30.892000 71552 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7261977Z [rank1]:E1204 09:51:30.892000 71552 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7262945Z [rank1]:E1204 09:51:30.892000 71552 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7263435Z [rank1]:E1204 09:51:30.892000 71552 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7265266Z [rank1]:E1204 09:51:30.892000 71552 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 51712 on device 1. CUDA driver allocated memory was 602865664 and is now 651100160. 2025-12-04T09:59:13.7265633Z [rank1]:E1204 09:51:30.892000 71552 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7266302Z [rank1]:E1204 09:51:30.892000 71552 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7267625Z [rank1]:E1204 09:51:30.892000 71552 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7268005Z [rank1]:E1204 09:51:30.892000 71552 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7268837Z [rank1]:E1204 09:51:30.892000 71552 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7269451Z [rank1]:E1204 09:51:30.892000 71552 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.7269864Z [rank0]:E1204 09:51:30.895000 71551 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7270366Z [rank0]:E1204 09:51:30.895000 71551 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7271264Z [rank0]:E1204 09:51:30.895000 71551 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7271715Z [rank0]:E1204 09:51:30.895000 71551 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7272601Z [rank0]:E1204 09:51:30.895000 71551 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7272958Z [rank0]:E1204 09:51:30.895000 71551 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7273836Z [rank0]:E1204 09:51:30.895000 71551 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7274279Z [rank0]:E1204 09:51:30.895000 71551 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7275130Z [rank0]:E1204 09:51:30.895000 71551 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7275594Z [rank0]:E1204 09:51:30.895000 71551 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7276441Z [rank0]:E1204 09:51:30.895000 71551 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7276846Z [rank0]:E1204 09:51:30.895000 71551 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7277705Z [rank0]:E1204 09:51:30.895000 71551 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7278142Z [rank0]:E1204 09:51:30.895000 71551 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7279765Z [rank0]:E1204 09:51:30.895000 71551 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 0. CUDA driver allocated memory was 714014720 and is now 760152064. 
2025-12-04T09:59:13.7280091Z [rank0]:E1204 09:51:30.895000 71551 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7280707Z [rank0]:E1204 09:51:30.895000 71551 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7281851Z [rank0]:E1204 09:51:30.895000 71551 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7282189Z [rank0]:E1204 09:51:30.895000 71551 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7282824Z [rank0]:E1204 09:51:30.895000 71551 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7283344Z [rank0]:E1204 09:51:30.895000 71551 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.7283756Z [rank3]:E1204 09:51:30.896000 71554 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7284223Z [rank3]:E1204 09:51:30.896000 71554 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7285119Z [rank3]:E1204 09:51:30.896000 71554 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7285572Z [rank3]:E1204 09:51:30.896000 71554 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7286483Z [rank3]:E1204 09:51:30.896000 71554 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7286837Z [rank3]:E1204 09:51:30.896000 71554 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7287685Z [rank3]:E1204 09:51:30.896000 71554 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7288128Z [rank3]:E1204 09:51:30.896000 71554 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7289001Z [rank3]:E1204 09:51:30.896000 71554 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7289441Z [rank3]:E1204 09:51:30.896000 71554 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7290292Z [rank3]:E1204 09:51:30.896000 71554 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7290697Z [rank3]:E1204 09:51:30.896000 71554 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7291550Z [rank3]:E1204 09:51:30.896000 71554 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7291988Z [rank3]:E1204 09:51:30.896000 71554 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7293639Z [rank3]:E1204 09:51:30.896000 71554 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 51712 on device 3. CUDA driver allocated memory was 607059968 and is now 651100160. 2025-12-04T09:59:13.7293962Z [rank3]:E1204 09:51:30.896000 71554 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7294552Z [rank3]:E1204 09:51:30.896000 71554 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7295698Z [rank3]:E1204 09:51:30.896000 71554 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7296030Z [rank3]:E1204 09:51:30.896000 71554 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7296953Z [rank3]:E1204 09:51:30.896000 71554 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7297505Z [rank3]:E1204 09:51:30.896000 71554 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.7297969Z [rank2]:E1204 09:51:30.896000 71553 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7298501Z [rank2]:E1204 09:51:30.896000 71553 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7299506Z [rank2]:E1204 09:51:30.896000 71553 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7300044Z [rank2]:E1204 09:51:30.896000 71553 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7301043Z [rank2]:E1204 09:51:30.896000 71553 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7301440Z [rank2]:E1204 09:51:30.896000 71553 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7302397Z [rank2]:E1204 09:51:30.896000 71553 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7302921Z [rank2]:E1204 09:51:30.896000 71553 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7303886Z [rank2]:E1204 09:51:30.896000 71553 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7304388Z [rank2]:E1204 09:51:30.896000 71553 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7305357Z [rank2]:E1204 09:51:30.896000 71553 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7305802Z [rank2]:E1204 09:51:30.896000 71553 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7306770Z [rank2]:E1204 09:51:30.896000 71553 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7307266Z [rank2]:E1204 09:51:30.896000 71553 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7309322Z [rank2]:E1204 09:51:30.896000 71553 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 51712 on device 2. CUDA driver allocated memory was 609157120 and is now 651100160. 2025-12-04T09:59:13.7309646Z [rank2]:E1204 09:51:30.896000 71553 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7310232Z [rank2]:E1204 09:51:30.896000 71553 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7311408Z [rank2]:E1204 09:51:30.896000 71553 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7311736Z [rank2]:E1204 09:51:30.896000 71553 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7312375Z [rank2]:E1204 09:51:30.896000 71553 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7312859Z [rank2]:E1204 09:51:30.896000 71553 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.7312960Z dist init r=0, world=4 2025-12-04T09:59:13.7313047Z dist init r=2, world=4 2025-12-04T09:59:13.7313131Z dist init r=3, world=4 2025-12-04T09:59:13.7313224Z dist init r=1, world=4 2025-12-04T09:59:13.7314280Z [rank0]:[W1204 09:51:31.909434978 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.7314369Z FAILED [9.2841s] [ 10%] 2025-12-04T09:59:13.7314381Z 2025-12-04T09:59:13.7314511Z =================================== FAILURES =================================== 2025-12-04T09:59:13.7314916Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda _ 2025-12-04T09:59:13.7315029Z Traceback (most recent call last): 2025-12-04T09:59:13.7315512Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.7315641Z self._join_processes(fn) 2025-12-04T09:59:13.7316168Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.7316293Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.7316841Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.7316943Z raise RuntimeError(error) 2025-12-04T09:59:13.7317152Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.7317267Z Traceback (most recent call last): 2025-12-04T09:59:13.7317751Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7317847Z getattr(self, test_name)() 2025-12-04T09:59:13.7318332Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7318409Z fn() 2025-12-04T09:59:13.7318869Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7318963Z method(*args, **kwargs) 2025-12-04T09:59:13.7319410Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7319538Z method(*args, **kwargs) 2025-12-04T09:59:13.7319986Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7320073Z with policy(): 2025-12-04T09:59:13.7320530Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7320625Z raise RuntimeError(msg) 2025-12-04T09:59:13.7322274Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 51712 on device 1. CUDA driver allocated memory was 602865664 and is now 651100160. 
2025-12-04T09:59:13.7322285Z 2025-12-04T09:59:13.7322499Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7323415Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7323421Z 2025-12-04T09:59:13.7323684Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7323690Z 2025-12-04T09:59:13.7323695Z 2025-12-04T09:59:13.7323912Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.7324182Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.7324977Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9d13641fc6f0b57c.xml - 2025-12-04T09:59:13.7325613Z =========================== short test summary info ============================ 2025-12-04T09:59:13.7326616Z FAILED [9.2841s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.7326733Z Traceback (most recent call last): 2025-12-04T09:59:13.7327292Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7327399Z getattr(self, test_name)() 2025-12-04T09:59:13.7327949Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7328077Z fn() 2025-12-04T09:59:13.7328583Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7328695Z method(*args, **kwargs) 2025-12-04T09:59:13.7329198Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7329301Z method(*args, **kwargs) 2025-12-04T09:59:13.7329812Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7329905Z with policy(): 2025-12-04T09:59:13.7330422Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7330529Z raise RuntimeError(msg) 2025-12-04T09:59:13.7331897Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 51712 on device 1. CUDA driver allocated memory was 602865664 and is now 651100160. 2025-12-04T09:59:13.7331915Z 2025-12-04T09:59:13.7332126Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7332970Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7333030Z 2025-12-04T09:59:13.7333304Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7333479Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
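The repeated _warn_cpu_init UserWarning above recommends passing device_id to FSDP so that sharding initialization runs on the GPU (which is also required for sync_module_states=True). A minimal sketch of that recommendation, assuming the torch.distributed.fsdp.FullyShardedDataParallel API; wrap_on_gpu is a hypothetical helper, not part of this test suite:

    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_on_gpu(module, local_rank):
        # device_id moves a CPU-resident module to the local GPU before sharding,
        # which silences the _warn_cpu_init warning shown in the log above.
        return FSDP(
            module,
            device_id=torch.device("cuda", local_rank),
            sync_module_states=True,
        )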
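Similarly, the barrier() UserWarning from c10d_logger.py suggests binding the process group to a device at init time. A minimal sketch, assuming a recent PyTorch where init_process_group accepts device_id; LOCAL_RANK is the variable a torchrun-style launcher would set:

    import os
    import torch
    import torch.distributed as dist

    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    torch.cuda.set_device(local_rank)
    # Binding the group to a device mutes the "barrier(): using the device
    # under current context" warning and lets barrier() pick the right GPU.
    dist.init_process_group(backend="nccl", device_id=torch.device("cuda", local_rank))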
2025-12-04T09:59:13.7333761Z ======================= 1 failed, 17 deselected in 9.50s =======================
2025-12-04T09:59:13.7333859Z Got exit code 1
2025-12-04T09:59:13.7333955Z Retrying single test...
2025-12-04T09:59:13.7334551Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-29e66d82c97dbaa5.xml
2025-12-04T09:59:13.7334699Z ============================= test session starts ==============================
2025-12-04T09:59:13.7335026Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T09:59:13.7335135Z cachedir: .pytest_cache
2025-12-04T09:59:13.7335651Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T09:59:13.7335772Z rootdir: /var/lib/jenkins/workspace
2025-12-04T09:59:13.7335870Z configfile: pytest.ini
2025-12-04T09:59:13.7336450Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T09:59:13.7336831Z collecting ... collected 60 items / 26 deselected / 34 selected
2025-12-04T09:59:13.7337762Z stepcurrent: skipping 17 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda
2025-12-04T09:59:13.7337908Z Running 1 items in this shard
2025-12-04T09:59:13.7337914Z 
2025-12-04T09:59:13.7339134Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda I1204 09:51:37.414000 71836 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 71888
2025-12-04T09:59:13.7339629Z I1204 09:51:37.415000 71836 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 71889
2025-12-04T09:59:13.7340125Z I1204 09:51:37.416000 71836 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 71890
2025-12-04T09:59:13.7340610Z I1204 09:51:37.417000 71836 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 71891
2025-12-04T09:59:13.7342686Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T09:59:13.7342786Z _warn_cpu_init()
2025-12-04T09:59:13.7344820Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication.
2025-12-04T09:59:13.7344920Z _warn_cpu_init() 2025-12-04T09:59:13.7346954Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7347061Z _warn_cpu_init() 2025-12-04T09:59:13.7349164Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7349274Z _warn_cpu_init() 2025-12-04T09:59:13.7350246Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.7350399Z return func(*args, **kwargs) 2025-12-04T09:59:13.7350847Z [rank2]:E1204 09:51:45.279000 71890 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7351365Z [rank2]:E1204 09:51:45.279000 71890 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7352533Z [rank2]:E1204 09:51:45.279000 71890 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7352989Z [rank2]:E1204 09:51:45.279000 71890 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7353908Z [rank2]:E1204 09:51:45.279000 71890 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7354261Z [rank2]:E1204 09:51:45.279000 71890 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7355122Z [rank2]:E1204 09:51:45.279000 71890 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7355559Z [rank2]:E1204 09:51:45.279000 71890 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7356437Z [rank2]:E1204 09:51:45.279000 71890 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7357055Z [rank2]:E1204 09:51:45.279000 71890 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7357961Z [rank2]:E1204 09:51:45.279000 71890 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7358388Z [rank2]:E1204 09:51:45.279000 71890 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7359294Z [rank2]:E1204 09:51:45.279000 71890 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7359764Z [rank2]:E1204 09:51:45.279000 71890 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7361510Z [rank2]:E1204 09:51:45.279000 71890 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 51712 on device 2. CUDA driver allocated memory was 607059968 and is now 651100160. 2025-12-04T09:59:13.7361863Z [rank2]:E1204 09:51:45.279000 71890 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7362479Z [rank2]:E1204 09:51:45.279000 71890 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7363703Z [rank2]:E1204 09:51:45.279000 71890 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7364057Z [rank2]:E1204 09:51:45.279000 71890 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7364825Z [rank2]:E1204 09:51:45.279000 71890 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7365348Z [rank2]:E1204 09:51:45.279000 71890 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.7365772Z [rank0]:E1204 09:51:45.279000 71888 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7366278Z [rank0]:E1204 09:51:45.279000 71888 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7367227Z [rank0]:E1204 09:51:45.279000 71888 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7367733Z [rank0]:E1204 09:51:45.279000 71888 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7368771Z [rank0]:E1204 09:51:45.279000 71888 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7369122Z [rank0]:E1204 09:51:45.279000 71888 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7369982Z [rank0]:E1204 09:51:45.279000 71888 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7370441Z [rank0]:E1204 09:51:45.279000 71888 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7371296Z [rank0]:E1204 09:51:45.279000 71888 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7371737Z [rank0]:E1204 09:51:45.279000 71888 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7372585Z [rank0]:E1204 09:51:45.279000 71888 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7372986Z [rank0]:E1204 09:51:45.279000 71888 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7373842Z [rank0]:E1204 09:51:45.279000 71888 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7374288Z [rank0]:E1204 09:51:45.279000 71888 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7375926Z [rank0]:E1204 09:51:45.279000 71888 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 51712 on device 0. CUDA driver allocated memory was 718209024 and is now 760152064. 
2025-12-04T09:59:13.7376260Z [rank0]:E1204 09:51:45.279000 71888 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7377127Z [rank0]:E1204 09:51:45.279000 71888 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7378466Z [rank0]:E1204 09:51:45.279000 71888 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7378840Z [rank0]:E1204 09:51:45.279000 71888 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7379553Z [rank0]:E1204 09:51:45.279000 71888 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7380110Z [rank0]:E1204 09:51:45.279000 71888 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.7380556Z [rank3]:E1204 09:51:45.279000 71891 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7381090Z [rank3]:E1204 09:51:45.279000 71891 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7382126Z [rank3]:E1204 09:51:45.279000 71891 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7382628Z [rank3]:E1204 09:51:45.279000 71891 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7383625Z [rank3]:E1204 09:51:45.279000 71891 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7384062Z [rank3]:E1204 09:51:45.279000 71891 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7385038Z [rank3]:E1204 09:51:45.279000 71891 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7385527Z [rank3]:E1204 09:51:45.279000 71891 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7386485Z [rank3]:E1204 09:51:45.279000 71891 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7386976Z [rank3]:E1204 09:51:45.279000 71891 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7387932Z [rank3]:E1204 09:51:45.279000 71891 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7388384Z [rank3]:E1204 09:51:45.279000 71891 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7389542Z [rank3]:E1204 09:51:45.279000 71891 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7389984Z [rank3]:E1204 09:51:45.279000 71891 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7391590Z [rank3]:E1204 09:51:45.279000 71891 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 51712 on device 3. CUDA driver allocated memory was 611254272 and is now 651100160. 2025-12-04T09:59:13.7391923Z [rank3]:E1204 09:51:45.279000 71891 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7392533Z [rank3]:E1204 09:51:45.279000 71891 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7393677Z [rank3]:E1204 09:51:45.279000 71891 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7394010Z [rank3]:E1204 09:51:45.279000 71891 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7394644Z [rank3]:E1204 09:51:45.279000 71891 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7395136Z [rank3]:E1204 09:51:45.279000 71891 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.7395561Z [rank1]:E1204 09:51:45.279000 71889 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7396038Z [rank1]:E1204 09:51:45.279000 71889 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7396923Z [rank1]:E1204 09:51:45.279000 71889 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7397373Z [rank1]:E1204 09:51:45.279000 71889 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7398280Z [rank1]:E1204 09:51:45.279000 71889 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7398635Z [rank1]:E1204 09:51:45.279000 71889 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7399493Z [rank1]:E1204 09:51:45.279000 71889 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7399923Z [rank1]:E1204 09:51:45.279000 71889 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7400775Z [rank1]:E1204 09:51:45.279000 71889 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7401206Z [rank1]:E1204 09:51:45.279000 71889 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7402061Z [rank1]:E1204 09:51:45.279000 71889 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7402484Z [rank1]:E1204 09:51:45.279000 71889 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7403338Z [rank1]:E1204 09:51:45.279000 71889 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7403775Z [rank1]:E1204 09:51:45.279000 71889 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7405411Z [rank1]:E1204 09:51:45.279000 71889 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 1. CUDA driver allocated memory was 604962816 and is now 651100160. 2025-12-04T09:59:13.7405741Z [rank1]:E1204 09:51:45.279000 71889 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7406323Z [rank1]:E1204 09:51:45.279000 71889 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7407469Z [rank1]:E1204 09:51:45.279000 71889 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7407803Z [rank1]:E1204 09:51:45.279000 71889 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7408465Z [rank1]:E1204 09:51:45.279000 71889 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7408955Z [rank1]:E1204 09:51:45.279000 71889 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.7409043Z dist init r=0, world=4 2025-12-04T09:59:13.7409128Z dist init r=1, world=4 2025-12-04T09:59:13.7409217Z dist init r=2, world=4 2025-12-04T09:59:13.7409300Z dist init r=3, world=4 2025-12-04T09:59:13.7410344Z [rank0]:[W1204 09:51:45.298126885 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.7410459Z FAILED [9.3877s] [100%] 2025-12-04T09:59:13.7410465Z 2025-12-04T09:59:13.7410597Z =================================== FAILURES =================================== 2025-12-04T09:59:13.7411014Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda _ 2025-12-04T09:59:13.7411122Z Traceback (most recent call last): 2025-12-04T09:59:13.7411610Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.7411708Z self._join_processes(fn) 2025-12-04T09:59:13.7412222Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.7412356Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.7412888Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.7412993Z raise RuntimeError(error) 2025-12-04T09:59:13.7413208Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.7413314Z Traceback (most recent call last): 2025-12-04T09:59:13.7413801Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7413898Z getattr(self, test_name)() 2025-12-04T09:59:13.7414398Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7414486Z fn() 2025-12-04T09:59:13.7414933Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7415023Z method(*args, **kwargs) 2025-12-04T09:59:13.7415473Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7415567Z method(*args, **kwargs) 2025-12-04T09:59:13.7416019Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7416108Z with policy(): 2025-12-04T09:59:13.7416666Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7416955Z raise RuntimeError(msg) 2025-12-04T09:59:13.7418333Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 51712 on device 2. CUDA driver allocated memory was 607059968 and is now 651100160. 
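The RuntimeError above comes from the harness's CUDA memory-leak check (the __exit__ in torch/testing/_internal/common_utils.py shown in the traceback): it snapshots per-device memory before the test body runs and compares afterwards, here 512 -> 51712 bytes in the caching allocator and 607059968 -> 651100160 bytes at the driver level on device 2. A minimal sketch of that kind of before/after comparison, not the in-tree implementation:

    import torch

    def driver_allocated_bytes(device: int) -> int:
        # cudaMemGetInfo-style view: (free, total) as seen by the CUDA driver.
        # Note: total - free counts every process on the device, so this is only
        # a rough proxy for what the real check reports.
        free, total = torch.cuda.mem_get_info(device)
        return total - free

    def check_for_leak(test_body, device: int = 0) -> None:
        # Hypothetical helper mirroring the before/after comparison in the error text.
        torch.cuda.synchronize(device)
        alloc_before = torch.cuda.memory_allocated(device)
        driver_before = driver_allocated_bytes(device)
        test_body()
        torch.cuda.synchronize(device)
        alloc_after = torch.cuda.memory_allocated(device)
        driver_after = driver_allocated_bytes(device)
        if alloc_after > alloc_before and driver_after > driver_before:
            raise RuntimeError(
                f"possible CUDA leak on device {device}: caching allocator "
                f"{alloc_before} -> {alloc_after}, driver {driver_before} -> {driver_after}"
            )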
2025-12-04T09:59:13.7418340Z 2025-12-04T09:59:13.7418557Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7419405Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7419442Z 2025-12-04T09:59:13.7419709Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7419723Z 2025-12-04T09:59:13.7419727Z 2025-12-04T09:59:13.7419951Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.7420217Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.7421244Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-29e66d82c97dbaa5.xml - 2025-12-04T09:59:13.7421420Z =========================== short test summary info ============================ 2025-12-04T09:59:13.7422435Z FAILED [9.3877s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.7422627Z Traceback (most recent call last): 2025-12-04T09:59:13.7423179Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7423297Z getattr(self, test_name)() 2025-12-04T09:59:13.7423843Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7423928Z fn() 2025-12-04T09:59:13.7424447Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7424549Z method(*args, **kwargs) 2025-12-04T09:59:13.7425055Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7425160Z method(*args, **kwargs) 2025-12-04T09:59:13.7425659Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7425764Z with policy(): 2025-12-04T09:59:13.7426272Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7426378Z raise RuntimeError(msg) 2025-12-04T09:59:13.7427800Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 51712 on device 2. CUDA driver allocated memory was 607059968 and is now 651100160. 2025-12-04T09:59:13.7427807Z 2025-12-04T09:59:13.7428021Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7428870Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7428878Z 2025-12-04T09:59:13.7429139Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7429331Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
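The repro line printed above can be scripted directly. A small sketch that runs the single failing test from the base repo dir with the leak check enabled (PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1) and, optionally, the repro banner suppressed (PYTORCH_PRINT_REPRO_ON_FAILURE=0); the subprocess wrapper is illustrative, only the command and env var names come from the log:

    import os
    import subprocess

    env = dict(os.environ, PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1")
    # env["PYTORCH_PRINT_REPRO_ON_FAILURE"] = "0"  # uncomment to silence the repro banner

    cmd = [
        "python",
        "test/distributed/fsdp/test_fsdp_core.py",
        "TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda",
    ]
    result = subprocess.run(cmd, env=env)  # run from the base repo dir
    print("exit code:", result.returncode)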
2025-12-04T09:59:13.7429540Z ======================= 1 failed, 26 deselected in 9.60s ======================= 2025-12-04T09:59:13.7429634Z Got exit code 1 2025-12-04T09:59:13.7429748Z Retrying single test... 2025-12-04T09:59:13.7430369Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a798bbedf3e7b999.xml 2025-12-04T09:59:13.7430535Z ============================= test session starts ============================== 2025-12-04T09:59:13.7430878Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.7430986Z cachedir: .pytest_cache 2025-12-04T09:59:13.7431510Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.7431627Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.7431788Z configfile: pytest.ini 2025-12-04T09:59:13.7432326Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.7432655Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.7433528Z stepcurrent: skipping 17 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7433635Z Running 1 items in this shard 2025-12-04T09:59:13.7433639Z 2025-12-04T09:59:13.7434771Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda I1204 09:51:51.794000 72173 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 72225 2025-12-04T09:59:13.7435279Z I1204 09:51:51.795000 72173 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 72226 2025-12-04T09:59:13.7435741Z I1204 09:51:51.796000 72173 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 72227 2025-12-04T09:59:13.7436209Z I1204 09:51:51.796000 72173 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 72228 2025-12-04T09:59:13.7438106Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7438206Z _warn_cpu_init() 2025-12-04T09:59:13.7440116Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
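The UserWarning just above recommends giving FSDP a `device_id` so sharding initialization runs on GPU rather than CPU, and notes that `sync_module_states=True` requires the module on GPU anyway. A hedged sketch of that call pattern (model and rank are placeholders):

    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_on_gpu(module: torch.nn.Module, rank: int) -> FSDP:
        # device_id moves the module to this rank's GPU for sharding init,
        # avoiding the CPU-init warning and satisfying sync_module_states=True.
        return FSDP(module, device_id=torch.device("cuda", rank), sync_module_states=True)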
2025-12-04T09:59:13.7440216Z _warn_cpu_init() 2025-12-04T09:59:13.7442105Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7442199Z _warn_cpu_init() 2025-12-04T09:59:13.7444189Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7444277Z _warn_cpu_init() 2025-12-04T09:59:13.7445167Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.7445264Z return func(*args, **kwargs) 2025-12-04T09:59:13.7445682Z [rank0]:E1204 09:51:59.450000 72225 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7446179Z [rank0]:E1204 09:51:59.450000 72225 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7447070Z [rank0]:E1204 09:51:59.450000 72225 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7447533Z [rank0]:E1204 09:51:59.450000 72225 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7448408Z [rank0]:E1204 09:51:59.450000 72225 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7448796Z [rank0]:E1204 09:51:59.450000 72225 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7449643Z [rank0]:E1204 09:51:59.450000 72225 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7450084Z [rank0]:E1204 09:51:59.450000 72225 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7450937Z [rank0]:E1204 09:51:59.450000 72225 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7451366Z [rank0]:E1204 09:51:59.450000 72225 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7452220Z [rank0]:E1204 09:51:59.450000 72225 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7452621Z [rank0]:E1204 09:51:59.450000 72225 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7453506Z [rank0]:E1204 09:51:59.450000 72225 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7453946Z [rank0]:E1204 09:51:59.450000 72225 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7455570Z [rank0]:E1204 09:51:59.450000 72225 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 716111872 and is now 760152064. 2025-12-04T09:59:13.7455900Z [rank0]:E1204 09:51:59.450000 72225 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7456599Z [rank0]:E1204 09:51:59.450000 72225 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7458069Z [rank0]:E1204 09:51:59.450000 72225 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7458437Z [rank0]:E1204 09:51:59.450000 72225 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7459163Z [rank0]:E1204 09:51:59.450000 72225 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7459711Z [rank0]:E1204 09:51:59.450000 72225 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.7460214Z [rank1]:E1204 09:51:59.450000 72226 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7460745Z [rank1]:E1204 09:51:59.450000 72226 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7461750Z [rank1]:E1204 09:51:59.450000 72226 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7462260Z [rank1]:E1204 09:51:59.450000 72226 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7463273Z [rank1]:E1204 09:51:59.450000 72226 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7463682Z [rank1]:E1204 09:51:59.450000 72226 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7464647Z [rank1]:E1204 09:51:59.450000 72226 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7465140Z [rank1]:E1204 09:51:59.450000 72226 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7466096Z [rank1]:E1204 09:51:59.450000 72226 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7466582Z [rank1]:E1204 09:51:59.450000 72226 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7467548Z [rank1]:E1204 09:51:59.450000 72226 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7468020Z [rank1]:E1204 09:51:59.450000 72226 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7469098Z [rank1]:E1204 09:51:59.450000 72226 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7469559Z [rank1]:E1204 09:51:59.450000 72226 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7471315Z [rank1]:E1204 09:51:59.450000 72226 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 51712 on device 1. CUDA driver allocated memory was 609157120 and is now 651100160. 
2025-12-04T09:59:13.7471657Z [rank1]:E1204 09:51:59.450000 72226 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7472276Z [rank1]:E1204 09:51:59.450000 72226 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7473506Z [rank1]:E1204 09:51:59.450000 72226 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7473845Z [rank1]:E1204 09:51:59.450000 72226 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7474551Z [rank1]:E1204 09:51:59.450000 72226 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7475179Z [rank1]:E1204 09:51:59.450000 72226 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.7475592Z [rank2]:E1204 09:51:59.451000 72227 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7476060Z [rank2]:E1204 09:51:59.451000 72227 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7476950Z [rank2]:E1204 09:51:59.451000 72227 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7477442Z [rank2]:E1204 09:51:59.451000 72227 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7478319Z [rank2]:E1204 09:51:59.451000 72227 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7478681Z [rank2]:E1204 09:51:59.451000 72227 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7479534Z [rank2]:E1204 09:51:59.451000 72227 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7479971Z [rank2]:E1204 09:51:59.451000 72227 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7480821Z [rank2]:E1204 09:51:59.451000 72227 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7481253Z [rank2]:E1204 09:51:59.451000 72227 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7482135Z [rank2]:E1204 09:51:59.451000 72227 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7482532Z [rank2]:E1204 09:51:59.451000 72227 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7483390Z [rank2]:E1204 09:51:59.451000 72227 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7483826Z [rank2]:E1204 09:51:59.451000 72227 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7485495Z [rank2]:E1204 09:51:59.451000 72227 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 2. CUDA driver allocated memory was 604962816 and is now 651100160. 2025-12-04T09:59:13.7485820Z [rank2]:E1204 09:51:59.451000 72227 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7486405Z [rank2]:E1204 09:51:59.451000 72227 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7487558Z [rank2]:E1204 09:51:59.451000 72227 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7487917Z [rank2]:E1204 09:51:59.451000 72227 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7488562Z [rank2]:E1204 09:51:59.451000 72227 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7489044Z [rank2]:E1204 09:51:59.451000 72227 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.7489452Z [rank3]:E1204 09:51:59.452000 72228 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7489945Z [rank3]:E1204 09:51:59.452000 72228 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7490829Z [rank3]:E1204 09:51:59.452000 72228 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7491292Z [rank3]:E1204 09:51:59.452000 72228 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7492160Z [rank3]:E1204 09:51:59.452000 72228 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7492517Z [rank3]:E1204 09:51:59.452000 72228 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7493372Z [rank3]:E1204 09:51:59.452000 72228 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7493814Z [rank3]:E1204 09:51:59.452000 72228 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7494691Z [rank3]:E1204 09:51:59.452000 72228 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7495121Z [rank3]:E1204 09:51:59.452000 72228 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7495974Z [rank3]:E1204 09:51:59.452000 72228 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7496435Z [rank3]:E1204 09:51:59.452000 72228 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7497546Z [rank3]:E1204 09:51:59.452000 72228 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7498082Z [rank3]:E1204 09:51:59.452000 72228 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7499913Z [rank3]:E1204 09:51:59.452000 72228 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 3. CUDA driver allocated memory was 611254272 and is now 651100160. 2025-12-04T09:59:13.7500278Z [rank3]:E1204 09:51:59.452000 72228 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7500942Z [rank3]:E1204 09:51:59.452000 72228 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7502273Z [rank3]:E1204 09:51:59.452000 72228 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7502636Z [rank3]:E1204 09:51:59.452000 72228 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7503355Z [rank3]:E1204 09:51:59.452000 72228 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7503927Z [rank3]:E1204 09:51:59.452000 72228 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.7504034Z dist init r=1, world=4 2025-12-04T09:59:13.7504135Z dist init r=2, world=4 2025-12-04T09:59:13.7504230Z dist init r=0, world=4 2025-12-04T09:59:13.7504336Z dist init r=3, world=4 2025-12-04T09:59:13.7505496Z [rank0]:[W1204 09:51:59.469554411 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.7505598Z FAILED [10.2161s] [100%] 2025-12-04T09:59:13.7505609Z 2025-12-04T09:59:13.7505757Z =================================== FAILURES =================================== 2025-12-04T09:59:13.7506212Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda _ 2025-12-04T09:59:13.7506340Z Traceback (most recent call last): 2025-12-04T09:59:13.7506883Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.7506996Z self._join_processes(fn) 2025-12-04T09:59:13.7507589Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.7507732Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.7508379Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.7508492Z raise RuntimeError(error) 2025-12-04T09:59:13.7508839Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.7508969Z Traceback (most recent call last): 2025-12-04T09:59:13.7509572Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7509673Z getattr(self, test_name)() 2025-12-04T09:59:13.7510160Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7510241Z fn() 2025-12-04T09:59:13.7510695Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7510818Z method(*args, **kwargs) 2025-12-04T09:59:13.7511266Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7511366Z method(*args, **kwargs) 2025-12-04T09:59:13.7511808Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7511894Z with policy(): 2025-12-04T09:59:13.7512355Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7512454Z raise RuntimeError(msg) 2025-12-04T09:59:13.7513682Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 716111872 and is now 760152064. 
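The repeated ProcessGroupNCCL warning ("destroy_process_group() was not called before program exit") and the earlier barrier() warning both concern process-group lifecycle. A minimal sketch of the init/teardown pattern they ask for, assuming one GPU per rank and rendezvous env vars set by torchrun or similar; passing device_id to init_process_group is what the barrier() warning suggests and is only available on recent PyTorch releases:

    import os
    import torch
    import torch.distributed as dist

    def main() -> None:
        rank = int(os.environ["RANK"])  # RANK/WORLD_SIZE/MASTER_ADDR/MASTER_PORT assumed set
        torch.cuda.set_device(rank)
        # Binding the group to a device silences the "barrier(): using the device
        # under current context" warning.
        dist.init_process_group("nccl", device_id=torch.device("cuda", rank))
        dist.barrier()
        # ... test or training body ...
        dist.destroy_process_group()  # explicit shutdown, as the NCCL warning requests

    if __name__ == "__main__":
        main()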
2025-12-04T09:59:13.7513716Z 2025-12-04T09:59:13.7513906Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7514670Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7514675Z 2025-12-04T09:59:13.7514912Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7514917Z 2025-12-04T09:59:13.7515060Z Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.7515204Z Traceback (most recent call last): 2025-12-04T09:59:13.7515691Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7515804Z getattr(self, test_name)() 2025-12-04T09:59:13.7516279Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7516360Z fn() 2025-12-04T09:59:13.7516821Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7516915Z method(*args, **kwargs) 2025-12-04T09:59:13.7517363Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7517460Z method(*args, **kwargs) 2025-12-04T09:59:13.7517905Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7518001Z with policy(): 2025-12-04T09:59:13.7518449Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7518544Z raise RuntimeError(msg) 2025-12-04T09:59:13.7519801Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 51712 on device 1. CUDA driver allocated memory was 609157120 and is now 651100160. 
2025-12-04T09:59:13.7519807Z 2025-12-04T09:59:13.7519999Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7520886Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7520897Z 2025-12-04T09:59:13.7521144Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7521148Z 2025-12-04T09:59:13.7521485Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.7521619Z Traceback (most recent call last): 2025-12-04T09:59:13.7522182Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7522361Z getattr(self, test_name)() 2025-12-04T09:59:13.7522901Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7522989Z fn() 2025-12-04T09:59:13.7523503Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7523608Z method(*args, **kwargs) 2025-12-04T09:59:13.7524118Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7524223Z method(*args, **kwargs) 2025-12-04T09:59:13.7524720Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7524864Z with policy(): 2025-12-04T09:59:13.7525368Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7525481Z raise RuntimeError(msg) 2025-12-04T09:59:13.7526869Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 2. CUDA driver allocated memory was 604962816 and is now 651100160. 2025-12-04T09:59:13.7526875Z 2025-12-04T09:59:13.7527085Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7527966Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7527972Z 2025-12-04T09:59:13.7528234Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7528242Z 2025-12-04T09:59:13.7528246Z 2025-12-04T09:59:13.7528480Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.7528745Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:59:13.7529547Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a798bbedf3e7b999.xml - 2025-12-04T09:59:13.7529729Z =========================== short test summary info ============================ 2025-12-04T09:59:13.7530731Z FAILED [10.2161s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.7530865Z Traceback (most recent call last): 2025-12-04T09:59:13.7531412Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7531526Z getattr(self, test_name)() 2025-12-04T09:59:13.7532070Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7532206Z fn() 2025-12-04T09:59:13.7532721Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7532829Z method(*args, **kwargs) 2025-12-04T09:59:13.7533334Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7533440Z method(*args, **kwargs) 2025-12-04T09:59:13.7534011Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7534099Z with policy(): 2025-12-04T09:59:13.7534556Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7534657Z raise RuntimeError(msg) 2025-12-04T09:59:13.7535916Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 49664 on device 0. CUDA driver allocated memory was 716111872 and is now 760152064. 
2025-12-04T09:59:13.7535921Z 2025-12-04T09:59:13.7536109Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7537113Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7537133Z 2025-12-04T09:59:13.7537399Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7537442Z 2025-12-04T09:59:13.7537606Z Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.7537733Z Traceback (most recent call last): 2025-12-04T09:59:13.7538284Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7538395Z getattr(self, test_name)() 2025-12-04T09:59:13.7538939Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7539029Z fn() 2025-12-04T09:59:13.7539542Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7539645Z method(*args, **kwargs) 2025-12-04T09:59:13.7540177Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7540288Z method(*args, **kwargs) 2025-12-04T09:59:13.7540789Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7540887Z with policy(): 2025-12-04T09:59:13.7541403Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7541514Z raise RuntimeError(msg) 2025-12-04T09:59:13.7542893Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 51712 on device 1. CUDA driver allocated memory was 609157120 and is now 651100160. 
2025-12-04T09:59:13.7542899Z 2025-12-04T09:59:13.7543116Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7543963Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7543971Z 2025-12-04T09:59:13.7544236Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7544241Z 2025-12-04T09:59:13.7544405Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.7544563Z Traceback (most recent call last): 2025-12-04T09:59:13.7545109Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7545226Z getattr(self, test_name)() 2025-12-04T09:59:13.7545759Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7545846Z fn() 2025-12-04T09:59:13.7546365Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7546467Z method(*args, **kwargs) 2025-12-04T09:59:13.7546970Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7547082Z method(*args, **kwargs) 2025-12-04T09:59:13.7547615Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7547720Z with policy(): 2025-12-04T09:59:13.7548229Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7548339Z raise RuntimeError(msg) 2025-12-04T09:59:13.7549745Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda! Caching allocator allocated memory was 512 and is now reported as 53760 on device 2. CUDA driver allocated memory was 604962816 and is now 651100160. 2025-12-04T09:59:13.7549754Z 2025-12-04T09:59:13.7549949Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7550731Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7550736Z 2025-12-04T09:59:13.7550969Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7551130Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
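The session above is the retry of the same failing test ("Retrying single test..." earlier); immediately below, the runner reports it as FAILED CONSISTENTLY and keeps going because continue-through-error is set. The real logic lives in the CI test runner; the following is only an illustrative sketch of that retry-then-continue shape:

    import subprocess

    def run_with_retry(cmd: list[str], continue_through_error: bool = True) -> bool:
        first = subprocess.run(cmd)
        if first.returncode == 0:
            return True
        print("Got exit code", first.returncode)
        print("Retrying single test...")
        second = subprocess.run(cmd)
        if second.returncode == 0:
            return True  # flaky: passed on retry
        print("FAILED CONSISTENTLY:", " ".join(cmd))
        if not continue_through_error:
            raise SystemExit(second.returncode)
        return False  # continue with the rest of the tests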
2025-12-04T09:59:13.7551301Z ====================== 1 failed, 26 deselected in 10.44s ======================= 2025-12-04T09:59:13.7551384Z Got exit code 1 2025-12-04T09:59:13.7552066Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda 2025-12-04T09:59:13.7552454Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.7553001Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e0d5d8a174cb3c98.xml 2025-12-04T09:59:13.7553153Z ============================= test session starts ============================== 2025-12-04T09:59:13.7553467Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.7553575Z cachedir: .pytest_cache 2025-12-04T09:59:13.7554027Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.7554134Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.7554236Z configfile: pytest.ini 2025-12-04T09:59:13.7554713Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.7554905Z collecting ... collected 60 items / 18 deselected / 42 selected 2025-12-04T09:59:13.7555032Z stepcurrent: skipping 18 already run items. 2025-12-04T09:59:13.7555133Z Running 9 items in this shard 2025-12-04T09:59:13.7555137Z 2025-12-04T09:59:13.7556457Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda I1204 09:52:06.254000 72510 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 72562 2025-12-04T09:59:13.7556930Z I1204 09:52:06.255000 72510 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 72563 2025-12-04T09:59:13.7557397Z I1204 09:52:06.255000 72510 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 72564 2025-12-04T09:59:13.7557862Z I1204 09:52:06.256000 72510 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 72565 2025-12-04T09:59:13.7558807Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7558947Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.7559903Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7560037Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.7561935Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7562033Z _warn_cpu_init() 2025-12-04T09:59:13.7563972Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7564064Z _warn_cpu_init() 2025-12-04T09:59:13.7565004Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7565152Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.7567046Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7567139Z _warn_cpu_init() 2025-12-04T09:59:13.7568084Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7568293Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.7569231Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7569549Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.7570525Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7570649Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.7572430Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7572522Z _warn_cpu_init() 2025-12-04T09:59:13.7573406Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.7573627Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.7574520Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7574711Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.7575600Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.7575701Z return func(*args, **kwargs) 2025-12-04T09:59:13.7576477Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7576617Z return func(*args, **kwargs) 2025-12-04T09:59:13.7577553Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7577672Z return func(*args, **kwargs) 2025-12-04T09:59:13.7578434Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7578538Z return func(*args, **kwargs) 2025-12-04T09:59:13.7579357Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7579464Z return func(*args, **kwargs) 2025-12-04T09:59:13.7580234Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7580342Z return func(*args, **kwargs) 2025-12-04T09:59:13.7581096Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7581214Z return func(*args, **kwargs) 2025-12-04T09:59:13.7581963Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7582071Z return func(*args, **kwargs) 2025-12-04T09:59:13.7582837Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.7582944Z return func(*args, **kwargs) 2025-12-04T09:59:13.7583413Z [rank3]:E1204 09:52:13.900000 72565 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7583976Z [rank3]:E1204 09:52:13.900000 72565 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7584985Z [rank3]:E1204 09:52:13.900000 72565 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7585499Z [rank3]:E1204 09:52:13.900000 72565 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7586489Z [rank3]:E1204 09:52:13.900000 72565 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7586897Z [rank3]:E1204 09:52:13.900000 72565 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7587893Z [rank3]:E1204 09:52:13.900000 72565 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7588391Z [rank3]:E1204 09:52:13.900000 72565 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7589409Z [rank3]:E1204 09:52:13.900000 72565 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7589849Z [rank3]:E1204 09:52:13.900000 72565 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7590757Z [rank3]:E1204 09:52:13.900000 72565 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7591160Z [rank3]:E1204 09:52:13.900000 72565 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7592022Z [rank3]:E1204 09:52:13.900000 72565 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7592457Z [rank3]:E1204 09:52:13.900000 72565 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7594133Z [rank3]:E1204 09:52:13.900000 72565 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 62976 on device 3. CUDA driver allocated memory was 604962816 and is now 628031488. 
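The RuntimeError above reports the two quantities the CUDA memory-leak check compares: bytes held by PyTorch's caching allocator and bytes the CUDA driver reports as allocated on the device, each measured before and after the test body. The following is a minimal, illustrative sketch of that kind of before/after accounting; the snapshot helper, threshold logic, and function names are assumptions for illustration and are not the test harness's actual implementation.

import torch

def snapshot(device: int):
    # Bytes currently held by PyTorch's caching allocator on this device.
    allocator_bytes = torch.cuda.memory_allocated(device)
    # Device-wide usage as seen by the CUDA driver (total minus free),
    # which also covers memory outside the caching allocator.
    free, total = torch.cuda.mem_get_info(device)
    return allocator_bytes, total - free

def run_with_leak_check(fn, device: int = 0):
    # Illustrative only: flag the test if allocator usage grew across it,
    # roughly the comparison reported in the failure above.
    before_alloc, before_driver = snapshot(device)
    fn()
    torch.cuda.synchronize(device)
    after_alloc, after_driver = snapshot(device)
    if after_alloc > before_alloc:
        raise RuntimeError(
            f"possible leak: caching allocator went from {before_alloc} "
            f"to {after_alloc} bytes (driver: {before_driver} -> {after_driver})"
        )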
2025-12-04T09:59:13.7594465Z [rank3]:E1204 09:52:13.900000 72565 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7595062Z [rank3]:E1204 09:52:13.900000 72565 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7596219Z [rank3]:E1204 09:52:13.900000 72565 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7596545Z [rank3]:E1204 09:52:13.900000 72565 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7597194Z [rank3]:E1204 09:52:13.900000 72565 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7597708Z [rank3]:E1204 09:52:13.900000 72565 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.7598118Z [rank0]:E1204 09:52:13.900000 72562 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7598588Z [rank0]:E1204 09:52:13.900000 72562 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7599478Z [rank0]:E1204 09:52:13.900000 72562 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7599939Z [rank0]:E1204 09:52:13.900000 72562 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7600838Z [rank0]:E1204 09:52:13.900000 72562 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7601209Z [rank0]:E1204 09:52:13.900000 72562 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7602058Z [rank0]:E1204 09:52:13.900000 72562 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7602498Z [rank0]:E1204 09:52:13.900000 72562 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7603343Z [rank0]:E1204 09:52:13.900000 72562 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7603803Z [rank0]:E1204 09:52:13.900000 72562 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7604658Z [rank0]:E1204 09:52:13.900000 72562 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7605049Z [rank0]:E1204 09:52:13.900000 72562 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7605914Z [rank0]:E1204 09:52:13.900000 72562 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7606374Z [rank0]:E1204 09:52:13.900000 72562 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7608017Z [rank0]:E1204 09:52:13.900000 72562 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 65024 on device 0. CUDA driver allocated memory was 720306176 and is now 737083392. 2025-12-04T09:59:13.7608339Z [rank0]:E1204 09:52:13.900000 72562 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7608927Z [rank0]:E1204 09:52:13.900000 72562 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7610084Z [rank0]:E1204 09:52:13.900000 72562 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7610407Z [rank0]:E1204 09:52:13.900000 72562 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7611081Z [rank0]:E1204 09:52:13.900000 72562 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7611563Z [rank0]:E1204 09:52:13.900000 72562 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.7611972Z [rank1]:E1204 09:52:13.900000 72563 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7612446Z [rank1]:E1204 09:52:13.900000 72563 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7613339Z [rank1]:E1204 09:52:13.900000 72563 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7613818Z [rank1]:E1204 09:52:13.900000 72563 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7614700Z [rank1]:E1204 09:52:13.900000 72563 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7615054Z [rank1]:E1204 09:52:13.900000 72563 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7615904Z [rank1]:E1204 09:52:13.900000 72563 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7616441Z [rank1]:E1204 09:52:13.900000 72563 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7617552Z [rank1]:E1204 09:52:13.900000 72563 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7618054Z [rank1]:E1204 09:52:13.900000 72563 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7619009Z [rank1]:E1204 09:52:13.900000 72563 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7619494Z [rank1]:E1204 09:52:13.900000 72563 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7620474Z [rank1]:E1204 09:52:13.900000 72563 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7621166Z [rank1]:E1204 09:52:13.900000 72563 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7623007Z [rank1]:E1204 09:52:13.900000 72563 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 62976 on device 1. CUDA driver allocated memory was 607059968 and is now 628031488. 2025-12-04T09:59:13.7623368Z [rank1]:E1204 09:52:13.900000 72563 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7624035Z [rank1]:E1204 09:52:13.900000 72563 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7625413Z [rank1]:E1204 09:52:13.900000 72563 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7625778Z [rank1]:E1204 09:52:13.900000 72563 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7626504Z [rank1]:E1204 09:52:13.900000 72563 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7627051Z [rank1]:E1204 09:52:13.900000 72563 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.7627511Z [rank2]:E1204 09:52:13.901000 72564 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7628039Z [rank2]:E1204 09:52:13.901000 72564 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7629090Z [rank2]:E1204 09:52:13.901000 72564 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7629597Z [rank2]:E1204 09:52:13.901000 72564 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7630585Z [rank2]:E1204 09:52:13.901000 72564 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7630994Z [rank2]:E1204 09:52:13.901000 72564 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7632009Z [rank2]:E1204 09:52:13.901000 72564 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7632619Z [rank2]:E1204 09:52:13.901000 72564 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7633608Z [rank2]:E1204 09:52:13.901000 72564 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7634053Z [rank2]:E1204 09:52:13.901000 72564 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7634950Z [rank2]:E1204 09:52:13.901000 72564 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7635345Z [rank2]:E1204 09:52:13.901000 72564 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7636209Z [rank2]:E1204 09:52:13.901000 72564 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7636644Z [rank2]:E1204 09:52:13.901000 72564 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7638265Z [rank2]:E1204 09:52:13.901000 72564 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 65024 on device 2. CUDA driver allocated memory was 609157120 and is now 628031488. 
2025-12-04T09:59:13.7638592Z [rank2]:E1204 09:52:13.901000 72564 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7639212Z [rank2]:E1204 09:52:13.901000 72564 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7640375Z [rank2]:E1204 09:52:13.901000 72564 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7640700Z [rank2]:E1204 09:52:13.901000 72564 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7641342Z [rank2]:E1204 09:52:13.901000 72564 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7641827Z [rank2]:E1204 09:52:13.901000 72564 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.7641931Z dist init r=2, world=4 2025-12-04T09:59:13.7642044Z dist init r=3, world=4 2025-12-04T09:59:13.7642132Z dist init r=1, world=4 2025-12-04T09:59:13.7642220Z dist init r=0, world=4 2025-12-04T09:59:13.7643245Z [rank0]:[W1204 09:52:14.005460998 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.7643340Z FAILED [10.0260s] [ 11%] 2025-12-04T09:59:13.7643348Z 2025-12-04T09:59:13.7643475Z =================================== FAILURES =================================== 2025-12-04T09:59:13.7643893Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda _ 2025-12-04T09:59:13.7644037Z Traceback (most recent call last): 2025-12-04T09:59:13.7644523Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.7644629Z self._join_processes(fn) 2025-12-04T09:59:13.7645148Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.7645449Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.7646029Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.7646135Z raise RuntimeError(error) 2025-12-04T09:59:13.7646391Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.7646513Z Traceback (most recent call last): 2025-12-04T09:59:13.7647020Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7647134Z getattr(self, test_name)() 2025-12-04T09:59:13.7647641Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7647727Z fn() 2025-12-04T09:59:13.7648214Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7648313Z method(*args, **kwargs) 2025-12-04T09:59:13.7648788Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7648889Z method(*args, **kwargs) 2025-12-04T09:59:13.7649363Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7649461Z with policy(): 2025-12-04T09:59:13.7649939Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7650041Z raise RuntimeError(msg) 2025-12-04T09:59:13.7651376Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 62976 on device 1. CUDA driver allocated memory was 607059968 and is now 628031488. 2025-12-04T09:59:13.7651383Z 2025-12-04T09:59:13.7651588Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7652394Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7652402Z 2025-12-04T09:59:13.7652651Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7652656Z 2025-12-04T09:59:13.7652815Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.7652931Z Traceback (most recent call last): 2025-12-04T09:59:13.7653476Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7653586Z getattr(self, test_name)() 2025-12-04T09:59:13.7654093Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7654177Z fn() 2025-12-04T09:59:13.7654662Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7654763Z method(*args, **kwargs) 2025-12-04T09:59:13.7655240Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7655339Z method(*args, **kwargs) 2025-12-04T09:59:13.7655808Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7655933Z with policy(): 2025-12-04T09:59:13.7656493Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7656598Z raise RuntimeError(msg) 2025-12-04T09:59:13.7658151Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 65024 on device 2. CUDA driver allocated memory was 609157120 and is now 628031488. 
2025-12-04T09:59:13.7658158Z 2025-12-04T09:59:13.7658375Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7659277Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7659285Z 2025-12-04T09:59:13.7659549Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7659555Z 2025-12-04T09:59:13.7659560Z 2025-12-04T09:59:13.7659787Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.7660047Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.7660848Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e0d5d8a174cb3c98.xml - 2025-12-04T09:59:13.7661022Z =========================== short test summary info ============================ 2025-12-04T09:59:13.7662032Z FAILED [10.0260s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.7662164Z Traceback (most recent call last): 2025-12-04T09:59:13.7662712Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7662824Z getattr(self, test_name)() 2025-12-04T09:59:13.7663396Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7663485Z fn() 2025-12-04T09:59:13.7663998Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7664099Z method(*args, **kwargs) 2025-12-04T09:59:13.7664607Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7664725Z method(*args, **kwargs) 2025-12-04T09:59:13.7665225Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7665319Z with policy(): 2025-12-04T09:59:13.7665833Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7666292Z raise RuntimeError(msg) 2025-12-04T09:59:13.7667678Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 62976 on device 1. CUDA driver allocated memory was 607059968 and is now 628031488. 
2025-12-04T09:59:13.7667684Z 2025-12-04T09:59:13.7667896Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7668845Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7668862Z 2025-12-04T09:59:13.7669096Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7669133Z 2025-12-04T09:59:13.7669277Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.7669396Z Traceback (most recent call last): 2025-12-04T09:59:13.7669880Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7669974Z getattr(self, test_name)() 2025-12-04T09:59:13.7670453Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7670531Z fn() 2025-12-04T09:59:13.7670985Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7671102Z method(*args, **kwargs) 2025-12-04T09:59:13.7671551Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7671648Z method(*args, **kwargs) 2025-12-04T09:59:13.7672093Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7672180Z with policy(): 2025-12-04T09:59:13.7672640Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7672736Z raise RuntimeError(msg) 2025-12-04T09:59:13.7673961Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 65024 on device 2. CUDA driver allocated memory was 609157120 and is now 628031488. 2025-12-04T09:59:13.7673968Z 2025-12-04T09:59:13.7674159Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7674923Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7674930Z 2025-12-04T09:59:13.7675168Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7675355Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.7675521Z ====================== 1 failed, 18 deselected in 10.24s ======================= 2025-12-04T09:59:13.7675608Z Got exit code 1 2025-12-04T09:59:13.7675700Z Retrying single test... 
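Before the retry below, note that the UserWarning repeated throughout this attempt comes from constructing FSDP around a CPU-resident module; it recommends passing `device_id` so sharding initialization (and `sync_module_states=True`) runs on the GPU. A minimal sketch of the usage the warning points at, assuming the process group is already initialized (e.g. by torchrun); the toy module and the way local_rank is derived are illustrative, not taken from the test above.

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Assumes dist.init_process_group() has already run in this process.
local_rank = dist.get_rank() % torch.cuda.device_count()
module = torch.nn.Linear(1024, 1024)  # constructed on CPU, as in the warning

# device_id tells FSDP to move the module to this GPU before sharding
# initialization, which is also what sync_module_states=True requires.
fsdp_module = FSDP(
    module,
    device_id=torch.device("cuda", local_rank),
    sync_module_states=True,
)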
2025-12-04T09:59:13.7676259Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-931d013fb4c2579a.xml 2025-12-04T09:59:13.7676404Z ============================= test session starts ============================== 2025-12-04T09:59:13.7676717Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.7676810Z cachedir: .pytest_cache 2025-12-04T09:59:13.7677264Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.7677378Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.7677509Z configfile: pytest.ini 2025-12-04T09:59:13.7677991Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.7678185Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.7679023Z stepcurrent: skipping 18 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7679132Z Running 1 items in this shard 2025-12-04T09:59:13.7679136Z 2025-12-04T09:59:13.7680447Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda I1204 09:52:20.724000 72847 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 72899 2025-12-04T09:59:13.7680962Z I1204 09:52:20.725000 72847 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 72900 2025-12-04T09:59:13.7681426Z I1204 09:52:20.725000 72847 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 72901 2025-12-04T09:59:13.7681889Z I1204 09:52:20.726000 72847 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 72902 2025-12-04T09:59:13.7682831Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7682996Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.7684916Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7685011Z _warn_cpu_init() 2025-12-04T09:59:13.7685954Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7686081Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.7687977Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. 
We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7688084Z _warn_cpu_init() 2025-12-04T09:59:13.7689045Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7689178Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.7690110Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7690331Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.7692358Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7692462Z _warn_cpu_init() 2025-12-04T09:59:13.7693341Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7693542Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.7694435Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.7694563Z return func(*args, **kwargs) 2025-12-04T09:59:13.7695446Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7695566Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.7697712Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7697864Z _warn_cpu_init() 2025-12-04T09:59:13.7698861Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.7699095Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.7700090Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7700320Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.7701093Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7701207Z return func(*args, **kwargs) 2025-12-04T09:59:13.7701981Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7702093Z return func(*args, **kwargs) 2025-12-04T09:59:13.7702884Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7703004Z return func(*args, **kwargs) 2025-12-04T09:59:13.7703766Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7703881Z return func(*args, **kwargs) 2025-12-04T09:59:13.7704640Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7704751Z return func(*args, **kwargs) 2025-12-04T09:59:13.7705548Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7705657Z return func(*args, **kwargs) 2025-12-04T09:59:13.7706422Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7706530Z return func(*args, **kwargs) 2025-12-04T09:59:13.7707289Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.7707408Z return func(*args, **kwargs) 2025-12-04T09:59:13.7707868Z [rank1]:E1204 09:52:28.463000 72900 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7708416Z [rank1]:E1204 09:52:28.463000 72900 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7709507Z [rank1]:E1204 09:52:28.463000 72900 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7709963Z [rank1]:E1204 09:52:28.463000 72900 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7710851Z [rank1]:E1204 09:52:28.463000 72900 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7711236Z [rank1]:E1204 09:52:28.463000 72900 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7712095Z [rank1]:E1204 09:52:28.463000 72900 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7712531Z [rank1]:E1204 09:52:28.463000 72900 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7713391Z [rank1]:E1204 09:52:28.463000 72900 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7713822Z [rank1]:E1204 09:52:28.463000 72900 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7714684Z [rank1]:E1204 09:52:28.463000 72900 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7715092Z [rank1]:E1204 09:52:28.463000 72900 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7715971Z [rank1]:E1204 09:52:28.463000 72900 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7716417Z [rank1]:E1204 09:52:28.463000 72900 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7718038Z [rank1]:E1204 09:52:28.463000 72900 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 62976 on device 1. CUDA driver allocated memory was 611254272 and is now 628031488. 
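The FutureWarning repeated above says the `NO_SHARD` sharding strategy is deprecated and suggests `DistributedDataParallel` instead; since NO_SHARD keeps full parameters on every rank, DDP's replicate-and-all-reduce behaviour is the closest replacement. A minimal sketch under the same assumptions as before (initialized process group, toy module, illustrative local_rank):

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumes dist.init_process_group() has already run in this process.
local_rank = dist.get_rank() % torch.cuda.device_count()
module = torch.nn.Linear(1024, 1024).cuda(local_rank)

# DDP replicates parameters on every rank and all-reduces gradients,
# which is what FSDP's NO_SHARD strategy effectively did.
ddp_module = DDP(module, device_ids=[local_rank])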
2025-12-04T09:59:13.7718376Z [rank1]:E1204 09:52:28.463000 72900 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7718988Z [rank1]:E1204 09:52:28.463000 72900 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7720164Z [rank1]:E1204 09:52:28.463000 72900 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7720488Z [rank1]:E1204 09:52:28.463000 72900 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7721475Z [rank1]:E1204 09:52:28.463000 72900 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7722098Z [rank1]:E1204 09:52:28.463000 72900 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.7722620Z [rank0]:E1204 09:52:28.463000 72899 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7723163Z [rank0]:E1204 09:52:28.463000 72899 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7724164Z [rank0]:E1204 09:52:28.463000 72899 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7724671Z [rank0]:E1204 09:52:28.463000 72899 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7725707Z [rank0]:E1204 09:52:28.463000 72899 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7726104Z [rank0]:E1204 09:52:28.463000 72899 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7727079Z [rank0]:E1204 09:52:28.463000 72899 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7727568Z [rank0]:E1204 09:52:28.463000 72899 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7728531Z [rank0]:E1204 09:52:28.463000 72899 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7729022Z [rank0]:E1204 09:52:28.463000 72899 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7729980Z [rank0]:E1204 09:52:28.463000 72899 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7730485Z [rank0]:E1204 09:52:28.463000 72899 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7731459Z [rank0]:E1204 09:52:28.463000 72899 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7731957Z [rank0]:E1204 09:52:28.463000 72899 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7733922Z [rank0]:E1204 09:52:28.463000 72899 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 60928 on device 0. CUDA driver allocated memory was 718209024 and is now 737083392. 2025-12-04T09:59:13.7734262Z [rank0]:E1204 09:52:28.463000 72899 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7734845Z [rank0]:E1204 09:52:28.463000 72899 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7736009Z [rank0]:E1204 09:52:28.463000 72899 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7736402Z [rank0]:E1204 09:52:28.463000 72899 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7737299Z [rank0]:E1204 09:52:28.463000 72899 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7737867Z [rank0]:E1204 09:52:28.463000 72899 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.7738321Z [rank2]:E1204 09:52:28.465000 72901 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7738866Z [rank2]:E1204 09:52:28.465000 72901 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7739868Z [rank2]:E1204 09:52:28.465000 72901 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7740424Z [rank2]:E1204 09:52:28.465000 72901 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7741419Z [rank2]:E1204 09:52:28.465000 72901 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7741820Z [rank2]:E1204 09:52:28.465000 72901 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7742792Z [rank2]:E1204 09:52:28.465000 72901 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7743283Z [rank2]:E1204 09:52:28.465000 72901 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7744248Z [rank2]:E1204 09:52:28.465000 72901 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7744738Z [rank2]:E1204 09:52:28.465000 72901 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7745724Z [rank2]:E1204 09:52:28.465000 72901 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7746178Z [rank2]:E1204 09:52:28.465000 72901 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7747137Z [rank2]:E1204 09:52:28.465000 72901 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7747640Z [rank2]:E1204 09:52:28.465000 72901 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7749543Z [rank2]:E1204 09:52:28.465000 72901 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 62976 on device 2. CUDA driver allocated memory was 607059968 and is now 628031488. 2025-12-04T09:59:13.7749879Z [rank2]:E1204 09:52:28.465000 72901 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7750459Z [rank2]:E1204 09:52:28.465000 72901 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7751627Z [rank2]:E1204 09:52:28.465000 72901 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7751980Z [rank2]:E1204 09:52:28.465000 72901 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7752617Z [rank2]:E1204 09:52:28.465000 72901 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7753110Z [rank2]:E1204 09:52:28.465000 72901 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.7753504Z [rank3]:E1204 09:52:28.465000 72902 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7754008Z [rank3]:E1204 09:52:28.465000 72902 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7755089Z [rank3]:E1204 09:52:28.465000 72902 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7755582Z [rank3]:E1204 09:52:28.465000 72902 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7756509Z [rank3]:E1204 09:52:28.465000 72902 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7756884Z [rank3]:E1204 09:52:28.465000 72902 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7757795Z [rank3]:E1204 09:52:28.465000 72902 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7758252Z [rank3]:E1204 09:52:28.465000 72902 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7759196Z [rank3]:E1204 09:52:28.465000 72902 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7759653Z [rank3]:E1204 09:52:28.465000 72902 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7760563Z [rank3]:E1204 09:52:28.465000 72902 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7760986Z [rank3]:E1204 09:52:28.465000 72902 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7761889Z [rank3]:E1204 09:52:28.465000 72902 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7762391Z [rank3]:E1204 09:52:28.465000 72902 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7764325Z [rank3]:E1204 09:52:28.465000 72902 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 62976 on device 3. CUDA driver allocated memory was 604962816 and is now 628031488. 
2025-12-04T09:59:13.7764694Z [rank3]:E1204 09:52:28.465000 72902 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7765328Z [rank3]:E1204 09:52:28.465000 72902 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7766638Z [rank3]:E1204 09:52:28.465000 72902 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7766991Z [rank3]:E1204 09:52:28.465000 72902 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7767684Z [rank3]:E1204 09:52:28.465000 72902 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7768248Z [rank3]:E1204 09:52:28.465000 72902 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.7768344Z dist init r=1, world=4 2025-12-04T09:59:13.7768452Z dist init r=0, world=4 2025-12-04T09:59:13.7768543Z dist init r=2, world=4 2025-12-04T09:59:13.7768638Z dist init r=3, world=4 2025-12-04T09:59:13.7769770Z [rank0]:[W1204 09:52:28.482492998 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.7769868Z FAILED [10.2471s] [100%] 2025-12-04T09:59:13.7769874Z 2025-12-04T09:59:13.7770025Z =================================== FAILURES =================================== 2025-12-04T09:59:13.7770479Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda _ 2025-12-04T09:59:13.7770599Z Traceback (most recent call last): 2025-12-04T09:59:13.7771140Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.7771251Z self._join_processes(fn) 2025-12-04T09:59:13.7771813Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.7771959Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.7772587Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.7772705Z raise RuntimeError(error) 2025-12-04T09:59:13.7772933Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.7773047Z Traceback (most recent call last): 2025-12-04T09:59:13.7773585Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7773692Z getattr(self, test_name)() 2025-12-04T09:59:13.7774210Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7774305Z fn() 2025-12-04T09:59:13.7774796Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7774937Z method(*args, **kwargs) 2025-12-04T09:59:13.7775431Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7775530Z method(*args, **kwargs) 2025-12-04T09:59:13.7776029Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7776122Z with policy(): 2025-12-04T09:59:13.7776896Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7777015Z raise RuntimeError(msg) 2025-12-04T09:59:13.7778396Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 60928 on device 0. CUDA driver allocated memory was 718209024 and is now 737083392. 2025-12-04T09:59:13.7778445Z 2025-12-04T09:59:13.7778671Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7779522Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7779528Z 2025-12-04T09:59:13.7779803Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7779808Z 2025-12-04T09:59:13.7780081Z Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.7780200Z Traceback (most recent call last): 2025-12-04T09:59:13.7780756Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7780870Z getattr(self, test_name)() 2025-12-04T09:59:13.7781417Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7781507Z fn() 2025-12-04T09:59:13.7782018Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7782129Z method(*args, **kwargs) 2025-12-04T09:59:13.7782630Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7782733Z method(*args, **kwargs) 2025-12-04T09:59:13.7783242Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7783343Z with policy(): 2025-12-04T09:59:13.7783858Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7783966Z raise RuntimeError(msg) 2025-12-04T09:59:13.7785376Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 62976 on device 1. CUDA driver allocated memory was 611254272 and is now 628031488. 
2025-12-04T09:59:13.7785391Z 2025-12-04T09:59:13.7785605Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7786454Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7786462Z 2025-12-04T09:59:13.7786737Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7786742Z 2025-12-04T09:59:13.7786905Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.7787038Z Traceback (most recent call last): 2025-12-04T09:59:13.7787592Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7787730Z getattr(self, test_name)() 2025-12-04T09:59:13.7788424Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7788511Z fn() 2025-12-04T09:59:13.7789003Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7789116Z method(*args, **kwargs) 2025-12-04T09:59:13.7789607Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7789720Z method(*args, **kwargs) 2025-12-04T09:59:13.7790209Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7790335Z with policy(): 2025-12-04T09:59:13.7790836Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7790940Z raise RuntimeError(msg) 2025-12-04T09:59:13.7792375Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 62976 on device 2. CUDA driver allocated memory was 607059968 and is now 628031488. 2025-12-04T09:59:13.7792391Z 2025-12-04T09:59:13.7792595Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7793425Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7793433Z 2025-12-04T09:59:13.7793689Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7793694Z 2025-12-04T09:59:13.7793699Z 2025-12-04T09:59:13.7793909Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.7794163Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:59:13.7794916Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-931d013fb4c2579a.xml - 2025-12-04T09:59:13.7795077Z =========================== short test summary info ============================ 2025-12-04T09:59:13.7796046Z FAILED [10.2471s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.7796161Z Traceback (most recent call last): 2025-12-04T09:59:13.7796685Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7796792Z getattr(self, test_name)() 2025-12-04T09:59:13.7797327Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7797422Z fn() 2025-12-04T09:59:13.7797994Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7798098Z method(*args, **kwargs) 2025-12-04T09:59:13.7798547Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7798643Z method(*args, **kwargs) 2025-12-04T09:59:13.7799099Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7799184Z with policy(): 2025-12-04T09:59:13.7799637Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7799748Z raise RuntimeError(msg) 2025-12-04T09:59:13.7801002Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 60928 on device 0. CUDA driver allocated memory was 718209024 and is now 737083392. 
2025-12-04T09:59:13.7801008Z 2025-12-04T09:59:13.7801206Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7801963Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7801970Z 2025-12-04T09:59:13.7802220Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7802249Z 2025-12-04T09:59:13.7802390Z Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.7802500Z Traceback (most recent call last): 2025-12-04T09:59:13.7802994Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7803089Z getattr(self, test_name)() 2025-12-04T09:59:13.7803563Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7803648Z fn() 2025-12-04T09:59:13.7804097Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7804224Z method(*args, **kwargs) 2025-12-04T09:59:13.7804668Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7804759Z method(*args, **kwargs) 2025-12-04T09:59:13.7805215Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7805301Z with policy(): 2025-12-04T09:59:13.7805761Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7805856Z raise RuntimeError(msg) 2025-12-04T09:59:13.7807075Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 62976 on device 1. CUDA driver allocated memory was 611254272 and is now 628031488. 
2025-12-04T09:59:13.7807081Z 2025-12-04T09:59:13.7807280Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7808035Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7808041Z 2025-12-04T09:59:13.7808287Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7808293Z 2025-12-04T09:59:13.7808464Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.7808574Z Traceback (most recent call last): 2025-12-04T09:59:13.7809064Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7809162Z getattr(self, test_name)() 2025-12-04T09:59:13.7809645Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7809726Z fn() 2025-12-04T09:59:13.7810172Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7810271Z method(*args, **kwargs) 2025-12-04T09:59:13.7810719Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7810809Z method(*args, **kwargs) 2025-12-04T09:59:13.7811307Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7811395Z with policy(): 2025-12-04T09:59:13.7811853Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7811949Z raise RuntimeError(msg) 2025-12-04T09:59:13.7813168Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 62976 on device 2. CUDA driver allocated memory was 607059968 and is now 628031488. 2025-12-04T09:59:13.7813184Z 2025-12-04T09:59:13.7813402Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7814160Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7814165Z 2025-12-04T09:59:13.7814408Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7814568Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.7814728Z ====================== 1 failed, 26 deselected in 10.46s ======================= 2025-12-04T09:59:13.7814821Z Got exit code 1 2025-12-04T09:59:13.7814915Z Retrying single test... 
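The failure that is about to be retried comes from the CUDA memory-leak checker, which compares per-device memory counters captured before and after the test body. Below is a minimal sketch of that bookkeeping, using only public torch.cuda calls and assuming a CUDA-enabled PyTorch install; it only illustrates what the two reported numbers correspond to, and is not the CudaMemoryLeakCheck implementation in torch/testing/_internal/common_utils.py.

# Hedged sketch of the two measurements in the failure message above: the
# "Caching allocator allocated memory" figure corresponds to
# torch.cuda.memory_allocated(), and the "CUDA driver allocated memory" figure
# is driver-level usage, approximated here with torch.cuda.mem_get_info().
import torch

def gpu_memory_snapshot(device: int = 0) -> tuple[int, int]:
    """Return (caching_allocator_bytes, driver_used_bytes) for one device."""
    torch.cuda.synchronize(device)
    allocator_bytes = torch.cuda.memory_allocated(device)
    free_bytes, total_bytes = torch.cuda.mem_get_info(device)
    return allocator_bytes, total_bytes - free_bytes

if __name__ == "__main__":
    before = gpu_memory_snapshot()
    leftover = torch.empty(1024, device="cuda")  # simulated leak: kept alive on purpose
    after = gpu_memory_snapshot()
    print(f"caching allocator: {before[0]} -> {after[0]} bytes")
    print(f"driver allocated:  {before[1]} -> {after[1]} bytes")

Running the repro command printed above (PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py ...) enables the same before/after comparison inside the real test.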
2025-12-04T09:59:13.7815500Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-92646f491493cae0.xml 2025-12-04T09:59:13.7815644Z ============================= test session starts ============================== 2025-12-04T09:59:13.7815956Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.7816056Z cachedir: .pytest_cache 2025-12-04T09:59:13.7816596Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.7816894Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.7817004Z configfile: pytest.ini 2025-12-04T09:59:13.7817538Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.7817767Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.7818705Z stepcurrent: skipping 18 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7818825Z Running 1 items in this shard 2025-12-04T09:59:13.7818832Z 2025-12-04T09:59:13.7820068Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda I1204 09:52:35.164000 73184 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 73236 2025-12-04T09:59:13.7820602Z I1204 09:52:35.164000 73184 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 73237 2025-12-04T09:59:13.7821341Z I1204 09:52:35.165000 73184 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 73238 2025-12-04T09:59:13.7821839Z I1204 09:52:35.166000 73184 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 73239 2025-12-04T09:59:13.7822857Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7822997Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.7825084Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7825199Z _warn_cpu_init() 2025-12-04T09:59:13.7826197Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7826427Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.7827470Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. 
If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7827613Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.7828598Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7828727Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.7830760Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7830904Z _warn_cpu_init() 2025-12-04T09:59:13.7833037Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7833138Z _warn_cpu_init() 2025-12-04T09:59:13.7834085Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.7834190Z return func(*args, **kwargs) 2025-12-04T09:59:13.7835123Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7835373Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.7836314Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7836528Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.7837633Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:426: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7837765Z return FSDP(layer, group, **fsdp_kwargs) 2025-12-04T09:59:13.7839766Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.7839863Z _warn_cpu_init() 2025-12-04T09:59:13.7840834Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:1464: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7841047Z fsdp_model = FSDP(fsdp_model, self.process_group, **fsdp_kwargs) 2025-12-04T09:59:13.7844252Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7844445Z return func(*args, **kwargs) 2025-12-04T09:59:13.7845204Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7845319Z return func(*args, **kwargs) 2025-12-04T09:59:13.7846062Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7846165Z return func(*args, **kwargs) 2025-12-04T09:59:13.7846914Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7847052Z return func(*args, **kwargs) 2025-12-04T09:59:13.7847806Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7847921Z return func(*args, **kwargs) 2025-12-04T09:59:13.7848769Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7848884Z return func(*args, **kwargs) 2025-12-04T09:59:13.7849594Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7849695Z return func(*args, **kwargs) 2025-12-04T09:59:13.7850489Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.7850586Z return func(*args, **kwargs) 2025-12-04T09:59:13.7851014Z [rank0]:E1204 09:52:42.930000 73236 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7851485Z [rank0]:E1204 09:52:42.930000 73236 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7852407Z [rank0]:E1204 09:52:42.930000 73236 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7852868Z [rank0]:E1204 09:52:42.930000 73236 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7853747Z [rank0]:E1204 09:52:42.930000 73236 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7854113Z [rank0]:E1204 09:52:42.930000 73236 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7854962Z [rank0]:E1204 09:52:42.930000 73236 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7855412Z [rank0]:E1204 09:52:42.930000 73236 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7856263Z [rank0]:E1204 09:52:42.930000 73236 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7857164Z [rank0]:E1204 09:52:42.930000 73236 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7858201Z [rank0]:E1204 09:52:42.930000 73236 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7858678Z [rank0]:E1204 09:52:42.930000 73236 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7859653Z [rank0]:E1204 09:52:42.930000 73236 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7860142Z [rank0]:E1204 09:52:42.930000 73236 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7861995Z [rank0]:E1204 09:52:42.930000 73236 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 58880 on device 0. CUDA driver allocated memory was 711917568 and is now 737083392. 
2025-12-04T09:59:13.7862396Z [rank0]:E1204 09:52:42.930000 73236 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7863054Z [rank0]:E1204 09:52:42.930000 73236 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7864370Z [rank0]:E1204 09:52:42.930000 73236 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7864739Z [rank0]:E1204 09:52:42.930000 73236 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7865471Z [rank0]:E1204 09:52:42.930000 73236 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7866019Z [rank0]:E1204 09:52:42.930000 73236 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.7866508Z [rank3]:E1204 09:52:42.931000 73239 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7867039Z [rank3]:E1204 09:52:42.931000 73239 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7868039Z [rank3]:E1204 09:52:42.931000 73239 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7868551Z [rank3]:E1204 09:52:42.931000 73239 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7869620Z [rank3]:E1204 09:52:42.931000 73239 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7870007Z [rank3]:E1204 09:52:42.931000 73239 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7870910Z [rank3]:E1204 09:52:42.931000 73239 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7871375Z [rank3]:E1204 09:52:42.931000 73239 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7872284Z [rank3]:E1204 09:52:42.931000 73239 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7872817Z [rank3]:E1204 09:52:42.931000 73239 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7873797Z [rank3]:E1204 09:52:42.931000 73239 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7874192Z [rank3]:E1204 09:52:42.931000 73239 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7875053Z [rank3]:E1204 09:52:42.931000 73239 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7875520Z [rank3]:E1204 09:52:42.931000 73239 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7877166Z [rank3]:E1204 09:52:42.931000 73239 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 58880 on device 3. CUDA driver allocated memory was 558825472 and is now 628031488. 2025-12-04T09:59:13.7877488Z [rank3]:E1204 09:52:42.931000 73239 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7878067Z [rank3]:E1204 09:52:42.931000 73239 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7879239Z [rank3]:E1204 09:52:42.931000 73239 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7879567Z [rank3]:E1204 09:52:42.931000 73239 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7880256Z [rank3]:E1204 09:52:42.931000 73239 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7880741Z [rank3]:E1204 09:52:42.931000 73239 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.7881153Z [rank1]:E1204 09:52:42.932000 73237 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7881622Z [rank1]:E1204 09:52:42.932000 73237 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7882508Z [rank1]:E1204 09:52:42.932000 73237 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7882972Z [rank1]:E1204 09:52:42.932000 73237 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7883842Z [rank1]:E1204 09:52:42.932000 73237 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7884202Z [rank1]:E1204 09:52:42.932000 73237 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7885051Z [rank1]:E1204 09:52:42.932000 73237 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7885498Z [rank1]:E1204 09:52:42.932000 73237 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7886398Z [rank1]:E1204 09:52:42.932000 73237 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7886833Z [rank1]:E1204 09:52:42.932000 73237 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7887688Z [rank1]:E1204 09:52:42.932000 73237 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7888111Z [rank1]:E1204 09:52:42.932000 73237 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7888974Z [rank1]:E1204 09:52:42.932000 73237 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7889412Z [rank1]:E1204 09:52:42.932000 73237 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7891041Z [rank1]:E1204 09:52:42.932000 73237 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 60928 on device 1. CUDA driver allocated memory was 604962816 and is now 628031488. 2025-12-04T09:59:13.7891363Z [rank1]:E1204 09:52:42.932000 73237 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7891957Z [rank1]:E1204 09:52:42.932000 73237 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7893143Z [rank1]:E1204 09:52:42.932000 73237 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7893466Z [rank1]:E1204 09:52:42.932000 73237 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7894115Z [rank1]:E1204 09:52:42.932000 73237 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7894596Z [rank1]:E1204 09:52:42.932000 73237 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.7895009Z [rank2]:E1204 09:52:42.932000 73238 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7895484Z [rank2]:E1204 09:52:42.932000 73238 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7896443Z [rank2]:E1204 09:52:42.932000 73238 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7897105Z [rank2]:E1204 09:52:42.932000 73238 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7898090Z [rank2]:E1204 09:52:42.932000 73238 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7898498Z [rank2]:E1204 09:52:42.932000 73238 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7899501Z [rank2]:E1204 09:52:42.932000 73238 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7900026Z [rank2]:E1204 09:52:42.932000 73238 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7900981Z [rank2]:E1204 09:52:42.932000 73238 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7901465Z [rank2]:E1204 09:52:42.932000 73238 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7902465Z [rank2]:E1204 09:52:42.932000 73238 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7902912Z [rank2]:E1204 09:52:42.932000 73238 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7903887Z [rank2]:E1204 09:52:42.932000 73238 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7904375Z [rank2]:E1204 09:52:42.932000 73238 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7906206Z [rank2]:E1204 09:52:42.932000 73238 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 60928 on device 2. CUDA driver allocated memory was 607059968 and is now 628031488. 
2025-12-04T09:59:13.7906574Z [rank2]:E1204 09:52:42.932000 73238 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7907242Z [rank2]:E1204 09:52:42.932000 73238 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7908571Z [rank2]:E1204 09:52:42.932000 73238 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7909040Z [rank2]:E1204 09:52:42.932000 73238 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7909806Z [rank2]:E1204 09:52:42.932000 73238 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7910289Z [rank2]:E1204 09:52:42.932000 73238 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.7910390Z dist init r=0, world=4 2025-12-04T09:59:13.7910475Z dist init r=1, world=4 2025-12-04T09:59:13.7910557Z dist init r=2, world=4 2025-12-04T09:59:13.7910650Z dist init r=3, world=4 2025-12-04T09:59:13.7911675Z [rank0]:[W1204 09:52:43.936846942 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.7911770Z FAILED [9.4939s] [100%] 2025-12-04T09:59:13.7911776Z 2025-12-04T09:59:13.7911903Z =================================== FAILURES =================================== 2025-12-04T09:59:13.7912321Z _ TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda _ 2025-12-04T09:59:13.7912433Z Traceback (most recent call last): 2025-12-04T09:59:13.7912977Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.7913080Z self._join_processes(fn) 2025-12-04T09:59:13.7913607Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.7913733Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.7914276Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.7914377Z raise RuntimeError(error) 2025-12-04T09:59:13.7914583Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.7914720Z Traceback (most recent call last): 2025-12-04T09:59:13.7915200Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7915301Z getattr(self, test_name)() 2025-12-04T09:59:13.7915783Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7915861Z fn() 2025-12-04T09:59:13.7916316Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7916409Z method(*args, **kwargs) 2025-12-04T09:59:13.7916853Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7916951Z method(*args, **kwargs) 2025-12-04T09:59:13.7917393Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7917486Z with policy(): 2025-12-04T09:59:13.7917932Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7918028Z raise RuntimeError(msg) 2025-12-04T09:59:13.7919297Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 58880 on device 0. CUDA driver allocated memory was 711917568 and is now 737083392. 2025-12-04T09:59:13.7919304Z 2025-12-04T09:59:13.7919493Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7920260Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7920268Z 2025-12-04T09:59:13.7920499Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7920504Z 2025-12-04T09:59:13.7920508Z 2025-12-04T09:59:13.7920700Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.7921277Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.7922244Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-92646f491493cae0.xml - 2025-12-04T09:59:13.7922430Z =========================== short test summary info ============================ 2025-12-04T09:59:13.7923441Z FAILED [9.4939s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.7923560Z Traceback (most recent call last): 2025-12-04T09:59:13.7924115Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7924223Z getattr(self, test_name)() 2025-12-04T09:59:13.7924845Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7924971Z fn() 2025-12-04T09:59:13.7925479Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7925593Z method(*args, **kwargs) 2025-12-04T09:59:13.7926096Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7926215Z method(*args, **kwargs) 2025-12-04T09:59:13.7926715Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7926815Z with policy(): 2025-12-04T09:59:13.7927384Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7927489Z raise RuntimeError(msg) 2025-12-04T09:59:13.7928880Z 
RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 58880 on device 0. CUDA driver allocated memory was 711917568 and is now 737083392. 2025-12-04T09:59:13.7928899Z 2025-12-04T09:59:13.7929109Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7929967Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7929972Z 2025-12-04T09:59:13.7930244Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.7930423Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.7930610Z ======================= 1 failed, 26 deselected in 9.71s ======================= 2025-12-04T09:59:13.7930708Z Got exit code 1 2025-12-04T09:59:13.7931475Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda 2025-12-04T09:59:13.7931924Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.7932543Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8232c23afc6466e0.xml 2025-12-04T09:59:13.7932716Z ============================= test session starts ============================== 2025-12-04T09:59:13.7933064Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.7933173Z cachedir: .pytest_cache 2025-12-04T09:59:13.7933802Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.7934033Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.7934128Z configfile: pytest.ini 2025-12-04T09:59:13.7934612Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.7934804Z collecting ... collected 60 items / 19 deselected / 41 selected 2025-12-04T09:59:13.7934935Z stepcurrent: skipping 19 already run items. 
2025-12-04T09:59:13.7935032Z Running 8 items in this shard 2025-12-04T09:59:13.7935037Z 2025-12-04T09:59:13.7935944Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_no_shard_cuda I1204 09:52:49.684000 73521 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 73573 2025-12-04T09:59:13.7936468Z I1204 09:52:49.685000 73521 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 73574 2025-12-04T09:59:13.7937103Z I1204 09:52:49.685000 73521 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 73575 2025-12-04T09:59:13.7937679Z I1204 09:52:49.686000 73521 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 73576 2025-12-04T09:59:13.7938932Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.7939061Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.7940304Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.7940459Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.7941695Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.7941817Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.7943055Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.7943175Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.7944138Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7944261Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.7946308Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7946415Z _warn_cpu_init() 2025-12-04T09:59:13.7947377Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.7947491Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.7948460Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7948577Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.7950531Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7950622Z _warn_cpu_init() 2025-12-04T09:59:13.7952428Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7952543Z _warn_cpu_init() 2025-12-04T09:59:13.7953434Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7953524Z fsdp_model = FSDP( 2025-12-04T09:59:13.7954370Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7954477Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.7956290Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.7956384Z _warn_cpu_init() 2025-12-04T09:59:13.7957275Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7957370Z fsdp_model = FSDP( 2025-12-04T09:59:13.7958252Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.7958349Z fsdp_model = FSDP( 2025-12-04T09:59:13.7959226Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.7959316Z fsdp_model = FSDP( 2025-12-04T09:59:13.7960025Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7960130Z return func(*args, **kwargs) 2025-12-04T09:59:13.7960809Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7960919Z return func(*args, **kwargs) 2025-12-04T09:59:13.7961593Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7961690Z return func(*args, **kwargs) 2025-12-04T09:59:13.7962378Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7962473Z return func(*args, **kwargs) 2025-12-04T09:59:13.7963147Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7963240Z return func(*args, **kwargs) 2025-12-04T09:59:13.7963909Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7964010Z return func(*args, **kwargs) 2025-12-04T09:59:13.7964911Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7965050Z return func(*args, **kwargs) 2025-12-04T09:59:13.7965788Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.7965889Z return func(*args, **kwargs) 2025-12-04T09:59:13.7966833Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.7966936Z return func(*args, **kwargs) 2025-12-04T09:59:13.7971189Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.)
2025-12-04T09:59:13.7971591Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.7975854Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.7976236Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.7980974Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.7981376Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.7985975Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
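Note: the AccumulateGrad stream-mismatch warnings above spell out their own escape hatch; if the mismatch is known to be intentional, the check can be disabled globally. A one-line sketch of that suppression (it silences only this warning and does not remove any extra synchronization):

import torch

torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False)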
2025-12-04T09:59:13.7986403Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.7986861Z [rank2]:E1204 09:53:00.487000 73575 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.7987408Z [rank2]:E1204 09:53:00.487000 73575 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.7988415Z [rank2]:E1204 09:53:00.487000 73575 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.7989042Z [rank2]:E1204 09:53:00.487000 73575 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.7990038Z [rank2]:E1204 09:53:00.487000 73575 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.7990400Z [rank2]:E1204 09:53:00.487000 73575 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.7991277Z [rank2]:E1204 09:53:00.487000 73575 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7991711Z [rank2]:E1204 09:53:00.487000 73575 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7992713Z [rank2]:E1204 09:53:00.487000 73575 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.7993189Z [rank2]:E1204 09:53:00.487000 73575 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.7994124Z [rank2]:E1204 09:53:00.487000 73575 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.7994561Z [rank2]:E1204 09:53:00.487000 73575 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.7995511Z [rank2]:E1204 09:53:00.487000 73575 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.7996088Z [rank2]:E1204 09:53:00.487000 73575 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.7997638Z [rank2]:E1204 09:53:00.487000 73575 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1708544 on device 2. CUDA driver allocated memory was 607059968 and is now 678363136. 
2025-12-04T09:59:13.7998030Z [rank2]:E1204 09:53:00.487000 73575 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.7998698Z [rank2]:E1204 09:53:00.487000 73575 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.7999728Z [rank2]:E1204 09:53:00.487000 73575 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda 2025-12-04T09:59:13.8000141Z [rank2]:E1204 09:53:00.487000 73575 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8000856Z [rank2]:E1204 09:53:00.487000 73575 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8001420Z [rank2]:E1204 09:53:00.487000 73575 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.8007121Z [rank0]:E1204 09:53:00.487000 73573 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8007658Z [rank0]:E1204 09:53:00.487000 73573 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8008576Z [rank0]:E1204 09:53:00.487000 73573 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8009043Z [rank0]:E1204 09:53:00.487000 73573 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8010008Z [rank0]:E1204 09:53:00.487000 73573 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8010368Z [rank0]:E1204 09:53:00.487000 73573 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8011218Z [rank0]:E1204 09:53:00.487000 73573 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8011654Z [rank0]:E1204 09:53:00.487000 73573 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8012519Z [rank0]:E1204 09:53:00.487000 73573 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8012953Z [rank0]:E1204 09:53:00.487000 73573 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8013810Z [rank0]:E1204 09:53:00.487000 73573 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8014206Z [rank0]:E1204 09:53:00.487000 73573 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8015066Z [rank0]:E1204 09:53:00.487000 73573 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8015502Z [rank0]:E1204 09:53:00.487000 73573 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8017420Z [rank0]:E1204 09:53:00.487000 73573 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1708544 on device 0. CUDA driver allocated memory was 720306176 and is now 787415040. 2025-12-04T09:59:13.8017792Z [rank0]:E1204 09:53:00.487000 73573 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8018446Z [rank0]:E1204 09:53:00.487000 73573 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8019918Z [rank0]:E1204 09:53:00.487000 73573 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda 2025-12-04T09:59:13.8020289Z [rank0]:E1204 09:53:00.487000 73573 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8021235Z [rank0]:E1204 09:53:00.487000 73573 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8021784Z [rank0]:E1204 09:53:00.487000 73573 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.8022235Z [rank3]:E1204 09:53:00.488000 73576 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8022771Z [rank3]:E1204 09:53:00.488000 73576 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8023778Z [rank3]:E1204 09:53:00.488000 73576 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8024296Z [rank3]:E1204 09:53:00.488000 73576 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8025363Z [rank3]:E1204 09:53:00.488000 73576 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8025770Z [rank3]:E1204 09:53:00.488000 73576 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8026730Z [rank3]:E1204 09:53:00.488000 73576 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8027216Z [rank3]:E1204 09:53:00.488000 73576 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8028189Z [rank3]:E1204 09:53:00.488000 73576 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8028672Z [rank3]:E1204 09:53:00.488000 73576 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8029641Z [rank3]:E1204 09:53:00.488000 73576 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8030088Z [rank3]:E1204 09:53:00.488000 73576 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8031112Z [rank3]:E1204 09:53:00.488000 73576 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8031646Z [rank3]:E1204 09:53:00.488000 73576 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8033369Z [rank3]:E1204 09:53:00.488000 73576 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1708544 on device 3. CUDA driver allocated memory was 604962816 and is now 678363136. 2025-12-04T09:59:13.8033693Z [rank3]:E1204 09:53:00.488000 73576 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8034314Z [rank3]:E1204 09:53:00.488000 73576 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8035305Z [rank3]:E1204 09:53:00.488000 73576 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda 2025-12-04T09:59:13.8035628Z [rank3]:E1204 09:53:00.488000 73576 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8036274Z [rank3]:E1204 09:53:00.488000 73576 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8036760Z [rank3]:E1204 09:53:00.488000 73576 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.8037159Z [rank1]:E1204 09:53:00.488000 73574 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8037634Z [rank1]:E1204 09:53:00.488000 73574 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8038554Z [rank1]:E1204 09:53:00.488000 73574 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8039012Z [rank1]:E1204 09:53:00.488000 73574 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8039894Z [rank1]:E1204 09:53:00.488000 73574 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8040253Z [rank1]:E1204 09:53:00.488000 73574 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8041108Z [rank1]:E1204 09:53:00.488000 73574 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8041541Z [rank1]:E1204 09:53:00.488000 73574 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8042394Z [rank1]:E1204 09:53:00.488000 73574 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8042824Z [rank1]:E1204 09:53:00.488000 73574 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8043680Z [rank1]:E1204 09:53:00.488000 73574 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8044077Z [rank1]:E1204 09:53:00.488000 73574 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8044994Z [rank1]:E1204 09:53:00.488000 73574 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8045431Z [rank1]:E1204 09:53:00.488000 73574 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8046898Z [rank1]:E1204 09:53:00.488000 73574 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1708544 on device 1. CUDA driver allocated memory was 607059968 and is now 678363136. 
2025-12-04T09:59:13.8047255Z [rank1]:E1204 09:53:00.488000 73574 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8047842Z [rank1]:E1204 09:53:00.488000 73574 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8048832Z [rank1]:E1204 09:53:00.488000 73574 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda 2025-12-04T09:59:13.8049152Z [rank1]:E1204 09:53:00.488000 73574 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8049794Z [rank1]:E1204 09:53:00.488000 73574 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8050278Z [rank1]:E1204 09:53:00.488000 73574 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.8050372Z dist init r=3, world=4 2025-12-04T09:59:13.8050470Z dist init r=0, world=4 2025-12-04T09:59:13.8050558Z dist init r=1, world=4 2025-12-04T09:59:13.8050644Z dist init r=2, world=4 2025-12-04T09:59:13.8051709Z [rank0]:[W1204 09:53:00.522164021 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.8051800Z FAILED [13.1067s] [ 12%] 2025-12-04T09:59:13.8051806Z 2025-12-04T09:59:13.8051946Z =================================== FAILURES =================================== 2025-12-04T09:59:13.8052215Z ______ TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda ______ 2025-12-04T09:59:13.8052325Z Traceback (most recent call last): 2025-12-04T09:59:13.8052819Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.8052922Z self._join_processes(fn) 2025-12-04T09:59:13.8053453Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.8053581Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.8054121Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.8054229Z raise RuntimeError(error) 2025-12-04T09:59:13.8054606Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.8054725Z Traceback (most recent call last): 2025-12-04T09:59:13.8055228Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8055333Z getattr(self, test_name)() 2025-12-04T09:59:13.8055839Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8055954Z fn() 2025-12-04T09:59:13.8056552Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8056661Z method(*args, **kwargs) 2025-12-04T09:59:13.8057339Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3329, in wrapper 2025-12-04T09:59:13.8057454Z method(*args, **kwargs) 2025-12-04T09:59:13.8057957Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8058054Z with policy(): 2025-12-04T09:59:13.8058570Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8058712Z raise RuntimeError(msg) 2025-12-04T09:59:13.8059914Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1708544 on device 3. CUDA driver allocated memory was 604962816 and is now 678363136. 2025-12-04T09:59:13.8059930Z 2025-12-04T09:59:13.8060146Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8060802Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda 2025-12-04T09:59:13.8060808Z 2025-12-04T09:59:13.8061078Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8061084Z 2025-12-04T09:59:13.8061088Z 2025-12-04T09:59:13.8061307Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.8061580Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.8062380Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8232c23afc6466e0.xml - 2025-12-04T09:59:13.8062552Z =========================== short test summary info ============================ 2025-12-04T09:59:13.8063422Z FAILED [13.1067s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_no_shard_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.8063547Z Traceback (most recent call last): 2025-12-04T09:59:13.8064106Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8064217Z getattr(self, test_name)() 2025-12-04T09:59:13.8064755Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8064853Z fn() 2025-12-04T09:59:13.8065360Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8065466Z method(*args, **kwargs) 2025-12-04T09:59:13.8065981Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8066082Z method(*args, **kwargs) 2025-12-04T09:59:13.8066594Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8066688Z with policy(): 2025-12-04T09:59:13.8067198Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8067312Z raise RuntimeError(msg) 2025-12-04T09:59:13.8068505Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1708544 on device 3. 
CUDA driver allocated memory was 604962816 and is now 678363136. 2025-12-04T09:59:13.8068545Z 2025-12-04T09:59:13.8068931Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8069566Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda 2025-12-04T09:59:13.8069571Z 2025-12-04T09:59:13.8069827Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8070006Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.8070180Z ====================== 1 failed, 19 deselected in 13.32s ======================= 2025-12-04T09:59:13.8070281Z Got exit code 1 2025-12-04T09:59:13.8070380Z Retrying single test... 2025-12-04T09:59:13.8071009Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-983af60bcd722f1d.xml 2025-12-04T09:59:13.8071171Z ============================= test session starts ============================== 2025-12-04T09:59:13.8071511Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.8071615Z cachedir: .pytest_cache 2025-12-04T09:59:13.8072122Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.8072236Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.8072346Z configfile: pytest.ini 2025-12-04T09:59:13.8072864Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.8073074Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.8073797Z stepcurrent: skipping 19 already run items. 
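Note: the barrier() warning and the ProcessGroupNCCL "destroy_process_group() was not called" warning in the run above both concern process-group lifecycle. A minimal sketch of the pattern those warnings ask for (backend choice and the torchrun-style LOCAL_RANK are assumptions):

import os
import torch
import torch.distributed as dist

local_rank = int(os.environ.get("LOCAL_RANK", "0"))
device = torch.device("cuda", local_rank)
torch.cuda.set_device(device)

# Binding the default group to a device silences the barrier() "device under
# current context" warning.
dist.init_process_group(backend="nccl", device_id=device)
try:
    dist.barrier()
    # ... test or training body ...
finally:
    # Explicit teardown avoids the destroy_process_group() warning at exit.
    dist.destroy_process_group()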
Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_no_shard_cuda 2025-12-04T09:59:13.8073909Z Running 1 items in this shard 2025-12-04T09:59:13.8073914Z 2025-12-04T09:59:13.8074921Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_no_shard_cuda I1204 09:53:07.504000 73858 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 73910 2025-12-04T09:59:13.8075429Z I1204 09:53:07.505000 73858 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 73911 2025-12-04T09:59:13.8076116Z I1204 09:53:07.506000 73858 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 73912 2025-12-04T09:59:13.8076557Z I1204 09:53:07.506000 73858 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 73913 2025-12-04T09:59:13.8077665Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8077792Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8078891Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8079010Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8080100Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8080208Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8081494Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8081682Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8082608Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8082724Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.8084624Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8084756Z _warn_cpu_init() 2025-12-04T09:59:13.8085667Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
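Note: the enable_nested_tensor warnings above are emitted because the encoder layer was constructed with batch_first left at False. A minimal illustrative sketch of the configuration the warning points to (the sizes are placeholders):

import torch.nn as nn

# batch_first=True keeps TransformerEncoder's nested-tensor fast path usable.
encoder_layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2, enable_nested_tensor=True)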
2025-12-04T09:59:13.8085785Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.8087677Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8087779Z _warn_cpu_init() 2025-12-04T09:59:13.8088709Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8088809Z fsdp_model = FSDP( 2025-12-04T09:59:13.8089836Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8089932Z fsdp_model = FSDP( 2025-12-04T09:59:13.8090845Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8090954Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.8091852Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8091967Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.8093909Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8093995Z _warn_cpu_init() 2025-12-04T09:59:13.8095779Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8095894Z _warn_cpu_init() 2025-12-04T09:59:13.8097082Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8097185Z fsdp_model = FSDP( 2025-12-04T09:59:13.8098166Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
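Note: the FutureWarnings above (and in the first run) say the NO_SHARD strategy is deprecated in favor of DistributedDataParallel. A minimal sketch of the DDP equivalent, with the module and process-group setup as illustrative assumptions:

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

local_rank = int(os.environ.get("LOCAL_RANK", "0"))
torch.cuda.set_device(local_rank)
if not dist.is_initialized():
    dist.init_process_group(backend="nccl")

# NO_SHARD keeps full parameters on every rank, which is what DDP does already.
model = torch.nn.Linear(16, 16).cuda(local_rank)  # placeholder module
ddp_model = DDP(model, device_ids=[local_rank])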
2025-12-04T09:59:13.8098273Z fsdp_model = FSDP( 2025-12-04T09:59:13.8099047Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8099194Z return func(*args, **kwargs) 2025-12-04T09:59:13.8099965Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8100073Z return func(*args, **kwargs) 2025-12-04T09:59:13.8100839Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8100946Z return func(*args, **kwargs) 2025-12-04T09:59:13.8101712Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8101815Z return func(*args, **kwargs) 2025-12-04T09:59:13.8102570Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8102686Z return func(*args, **kwargs) 2025-12-04T09:59:13.8103441Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8103552Z return func(*args, **kwargs) 2025-12-04T09:59:13.8104329Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8104436Z return func(*args, **kwargs) 2025-12-04T09:59:13.8105193Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8105299Z return func(*args, **kwargs) 2025-12-04T09:59:13.8106306Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.8106416Z return func(*args, **kwargs) 2025-12-04T09:59:13.8110843Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.)
2025-12-04T09:59:13.8111219Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.8115207Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.8115582Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.8119606Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.8119961Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.8124643Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T09:59:13.8125044Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.8125503Z [rank1]:E1204 09:53:17.868000 73911 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8126053Z [rank1]:E1204 09:53:17.868000 73911 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8127137Z [rank1]:E1204 09:53:17.868000 73911 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8127682Z [rank1]:E1204 09:53:17.868000 73911 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8128686Z [rank1]:E1204 09:53:17.868000 73911 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8129080Z [rank1]:E1204 09:53:17.868000 73911 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8130092Z [rank1]:E1204 09:53:17.868000 73911 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8130585Z [rank1]:E1204 09:53:17.868000 73911 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8131557Z [rank1]:E1204 09:53:17.868000 73911 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8132040Z [rank1]:E1204 09:53:17.868000 73911 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8132993Z [rank1]:E1204 09:53:17.868000 73911 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8133457Z [rank1]:E1204 09:53:17.868000 73911 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8134435Z [rank1]:E1204 09:53:17.868000 73911 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8134919Z [rank1]:E1204 09:53:17.868000 73911 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8136461Z [rank1]:E1204 09:53:17.868000 73911 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1708544 on device 1. CUDA driver allocated memory was 607059968 and is now 678363136. 
2025-12-04T09:59:13.8136986Z [rank1]:E1204 09:53:17.868000 73911 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8137650Z [rank1]:E1204 09:53:17.868000 73911 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8138782Z [rank1]:E1204 09:53:17.868000 73911 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda 2025-12-04T09:59:13.8139145Z [rank1]:E1204 09:53:17.868000 73911 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8139867Z [rank1]:E1204 09:53:17.868000 73911 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8140420Z [rank1]:E1204 09:53:17.868000 73911 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.8140873Z [rank0]:E1204 09:53:17.868000 73910 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8141480Z [rank0]:E1204 09:53:17.868000 73910 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8142490Z [rank0]:E1204 09:53:17.868000 73910 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8142993Z [rank0]:E1204 09:53:17.868000 73910 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8143994Z [rank0]:E1204 09:53:17.868000 73910 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8144421Z [rank0]:E1204 09:53:17.868000 73910 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8145395Z [rank0]:E1204 09:53:17.868000 73910 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8145883Z [rank0]:E1204 09:53:17.868000 73910 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8146843Z [rank0]:E1204 09:53:17.868000 73910 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8147329Z [rank0]:E1204 09:53:17.868000 73910 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8148295Z [rank0]:E1204 09:53:17.868000 73910 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8148749Z [rank0]:E1204 09:53:17.868000 73910 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8149760Z [rank0]:E1204 09:53:17.868000 73910 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8150205Z [rank0]:E1204 09:53:17.868000 73910 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8151665Z [rank0]:E1204 09:53:17.868000 73910 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1708544 on device 0. CUDA driver allocated memory was 720306176 and is now 787415040. 2025-12-04T09:59:13.8151996Z [rank0]:E1204 09:53:17.868000 73910 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8152582Z [rank0]:E1204 09:53:17.868000 73910 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8153573Z [rank0]:E1204 09:53:17.868000 73910 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda 2025-12-04T09:59:13.8153892Z [rank0]:E1204 09:53:17.868000 73910 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8154525Z [rank0]:E1204 09:53:17.868000 73910 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8155019Z [rank0]:E1204 09:53:17.868000 73910 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.8155469Z [rank2]:E1204 09:53:17.869000 73912 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8155945Z [rank2]:E1204 09:53:17.869000 73912 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8156831Z [rank2]:E1204 09:53:17.869000 73912 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8157274Z [rank2]:E1204 09:53:17.869000 73912 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8158183Z [rank2]:E1204 09:53:17.869000 73912 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8158537Z [rank2]:E1204 09:53:17.869000 73912 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8159399Z [rank2]:E1204 09:53:17.869000 73912 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8159825Z [rank2]:E1204 09:53:17.869000 73912 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8160678Z [rank2]:E1204 09:53:17.869000 73912 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8161111Z [rank2]:E1204 09:53:17.869000 73912 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8161960Z [rank2]:E1204 09:53:17.869000 73912 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8162393Z [rank2]:E1204 09:53:17.869000 73912 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8163245Z [rank2]:E1204 09:53:17.869000 73912 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8163687Z [rank2]:E1204 09:53:17.869000 73912 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8165154Z [rank2]:E1204 09:53:17.869000 73912 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1708544 on device 2. CUDA driver allocated memory was 604962816 and is now 678363136. 2025-12-04T09:59:13.8165486Z [rank2]:E1204 09:53:17.869000 73912 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8166067Z [rank2]:E1204 09:53:17.869000 73912 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8167047Z [rank2]:E1204 09:53:17.869000 73912 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda 2025-12-04T09:59:13.8167380Z [rank2]:E1204 09:53:17.869000 73912 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8168009Z [rank2]:E1204 09:53:17.869000 73912 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8168553Z [rank2]:E1204 09:53:17.869000 73912 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.8168952Z [rank3]:E1204 09:53:17.870000 73913 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8169428Z [rank3]:E1204 09:53:17.870000 73913 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8170309Z [rank3]:E1204 09:53:17.870000 73913 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8170785Z [rank3]:E1204 09:53:17.870000 73913 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8171667Z [rank3]:E1204 09:53:17.870000 73913 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8172019Z [rank3]:E1204 09:53:17.870000 73913 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8172876Z [rank3]:E1204 09:53:17.870000 73913 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8173304Z [rank3]:E1204 09:53:17.870000 73913 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8174164Z [rank3]:E1204 09:53:17.870000 73913 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8174592Z [rank3]:E1204 09:53:17.870000 73913 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8175474Z [rank3]:E1204 09:53:17.870000 73913 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8175878Z [rank3]:E1204 09:53:17.870000 73913 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8177014Z [rank3]:E1204 09:53:17.870000 73913 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8177518Z [rank3]:E1204 09:53:17.870000 73913 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8179176Z [rank3]:E1204 09:53:17.870000 73913 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1708544 on device 3. CUDA driver allocated memory was 609157120 and is now 678363136. 
2025-12-04T09:59:13.8179548Z [rank3]:E1204 09:53:17.870000 73913 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8180202Z [rank3]:E1204 09:53:17.870000 73913 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8181304Z [rank3]:E1204 09:53:17.870000 73913 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda 2025-12-04T09:59:13.8181672Z [rank3]:E1204 09:53:17.870000 73913 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8182449Z [rank3]:E1204 09:53:17.870000 73913 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8183003Z [rank3]:E1204 09:53:17.870000 73913 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.8183103Z dist init r=0, world=4 2025-12-04T09:59:13.8183199Z dist init r=2, world=4 2025-12-04T09:59:13.8183302Z dist init r=1, world=4 2025-12-04T09:59:13.8183396Z dist init r=3, world=4 2025-12-04T09:59:13.8184562Z [rank0]:[W1204 09:53:18.900793628 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.8184692Z FAILED [12.7750s] [100%] 2025-12-04T09:59:13.8184700Z 2025-12-04T09:59:13.8184850Z =================================== FAILURES =================================== 2025-12-04T09:59:13.8185160Z ______ TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda ______ 2025-12-04T09:59:13.8185277Z Traceback (most recent call last): 2025-12-04T09:59:13.8185829Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.8185938Z self._join_processes(fn) 2025-12-04T09:59:13.8186520Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.8186664Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.8187270Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.8187385Z raise RuntimeError(error) 2025-12-04T09:59:13.8187622Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.8187740Z Traceback (most recent call last): 2025-12-04T09:59:13.8188284Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8188422Z getattr(self, test_name)() 2025-12-04T09:59:13.8189067Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8189162Z fn() 2025-12-04T09:59:13.8189637Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8189732Z method(*args, **kwargs) 2025-12-04T09:59:13.8190215Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3329, in wrapper 2025-12-04T09:59:13.8190312Z method(*args, **kwargs) 2025-12-04T09:59:13.8190790Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8190882Z with policy(): 2025-12-04T09:59:13.8191359Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8191470Z raise RuntimeError(msg) 2025-12-04T09:59:13.8192593Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1708544 on device 2. CUDA driver allocated memory was 604962816 and is now 678363136. 2025-12-04T09:59:13.8192599Z 2025-12-04T09:59:13.8192812Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8193427Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda 2025-12-04T09:59:13.8193432Z 2025-12-04T09:59:13.8193739Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8193751Z 2025-12-04T09:59:13.8193756Z 2025-12-04T09:59:13.8193963Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.8194317Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.8195045Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-983af60bcd722f1d.xml - 2025-12-04T09:59:13.8195194Z =========================== short test summary info ============================ 2025-12-04T09:59:13.8195929Z FAILED [12.7750s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_no_shard_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.8196067Z Traceback (most recent call last): 2025-12-04T09:59:13.8196557Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8196669Z getattr(self, test_name)() 2025-12-04T09:59:13.8197144Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8197222Z fn() 2025-12-04T09:59:13.8197677Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8197768Z method(*args, **kwargs) 2025-12-04T09:59:13.8198216Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8198306Z method(*args, **kwargs) 2025-12-04T09:59:13.8198755Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8198847Z with policy(): 2025-12-04T09:59:13.8199309Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8199404Z raise RuntimeError(msg) 2025-12-04T09:59:13.8200497Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1708544 on device 2. 
CUDA driver allocated memory was 604962816 and is now 678363136. 2025-12-04T09:59:13.8200504Z 2025-12-04T09:59:13.8200693Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8201271Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda 2025-12-04T09:59:13.8201278Z 2025-12-04T09:59:13.8201516Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8201671Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.8201837Z ====================== 1 failed, 26 deselected in 12.99s ======================= 2025-12-04T09:59:13.8201920Z Got exit code 1 2025-12-04T09:59:13.8202011Z Retrying single test... 2025-12-04T09:59:13.8202565Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-84ede3fbd174dfda.xml 2025-12-04T09:59:13.8202707Z ============================= test session starts ============================== 2025-12-04T09:59:13.8203013Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.8203115Z cachedir: .pytest_cache 2025-12-04T09:59:13.8203576Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.8203690Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.8203781Z configfile: pytest.ini 2025-12-04T09:59:13.8204255Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.8204508Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.8205162Z stepcurrent: skipping 19 already run items. 
Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_no_shard_cuda 2025-12-04T09:59:13.8205276Z Running 1 items in this shard 2025-12-04T09:59:13.8205281Z 2025-12-04T09:59:13.8206184Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_no_shard_cuda I1204 09:53:25.024000 74195 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 74247 2025-12-04T09:59:13.8206625Z I1204 09:53:25.025000 74195 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 74248 2025-12-04T09:59:13.8207102Z I1204 09:53:25.026000 74195 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 74249 2025-12-04T09:59:13.8207540Z I1204 09:53:25.026000 74195 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 74250 2025-12-04T09:59:13.8208657Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8208768Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8209857Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8209976Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8211074Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8211190Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8212306Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8212421Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8213276Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8213376Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.8215174Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8215260Z _warn_cpu_init() 2025-12-04T09:59:13.8216125Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.8216226Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.8218509Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8218640Z _warn_cpu_init() 2025-12-04T09:59:13.8219634Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8219744Z fsdp_model = FSDP( 2025-12-04T09:59:13.8220710Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8221081Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.8223099Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8223204Z _warn_cpu_init() 2025-12-04T09:59:13.8224194Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8224292Z fsdp_model = FSDP( 2025-12-04T09:59:13.8225259Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8225372Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.8227451Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8227551Z _warn_cpu_init() 2025-12-04T09:59:13.8228542Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8228645Z fsdp_model = FSDP( 2025-12-04T09:59:13.8229625Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.8229731Z fsdp_model = FSDP( 2025-12-04T09:59:13.8230497Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8230609Z return func(*args, **kwargs) 2025-12-04T09:59:13.8231375Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8231480Z return func(*args, **kwargs) 2025-12-04T09:59:13.8232242Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8232351Z return func(*args, **kwargs) 2025-12-04T09:59:13.8233225Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8233360Z return func(*args, **kwargs) 2025-12-04T09:59:13.8234034Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8234134Z return func(*args, **kwargs) 2025-12-04T09:59:13.8234800Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8234893Z return func(*args, **kwargs) 2025-12-04T09:59:13.8235602Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8235694Z return func(*args, **kwargs) 2025-12-04T09:59:13.8236371Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8236463Z return func(*args, **kwargs) 2025-12-04T09:59:13.8237344Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.8237443Z return func(*args, **kwargs) 2025-12-04T09:59:13.8241443Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.)
2025-12-04T09:59:13.8241802Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.8245786Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.8246134Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.8250156Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 2025-12-04T09:59:13.8250547Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.8254525Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py:865: UserWarning: The AccumulateGrad node's stream does not match the stream of the node that produced the incoming gradient. This may incur unnecessary synchronization and break CUDA graph capture if the AccumulateGrad node's stream is the default stream. This mismatch is caused by an AccumulateGrad node created prior to the current iteration being kept alive. This can happen if the autograd graph is still being kept alive by tensors such as the loss, or if you are using DDP, which will stash a reference to the node. To resolve the mismatch, delete all references to the autograd graph or ensure that DDP initialization is performed under the same stream as subsequent forwards. If the mismatch is intentional, you can use torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False) to suppress this warning. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/input_buffer.cpp:240.) 
2025-12-04T09:59:13.8254879Z return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2025-12-04T09:59:13.8255281Z [rank0]:E1204 09:53:35.997000 74247 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8255782Z [rank0]:E1204 09:53:35.997000 74247 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8256910Z [rank0]:E1204 09:53:35.997000 74247 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8257431Z [rank0]:E1204 09:53:35.997000 74247 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8258430Z [rank0]:E1204 09:53:35.997000 74247 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8258827Z [rank0]:E1204 09:53:35.997000 74247 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8259795Z [rank0]:E1204 09:53:35.997000 74247 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8260281Z [rank0]:E1204 09:53:35.997000 74247 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8261236Z [rank0]:E1204 09:53:35.997000 74247 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8261733Z [rank0]:E1204 09:53:35.997000 74247 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8262751Z [rank0]:E1204 09:53:35.997000 74247 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8263207Z [rank0]:E1204 09:53:35.997000 74247 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8264168Z [rank0]:E1204 09:53:35.997000 74247 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8264664Z [rank0]:E1204 09:53:35.997000 74247 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8266345Z [rank0]:E1204 09:53:35.997000 74247 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1708544 on device 0. CUDA driver allocated memory was 720306176 and is now 787415040. 
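The AccumulateGrad stream-mismatch UserWarning repeated above states its own remedies: delete references that keep the previous iteration's autograd graph alive (for example a retained loss tensor), or, if the mismatch is intentional, silence the warning with the setter it names. A minimal sketch of the latter, assuming a PyTorch build recent enough to expose the function quoted in the warning:

import torch

# Only appropriate when the stream mismatch is known to be intentional; otherwise
# prefer dropping the lingering references to the autograd graph instead.
torch.autograd.graph.set_warn_on_accumulate_grad_stream_mismatch(False)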
2025-12-04T09:59:13.8266721Z [rank0]:E1204 09:53:35.997000 74247 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8267373Z [rank0]:E1204 09:53:35.997000 74247 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8268478Z [rank0]:E1204 09:53:35.997000 74247 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda 2025-12-04T09:59:13.8268959Z [rank0]:E1204 09:53:35.997000 74247 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8269721Z [rank0]:E1204 09:53:35.997000 74247 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8270211Z [rank0]:E1204 09:53:35.997000 74247 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.8270637Z [rank1]:E1204 09:53:35.998000 74248 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8271107Z [rank1]:E1204 09:53:35.998000 74248 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8271998Z [rank1]:E1204 09:53:35.998000 74248 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8272448Z [rank1]:E1204 09:53:35.998000 74248 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8273329Z [rank1]:E1204 09:53:35.998000 74248 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8273681Z [rank1]:E1204 09:53:35.998000 74248 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8274533Z [rank1]:E1204 09:53:35.998000 74248 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8274964Z [rank1]:E1204 09:53:35.998000 74248 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8275812Z [rank1]:E1204 09:53:35.998000 74248 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8276310Z [rank1]:E1204 09:53:35.998000 74248 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8277167Z [rank1]:E1204 09:53:35.998000 74248 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8277565Z [rank1]:E1204 09:53:35.998000 74248 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8278421Z [rank1]:E1204 09:53:35.998000 74248 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8278891Z [rank1]:E1204 09:53:35.998000 74248 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8280360Z [rank1]:E1204 09:53:35.998000 74248 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1708544 on device 1. CUDA driver allocated memory was 609157120 and is now 678363136. 2025-12-04T09:59:13.8280690Z [rank1]:E1204 09:53:35.998000 74248 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8281273Z [rank1]:E1204 09:53:35.998000 74248 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8282252Z [rank1]:E1204 09:53:35.998000 74248 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda 2025-12-04T09:59:13.8282580Z [rank1]:E1204 09:53:35.998000 74248 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8283237Z [rank1]:E1204 09:53:35.998000 74248 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8283725Z [rank1]:E1204 09:53:35.998000 74248 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.8284122Z [rank3]:E1204 09:53:35.999000 74250 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8284593Z [rank3]:E1204 09:53:35.999000 74250 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8285485Z [rank3]:E1204 09:53:35.999000 74250 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8285937Z [rank3]:E1204 09:53:35.999000 74250 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8286825Z [rank3]:E1204 09:53:35.999000 74250 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8287177Z [rank3]:E1204 09:53:35.999000 74250 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8288033Z [rank3]:E1204 09:53:35.999000 74250 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8288461Z [rank3]:E1204 09:53:35.999000 74250 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8289367Z [rank3]:E1204 09:53:35.999000 74250 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8289807Z [rank3]:E1204 09:53:35.999000 74250 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8290657Z [rank3]:E1204 09:53:35.999000 74250 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8291061Z [rank3]:E1204 09:53:35.999000 74250 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8291945Z [rank3]:E1204 09:53:35.999000 74250 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8292383Z [rank3]:E1204 09:53:35.999000 74250 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8293845Z [rank3]:E1204 09:53:35.999000 74250 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1708544 on device 3. CUDA driver allocated memory was 604962816 and is now 678363136. 2025-12-04T09:59:13.8294172Z [rank3]:E1204 09:53:35.999000 74250 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8294758Z [rank3]:E1204 09:53:35.999000 74250 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8295735Z [rank3]:E1204 09:53:35.999000 74250 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda 2025-12-04T09:59:13.8296085Z [rank3]:E1204 09:53:35.999000 74250 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8296963Z [rank3]:E1204 09:53:35.999000 74250 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8297518Z [rank3]:E1204 09:53:35.999000 74250 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.8297968Z [rank2]:E1204 09:53:36.001000 74249 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8298494Z [rank2]:E1204 09:53:36.001000 74249 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8299499Z [rank2]:E1204 09:53:36.001000 74249 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8300007Z [rank2]:E1204 09:53:36.001000 74249 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8301004Z [rank2]:E1204 09:53:36.001000 74249 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8301396Z [rank2]:E1204 09:53:36.001000 74249 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8302370Z [rank2]:E1204 09:53:36.001000 74249 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8302931Z [rank2]:E1204 09:53:36.001000 74249 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8303888Z [rank2]:E1204 09:53:36.001000 74249 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8304372Z [rank2]:E1204 09:53:36.001000 74249 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8305323Z [rank2]:E1204 09:53:36.001000 74249 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8305860Z [rank2]:E1204 09:53:36.001000 74249 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8306819Z [rank2]:E1204 09:53:36.001000 74249 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8307317Z [rank2]:E1204 09:53:36.001000 74249 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8309071Z [rank2]:E1204 09:53:36.001000 74249 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1708544 on device 2. CUDA driver allocated memory was 607059968 and is now 678363136. 
2025-12-04T09:59:13.8309397Z [rank2]:E1204 09:53:36.001000 74249 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8309989Z [rank2]:E1204 09:53:36.001000 74249 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8310994Z [rank2]:E1204 09:53:36.001000 74249 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda 2025-12-04T09:59:13.8311320Z [rank2]:E1204 09:53:36.001000 74249 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8311955Z [rank2]:E1204 09:53:36.001000 74249 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8312445Z [rank2]:E1204 09:53:36.001000 74249 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.8312532Z dist init r=1, world=4 2025-12-04T09:59:13.8312620Z dist init r=3, world=4 2025-12-04T09:59:13.8312711Z dist init r=0, world=4 2025-12-04T09:59:13.8312795Z dist init r=2, world=4 2025-12-04T09:59:13.8313822Z [rank0]:[W1204 09:53:36.033367191 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.8313917Z FAILED [13.2387s] [100%] 2025-12-04T09:59:13.8313922Z 2025-12-04T09:59:13.8314051Z =================================== FAILURES =================================== 2025-12-04T09:59:13.8314321Z ______ TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda ______ 2025-12-04T09:59:13.8314427Z Traceback (most recent call last): 2025-12-04T09:59:13.8314909Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.8315010Z self._join_processes(fn) 2025-12-04T09:59:13.8315581Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.8315713Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.8316251Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.8316351Z raise RuntimeError(error) 2025-12-04T09:59:13.8316562Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.8316670Z Traceback (most recent call last): 2025-12-04T09:59:13.8317151Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8317281Z getattr(self, test_name)() 2025-12-04T09:59:13.8317752Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8317837Z fn() 2025-12-04T09:59:13.8318289Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8318382Z method(*args, **kwargs) 2025-12-04T09:59:13.8318834Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3329, in wrapper 2025-12-04T09:59:13.8318925Z method(*args, **kwargs) 2025-12-04T09:59:13.8319373Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8319463Z with policy(): 2025-12-04T09:59:13.8319913Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8320017Z raise RuntimeError(msg) 2025-12-04T09:59:13.8321400Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1708544 on device 3. CUDA driver allocated memory was 604962816 and is now 678363136. 2025-12-04T09:59:13.8321411Z 2025-12-04T09:59:13.8321630Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8322363Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda 2025-12-04T09:59:13.8322370Z 2025-12-04T09:59:13.8322634Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8322640Z 2025-12-04T09:59:13.8322645Z 2025-12-04T09:59:13.8322868Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.8323127Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.8323934Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-84ede3fbd174dfda.xml - 2025-12-04T09:59:13.8324105Z =========================== short test summary info ============================ 2025-12-04T09:59:13.8324932Z FAILED [13.2387s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_no_shard_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.8325065Z Traceback (most recent call last): 2025-12-04T09:59:13.8325615Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8325730Z getattr(self, test_name)() 2025-12-04T09:59:13.8326267Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8326359Z fn() 2025-12-04T09:59:13.8326866Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8327027Z method(*args, **kwargs) 2025-12-04T09:59:13.8327569Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8327680Z method(*args, **kwargs) 2025-12-04T09:59:13.8328185Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8328289Z with policy(): 2025-12-04T09:59:13.8328799Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8328909Z raise RuntimeError(msg) 2025-12-04T09:59:13.8330112Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1708544 on device 3. 
CUDA driver allocated memory was 604962816 and is now 678363136. 2025-12-04T09:59:13.8330156Z 2025-12-04T09:59:13.8330373Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8331045Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_no_shard_cuda 2025-12-04T09:59:13.8331050Z 2025-12-04T09:59:13.8331314Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8331496Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.8331678Z ====================== 1 failed, 26 deselected in 13.46s ======================= 2025-12-04T09:59:13.8331775Z Got exit code 1 2025-12-04T09:59:13.8332355Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_no_shard_cuda 2025-12-04T09:59:13.8332760Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.8333380Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9538bfd24f807d16.xml 2025-12-04T09:59:13.8333659Z ============================= test session starts ============================== 2025-12-04T09:59:13.8333971Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.8334092Z cachedir: .pytest_cache 2025-12-04T09:59:13.8334553Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.8334663Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.8334763Z configfile: pytest.ini 2025-12-04T09:59:13.8335237Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.8335427Z collecting ... collected 60 items / 20 deselected / 40 selected 2025-12-04T09:59:13.8335554Z stepcurrent: skipping 20 already run items. 
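Two recurring warnings in these runs concern process-group lifecycle: barrier() asks for a device_id in init_process_group, and ProcessGroupNCCL notes that destroy_process_group() was never called before exit. A hedged sketch of the pattern both warnings point at, assuming a torchrun-style launch that sets LOCAL_RANK and the rendezvous environment variables:

import os
import torch
import torch.distributed as dist

def main():
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    # device_id lets collectives such as barrier() pick the right GPU without guessing.
    dist.init_process_group("nccl", device_id=torch.device("cuda", local_rank))
    try:
        dist.barrier()
        # ... test or training body ...
    finally:
        # Explicit teardown avoids the ProcessGroupNCCL shutdown warning seen above.
        dist.destroy_process_group()

if __name__ == "__main__":
    main()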
2025-12-04T09:59:13.8335652Z Running 7 items in this shard 2025-12-04T09:59:13.8335659Z 2025-12-04T09:59:13.8336665Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_shard_grad_op_cuda I1204 09:53:42.954000 74532 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 74584 2025-12-04T09:59:13.8337333Z I1204 09:53:42.955000 74532 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 74585 2025-12-04T09:59:13.8337830Z I1204 09:53:42.955000 74532 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 74586 2025-12-04T09:59:13.8338319Z I1204 09:53:42.956000 74532 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 74587 2025-12-04T09:59:13.8339576Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8339774Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8341015Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8341139Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8342364Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8342514Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8343740Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8343864Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8345894Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8345989Z _warn_cpu_init() 2025-12-04T09:59:13.8348001Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.8348098Z _warn_cpu_init() 2025-12-04T09:59:13.8350102Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8350195Z _warn_cpu_init() 2025-12-04T09:59:13.8351965Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8352060Z _warn_cpu_init() 2025-12-04T09:59:13.8352945Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.8353050Z return func(*args, **kwargs) 2025-12-04T09:59:13.8353455Z [rank0]:E1204 09:53:53.720000 74584 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8353928Z [rank0]:E1204 09:53:53.720000 74584 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8354882Z [rank0]:E1204 09:53:53.720000 74584 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8355334Z [rank0]:E1204 09:53:53.720000 74584 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8356221Z [rank0]:E1204 09:53:53.720000 74584 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8356571Z [rank0]:E1204 09:53:53.720000 74584 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8357455Z [rank0]:E1204 09:53:53.720000 74584 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8357893Z [rank0]:E1204 09:53:53.720000 74584 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8358740Z [rank0]:E1204 09:53:53.720000 74584 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8359172Z [rank0]:E1204 09:53:53.720000 74584 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8360018Z [rank0]:E1204 09:53:53.720000 74584 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8360418Z [rank0]:E1204 09:53:53.720000 74584 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8361273Z [rank0]:E1204 09:53:53.720000 74584 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8361735Z [rank0]:E1204 09:53:53.720000 74584 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8363208Z [rank0]:E1204 09:53:53.720000 74584 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 0. CUDA driver allocated memory was 716111872 and is now 760152064. 2025-12-04T09:59:13.8363538Z [rank0]:E1204 09:53:53.720000 74584 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8364119Z [rank0]:E1204 09:53:53.720000 74584 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8365131Z [rank0]:E1204 09:53:53.720000 74584 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8365456Z [rank0]:E1204 09:53:53.720000 74584 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8366086Z [rank0]:E1204 09:53:53.720000 74584 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8366575Z [rank0]:E1204 09:53:53.720000 74584 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.8366971Z [rank2]:E1204 09:53:53.721000 74586 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8367715Z [rank2]:E1204 09:53:53.721000 74586 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8368612Z [rank2]:E1204 09:53:53.721000 74586 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8369059Z [rank2]:E1204 09:53:53.721000 74586 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8369944Z [rank2]:E1204 09:53:53.721000 74586 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8370323Z [rank2]:E1204 09:53:53.721000 74586 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8371180Z [rank2]:E1204 09:53:53.721000 74586 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T09:59:13.8371619Z [rank2]:E1204 09:53:53.721000 74586 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8372470Z [rank2]:E1204 09:53:53.721000 74586 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8372911Z [rank2]:E1204 09:53:53.721000 74586 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8373765Z [rank2]:E1204 09:53:53.721000 74586 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8374176Z [rank2]:E1204 09:53:53.721000 74586 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8375070Z [rank2]:E1204 09:53:53.721000 74586 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8375507Z [rank2]:E1204 09:53:53.721000 74586 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8377258Z [rank2]:E1204 09:53:53.721000 74586 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 2. CUDA driver allocated memory was 609157120 and is now 651100160. 
2025-12-04T09:59:13.8377636Z [rank2]:E1204 09:53:53.721000 74586 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8378308Z [rank2]:E1204 09:53:53.721000 74586 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8379438Z [rank2]:E1204 09:53:53.721000 74586 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8379804Z [rank2]:E1204 09:53:53.721000 74586 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8380520Z [rank2]:E1204 09:53:53.721000 74586 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8381108Z [rank2]:E1204 09:53:53.721000 74586 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.8381584Z [rank1]:E1204 09:53:53.722000 74585 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8382113Z [rank1]:E1204 09:53:53.722000 74585 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8383120Z [rank1]:E1204 09:53:53.722000 74585 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8383627Z [rank1]:E1204 09:53:53.722000 74585 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8384647Z [rank1]:E1204 09:53:53.722000 74585 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8385042Z [rank1]:E1204 09:53:53.722000 74585 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8386010Z [rank1]:E1204 09:53:53.722000 74585 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8386494Z [rank1]:E1204 09:53:53.722000 74585 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8387452Z [rank1]:E1204 09:53:53.722000 74585 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8387940Z [rank1]:E1204 09:53:53.722000 74585 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8389006Z [rank1]:E1204 09:53:53.722000 74585 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8389560Z [rank1]:E1204 09:53:53.722000 74585 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8390417Z [rank1]:E1204 09:53:53.722000 74585 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8390858Z [rank1]:E1204 09:53:53.722000 74585 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8392338Z [rank1]:E1204 09:53:53.722000 74585 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 1. CUDA driver allocated memory was 604962816 and is now 651100160. 2025-12-04T09:59:13.8392660Z [rank1]:E1204 09:53:53.722000 74585 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8393249Z [rank1]:E1204 09:53:53.722000 74585 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8394245Z [rank1]:E1204 09:53:53.722000 74585 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8394574Z [rank1]:E1204 09:53:53.722000 74585 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8395233Z [rank1]:E1204 09:53:53.722000 74585 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8395743Z [rank1]:E1204 09:53:53.722000 74585 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.8396142Z [rank3]:E1204 09:53:53.722000 74587 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8396610Z [rank3]:E1204 09:53:53.722000 74587 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8397503Z [rank3]:E1204 09:53:53.722000 74587 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8397976Z [rank3]:E1204 09:53:53.722000 74587 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8398865Z [rank3]:E1204 09:53:53.722000 74587 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8399215Z [rank3]:E1204 09:53:53.722000 74587 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8400066Z [rank3]:E1204 09:53:53.722000 74587 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8400496Z [rank3]:E1204 09:53:53.722000 74587 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8401343Z [rank3]:E1204 09:53:53.722000 74587 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8401780Z [rank3]:E1204 09:53:53.722000 74587 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8402648Z [rank3]:E1204 09:53:53.722000 74587 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8403046Z [rank3]:E1204 09:53:53.722000 74587 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8403899Z [rank3]:E1204 09:53:53.722000 74587 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8404337Z [rank3]:E1204 09:53:53.722000 74587 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8405815Z [rank3]:E1204 09:53:53.722000 74587 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 3. CUDA driver allocated memory was 586088448 and is now 651100160. 2025-12-04T09:59:13.8406136Z [rank3]:E1204 09:53:53.722000 74587 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8406726Z [rank3]:E1204 09:53:53.722000 74587 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8407725Z [rank3]:E1204 09:53:53.722000 74587 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8408108Z [rank3]:E1204 09:53:53.722000 74587 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8408741Z [rank3]:E1204 09:53:53.722000 74587 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8409227Z [rank3]:E1204 09:53:53.722000 74587 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.8409313Z dist init r=2, world=4 2025-12-04T09:59:13.8409397Z dist init r=0, world=4 2025-12-04T09:59:13.8409487Z dist init r=1, world=4 2025-12-04T09:59:13.8409568Z dist init r=3, world=4 2025-12-04T09:59:13.8410619Z [rank0]:[W1204 09:53:54.743027594 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.8410715Z FAILED [12.7101s] [ 14%] 2025-12-04T09:59:13.8410721Z 2025-12-04T09:59:13.8410852Z =================================== FAILURES =================================== 2025-12-04T09:59:13.8411132Z ___ TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda ____ 2025-12-04T09:59:13.8411237Z Traceback (most recent call last): 2025-12-04T09:59:13.8411722Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.8411822Z self._join_processes(fn) 2025-12-04T09:59:13.8412336Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.8412467Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.8413005Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.8413104Z raise RuntimeError(error) 2025-12-04T09:59:13.8413315Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.8413422Z Traceback (most recent call last): 2025-12-04T09:59:13.8413929Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8414034Z getattr(self, test_name)() 2025-12-04T09:59:13.8414508Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8414593Z fn() 2025-12-04T09:59:13.8415038Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8415132Z method(*args, **kwargs) 2025-12-04T09:59:13.8415584Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8415673Z method(*args, **kwargs) 2025-12-04T09:59:13.8416117Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8416207Z with policy(): 2025-12-04T09:59:13.8416907Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8417025Z raise RuntimeError(msg) 2025-12-04T09:59:13.8418241Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 0. CUDA driver allocated memory was 716111872 and is now 760152064. 
2025-12-04T09:59:13.8418249Z 2025-12-04T09:59:13.8418464Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8419158Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8419201Z 2025-12-04T09:59:13.8419493Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8419499Z 2025-12-04T09:59:13.8419669Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.8419790Z Traceback (most recent call last): 2025-12-04T09:59:13.8420339Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8420453Z getattr(self, test_name)() 2025-12-04T09:59:13.8421205Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8421304Z fn() 2025-12-04T09:59:13.8421892Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8421996Z method(*args, **kwargs) 2025-12-04T09:59:13.8422508Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8422612Z method(*args, **kwargs) 2025-12-04T09:59:13.8423120Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8423224Z with policy(): 2025-12-04T09:59:13.8423735Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8423852Z raise RuntimeError(msg) 2025-12-04T09:59:13.8425063Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 2. CUDA driver allocated memory was 609157120 and is now 651100160. 2025-12-04T09:59:13.8425071Z 2025-12-04T09:59:13.8425290Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8425969Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8425974Z 2025-12-04T09:59:13.8426232Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8426280Z 2025-12-04T09:59:13.8426285Z 2025-12-04T09:59:13.8426509Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.8426766Z Process 0 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:59:13.8427578Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9538bfd24f807d16.xml - 2025-12-04T09:59:13.8427747Z =========================== short test summary info ============================ 2025-12-04T09:59:13.8428603Z FAILED [12.7101s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_shard_grad_op_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.8428732Z Traceback (most recent call last): 2025-12-04T09:59:13.8429277Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8429398Z getattr(self, test_name)() 2025-12-04T09:59:13.8429934Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8430019Z fn() 2025-12-04T09:59:13.8430530Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8430629Z method(*args, **kwargs) 2025-12-04T09:59:13.8431130Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8431239Z method(*args, **kwargs) 2025-12-04T09:59:13.8431815Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8431922Z with policy(): 2025-12-04T09:59:13.8432428Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8432536Z raise RuntimeError(msg) 2025-12-04T09:59:13.8433808Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 0. CUDA driver allocated memory was 716111872 and is now 760152064. 
2025-12-04T09:59:13.8433815Z 2025-12-04T09:59:13.8434014Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8434695Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8434703Z 2025-12-04T09:59:13.8434947Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8434952Z 2025-12-04T09:59:13.8435100Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.8435215Z Traceback (most recent call last): 2025-12-04T09:59:13.8435730Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8435843Z getattr(self, test_name)() 2025-12-04T09:59:13.8436343Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8436424Z fn() 2025-12-04T09:59:13.8436901Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8436998Z method(*args, **kwargs) 2025-12-04T09:59:13.8437472Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8437572Z method(*args, **kwargs) 2025-12-04T09:59:13.8438041Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8438133Z with policy(): 2025-12-04T09:59:13.8438641Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8438741Z raise RuntimeError(msg) 2025-12-04T09:59:13.8440059Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 2. CUDA driver allocated memory was 609157120 and is now 651100160. 2025-12-04T09:59:13.8440069Z 2025-12-04T09:59:13.8440273Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8440946Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8440953Z 2025-12-04T09:59:13.8441205Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8441375Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.8441552Z ====================== 1 failed, 20 deselected in 12.93s ======================= 2025-12-04T09:59:13.8441644Z Got exit code 1 2025-12-04T09:59:13.8441753Z Retrying single test... 
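The leak report above is produced by the test harness's CUDA memory check, which the printed repro command enables via PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1. A simplified, illustrative sketch of that kind of before/after comparison is below; it assumes a single visible CUDA device, and the real check in torch/testing/_internal/common_utils.py differs in detail.

# Illustrative sketch only; not the harness's actual CheckCudaMemoryLeaks logic.
import contextlib
import torch

@contextlib.contextmanager
def cuda_leak_check(device: int = 0):
    # Snapshot caching-allocator and driver-level memory before the test body.
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    caching_before = torch.cuda.memory_allocated(device)
    free_before, total = torch.cuda.mem_get_info(device)
    driver_before = total - free_before
    yield
    # Re-snapshot after the test body and flag growth in both counters.
    torch.cuda.synchronize(device)
    torch.cuda.empty_cache()
    caching_after = torch.cuda.memory_allocated(device)
    free_after, _ = torch.cuda.mem_get_info(device)
    driver_after = total - free_after
    if caching_after > caching_before and driver_after > driver_before:
        raise RuntimeError(
            f"possible CUDA leak on device {device}: caching allocator "
            f"{caching_before} -> {caching_after}, driver {driver_before} -> {driver_after}"
        )

Usage would mirror the failure mode in the log: wrap the test body, e.g. `with cuda_leak_check(0): run_test()`, and a leak surfaces as the same kind of "was X and is now Y" RuntimeError.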
2025-12-04T09:59:13.8442353Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6e7d2c56cd2be4bb.xml 2025-12-04T09:59:13.8442512Z ============================= test session starts ============================== 2025-12-04T09:59:13.8442852Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.8442954Z cachedir: .pytest_cache 2025-12-04T09:59:13.8443525Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.8443642Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.8443741Z configfile: pytest.ini 2025-12-04T09:59:13.8444267Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.8444472Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.8445207Z stepcurrent: skipping 20 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8445348Z Running 1 items in this shard 2025-12-04T09:59:13.8445353Z 2025-12-04T09:59:13.8446360Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_shard_grad_op_cuda I1204 09:54:00.474000 74869 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 74921 2025-12-04T09:59:13.8446858Z I1204 09:54:00.475000 74869 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 74922 2025-12-04T09:59:13.8447338Z I1204 09:54:00.476000 74869 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 74923 2025-12-04T09:59:13.8447812Z I1204 09:54:00.476000 74869 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 74924 2025-12-04T09:59:13.8449029Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8449153Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8450362Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8450485Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8451708Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8451825Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8453015Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8453144Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8455111Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8455216Z _warn_cpu_init() 2025-12-04T09:59:13.8457438Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8457600Z _warn_cpu_init() 2025-12-04T09:59:13.8459647Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8459752Z _warn_cpu_init() 2025-12-04T09:59:13.8461765Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8461902Z _warn_cpu_init() 2025-12-04T09:59:13.8462897Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T09:59:13.8463009Z return func(*args, **kwargs) 2025-12-04T09:59:13.8463480Z [rank0]:E1204 09:54:11.279000 74921 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8464013Z [rank0]:E1204 09:54:11.279000 74921 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8465033Z [rank0]:E1204 09:54:11.279000 74921 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8465543Z [rank0]:E1204 09:54:11.279000 74921 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8466563Z [rank0]:E1204 09:54:11.279000 74921 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8466968Z [rank0]:E1204 09:54:11.279000 74921 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8467934Z [rank0]:E1204 09:54:11.279000 74921 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8468624Z [rank0]:E1204 09:54:11.279000 74921 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8469555Z [rank0]:E1204 09:54:11.279000 74921 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8470033Z [rank0]:E1204 09:54:11.279000 74921 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8470954Z [rank0]:E1204 09:54:11.279000 74921 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8471383Z [rank0]:E1204 09:54:11.279000 74921 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8472323Z [rank0]:E1204 09:54:11.279000 74921 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8472865Z [rank0]:E1204 09:54:11.279000 74921 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8474488Z [rank0]:E1204 09:54:11.279000 74921 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 0. CUDA driver allocated memory was 714014720 and is now 760152064. 
2025-12-04T09:59:13.8474836Z [rank0]:E1204 09:54:11.279000 74921 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8475482Z [rank0]:E1204 09:54:11.279000 74921 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8476604Z [rank0]:E1204 09:54:11.279000 74921 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8476952Z [rank0]:E1204 09:54:11.279000 74921 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8477654Z [rank0]:E1204 09:54:11.279000 74921 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8478179Z [rank0]:E1204 09:54:11.279000 74921 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.8478626Z [rank2]:E1204 09:54:11.280000 74923 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8479137Z [rank2]:E1204 09:54:11.280000 74923 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8480116Z [rank2]:E1204 09:54:11.280000 74923 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8480638Z [rank2]:E1204 09:54:11.280000 74923 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8481593Z [rank2]:E1204 09:54:11.280000 74923 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8481981Z [rank2]:E1204 09:54:11.280000 74923 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8483093Z [rank2]:E1204 09:54:11.280000 74923 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8483537Z [rank2]:E1204 09:54:11.280000 74923 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8484381Z [rank2]:E1204 09:54:11.280000 74923 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8484817Z [rank2]:E1204 09:54:11.280000 74923 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8485662Z [rank2]:E1204 09:54:11.280000 74923 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8486055Z [rank2]:E1204 09:54:11.280000 74923 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8486964Z [rank2]:E1204 09:54:11.280000 74923 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8487399Z [rank2]:E1204 09:54:11.280000 74923 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8488889Z [rank2]:E1204 09:54:11.280000 74923 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 2. CUDA driver allocated memory was 609157120 and is now 651100160. 2025-12-04T09:59:13.8489238Z [rank2]:E1204 09:54:11.280000 74923 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8489830Z [rank2]:E1204 09:54:11.280000 74923 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8490836Z [rank2]:E1204 09:54:11.280000 74923 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8491156Z [rank2]:E1204 09:54:11.280000 74923 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8491793Z [rank2]:E1204 09:54:11.280000 74923 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8492276Z [rank2]:E1204 09:54:11.280000 74923 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.8492685Z [rank1]:E1204 09:54:11.280000 74922 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8493156Z [rank1]:E1204 09:54:11.280000 74922 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8494072Z [rank1]:E1204 09:54:11.280000 74922 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8494518Z [rank1]:E1204 09:54:11.280000 74922 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8495398Z [rank1]:E1204 09:54:11.280000 74922 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8495755Z [rank1]:E1204 09:54:11.280000 74922 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8496836Z [rank1]:E1204 09:54:11.280000 74922 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8497339Z [rank1]:E1204 09:54:11.280000 74922 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8498292Z [rank1]:E1204 09:54:11.280000 74922 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8498782Z [rank1]:E1204 09:54:11.280000 74922 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8499746Z [rank1]:E1204 09:54:11.280000 74922 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8500256Z [rank1]:E1204 09:54:11.280000 74922 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8501224Z [rank1]:E1204 09:54:11.280000 74922 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8501713Z [rank1]:E1204 09:54:11.280000 74922 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8503380Z [rank1]:E1204 09:54:11.280000 74922 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 1. CUDA driver allocated memory was 609157120 and is now 651100160. 2025-12-04T09:59:13.8503777Z [rank1]:E1204 09:54:11.280000 74922 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8504438Z [rank1]:E1204 09:54:11.280000 74922 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8505576Z [rank1]:E1204 09:54:11.280000 74922 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8505935Z [rank1]:E1204 09:54:11.280000 74922 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8506661Z [rank1]:E1204 09:54:11.280000 74922 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8507202Z [rank1]:E1204 09:54:11.280000 74922 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.8507658Z [rank3]:E1204 09:54:11.280000 74924 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8508209Z [rank3]:E1204 09:54:11.280000 74924 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8509398Z [rank3]:E1204 09:54:11.280000 74924 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8509849Z [rank3]:E1204 09:54:11.280000 74924 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8510723Z [rank3]:E1204 09:54:11.280000 74924 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8511082Z [rank3]:E1204 09:54:11.280000 74924 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8511931Z [rank3]:E1204 09:54:11.280000 74924 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8512368Z [rank3]:E1204 09:54:11.280000 74924 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8513216Z [rank3]:E1204 09:54:11.280000 74924 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8513650Z [rank3]:E1204 09:54:11.280000 74924 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8514622Z [rank3]:E1204 09:54:11.280000 74924 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8515023Z [rank3]:E1204 09:54:11.280000 74924 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8515881Z [rank3]:E1204 09:54:11.280000 74924 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8516318Z [rank3]:E1204 09:54:11.280000 74924 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8517832Z [rank3]:E1204 09:54:11.280000 74924 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 3. CUDA driver allocated memory was 607059968 and is now 651100160. 
2025-12-04T09:59:13.8518158Z [rank3]:E1204 09:54:11.280000 74924 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8518747Z [rank3]:E1204 09:54:11.280000 74924 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8519747Z [rank3]:E1204 09:54:11.280000 74924 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8520065Z [rank3]:E1204 09:54:11.280000 74924 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8520706Z [rank3]:E1204 09:54:11.280000 74924 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8521517Z [rank3]:E1204 09:54:11.280000 74924 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.8521692Z dist init r=0, world=4 2025-12-04T09:59:13.8521792Z dist init r=1, world=4 2025-12-04T09:59:13.8521885Z dist init r=2, world=4 2025-12-04T09:59:13.8522087Z dist init r=3, world=4 2025-12-04T09:59:13.8523244Z [rank0]:[W1204 09:54:11.302714537 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.8523357Z FAILED [12.8737s] [100%] 2025-12-04T09:59:13.8523363Z 2025-12-04T09:59:13.8523513Z =================================== FAILURES =================================== 2025-12-04T09:59:13.8523827Z ___ TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda ____ 2025-12-04T09:59:13.8523953Z Traceback (most recent call last): 2025-12-04T09:59:13.8524503Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.8524613Z self._join_processes(fn) 2025-12-04T09:59:13.8525202Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.8525340Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.8525953Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.8526068Z raise RuntimeError(error) 2025-12-04T09:59:13.8526299Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.8526423Z Traceback (most recent call last): 2025-12-04T09:59:13.8527060Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8527177Z getattr(self, test_name)() 2025-12-04T09:59:13.8527710Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8527795Z fn() 2025-12-04T09:59:13.8528310Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8528414Z method(*args, **kwargs) 2025-12-04T09:59:13.8528912Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8529057Z method(*args, **kwargs) 2025-12-04T09:59:13.8529567Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8529665Z with policy(): 2025-12-04T09:59:13.8530177Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8530285Z raise RuntimeError(msg) 2025-12-04T09:59:13.8531508Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 1. CUDA driver allocated memory was 609157120 and is now 651100160. 2025-12-04T09:59:13.8531515Z 2025-12-04T09:59:13.8531729Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8532423Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8532431Z 2025-12-04T09:59:13.8532693Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8532699Z 2025-12-04T09:59:13.8532861Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.8532989Z Traceback (most recent call last): 2025-12-04T09:59:13.8533740Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8533875Z getattr(self, test_name)() 2025-12-04T09:59:13.8534350Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8534426Z fn() 2025-12-04T09:59:13.8534880Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8534971Z method(*args, **kwargs) 2025-12-04T09:59:13.8535414Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8535511Z method(*args, **kwargs) 2025-12-04T09:59:13.8535953Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8536045Z with policy(): 2025-12-04T09:59:13.8536560Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8536660Z raise RuntimeError(msg) 2025-12-04T09:59:13.8538050Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 2. CUDA driver allocated memory was 609157120 and is now 651100160. 
2025-12-04T09:59:13.8538057Z 2025-12-04T09:59:13.8538268Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8538962Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8538968Z 2025-12-04T09:59:13.8539268Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8539302Z 2025-12-04T09:59:13.8539307Z 2025-12-04T09:59:13.8539534Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.8539795Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.8540599Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6e7d2c56cd2be4bb.xml - 2025-12-04T09:59:13.8540776Z =========================== short test summary info ============================ 2025-12-04T09:59:13.8541627Z FAILED [12.8737s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_shard_grad_op_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.8541780Z Traceback (most recent call last): 2025-12-04T09:59:13.8542325Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8542440Z getattr(self, test_name)() 2025-12-04T09:59:13.8542976Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8543063Z fn() 2025-12-04T09:59:13.8543567Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8543674Z method(*args, **kwargs) 2025-12-04T09:59:13.8544174Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8544283Z method(*args, **kwargs) 2025-12-04T09:59:13.8544787Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8544880Z with policy(): 2025-12-04T09:59:13.8545393Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8545502Z raise RuntimeError(msg) 2025-12-04T09:59:13.8546750Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 1. CUDA driver allocated memory was 609157120 and is now 651100160. 
2025-12-04T09:59:13.8546763Z 2025-12-04T09:59:13.8546972Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8547653Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8547660Z 2025-12-04T09:59:13.8547929Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8547934Z 2025-12-04T09:59:13.8548094Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.8548221Z Traceback (most recent call last): 2025-12-04T09:59:13.8548879Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8548987Z getattr(self, test_name)() 2025-12-04T09:59:13.8549591Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8549671Z fn() 2025-12-04T09:59:13.8550285Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8550392Z method(*args, **kwargs) 2025-12-04T09:59:13.8550863Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8550966Z method(*args, **kwargs) 2025-12-04T09:59:13.8551435Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8551561Z with policy(): 2025-12-04T09:59:13.8552073Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8552172Z raise RuntimeError(msg) 2025-12-04T09:59:13.8553318Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 2. CUDA driver allocated memory was 609157120 and is now 651100160. 2025-12-04T09:59:13.8553324Z 2025-12-04T09:59:13.8553522Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8554160Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8554202Z 2025-12-04T09:59:13.8554455Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8554625Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.8554799Z ====================== 1 failed, 26 deselected in 13.09s ======================= 2025-12-04T09:59:13.8554888Z Got exit code 1 2025-12-04T09:59:13.8554988Z Retrying single test... 
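Two warnings repeated in this log point at process-group setup and teardown: barrier() "using the device under current context" (fixable by passing `device_id` to `init_process_group`) and the ProcessGroupNCCL shutdown warning about `destroy_process_group()` not being called. A hedged sketch of what those messages recommend, assuming a torchrun-style launcher that sets RANK, WORLD_SIZE, MASTER_ADDR, and MASTER_PORT:

# Sketch of the setup/teardown the warnings above suggest; not the test harness's code.
import os
import torch
import torch.distributed as dist

def main() -> None:
    rank = int(os.environ["RANK"])
    world_size = int(os.environ["WORLD_SIZE"])
    device = torch.device("cuda", rank % torch.cuda.device_count())
    torch.cuda.set_device(device)
    # Passing device_id tells collectives which device to use, silencing the barrier() warning.
    dist.init_process_group("nccl", rank=rank, world_size=world_size, device_id=device)
    try:
        dist.barrier()
        # ... test or training body ...
    finally:
        # Explicit teardown avoids the ProcessGroupNCCL resource-leak warning at exit.
        dist.destroy_process_group()

if __name__ == "__main__":
    main()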
2025-12-04T09:59:13.8555581Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1378f62336ac1630.xml 2025-12-04T09:59:13.8555735Z ============================= test session starts ============================== 2025-12-04T09:59:13.8556061Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.8556165Z cachedir: .pytest_cache 2025-12-04T09:59:13.8556649Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.8556766Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.8556867Z configfile: pytest.ini 2025-12-04T09:59:13.8557373Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.8557582Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.8558321Z stepcurrent: skipping 20 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8558432Z Running 1 items in this shard 2025-12-04T09:59:13.8558437Z 2025-12-04T09:59:13.8559408Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_shard_grad_op_cuda I1204 09:54:18.024000 75206 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 75258 2025-12-04T09:59:13.8559876Z I1204 09:54:18.025000 75206 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 75259 2025-12-04T09:59:13.8560348Z I1204 09:54:18.026000 75206 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 75260 2025-12-04T09:59:13.8560805Z I1204 09:54:18.027000 75206 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 75261 2025-12-04T09:59:13.8561992Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8562113Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8563274Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8563428Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8564677Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8564792Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8565874Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8565988Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8567814Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8567903Z _warn_cpu_init() 2025-12-04T09:59:13.8569685Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8569774Z _warn_cpu_init() 2025-12-04T09:59:13.8571605Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8571692Z _warn_cpu_init() 2025-12-04T09:59:13.8573486Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8573572Z _warn_cpu_init() 2025-12-04T09:59:13.8574458Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T09:59:13.8574557Z return func(*args, **kwargs) 2025-12-04T09:59:13.8574963Z [rank0]:E1204 09:54:28.805000 75258 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8575444Z [rank0]:E1204 09:54:28.805000 75258 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8576408Z [rank0]:E1204 09:54:28.805000 75258 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8577068Z [rank0]:E1204 09:54:28.805000 75258 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8578093Z [rank0]:E1204 09:54:28.805000 75258 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8578522Z [rank0]:E1204 09:54:28.805000 75258 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8579484Z [rank0]:E1204 09:54:28.805000 75258 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8579969Z [rank0]:E1204 09:54:28.805000 75258 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8580962Z [rank0]:E1204 09:54:28.805000 75258 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8581453Z [rank0]:E1204 09:54:28.805000 75258 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8582412Z [rank0]:E1204 09:54:28.805000 75258 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8582852Z [rank0]:E1204 09:54:28.805000 75258 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8583815Z [rank0]:E1204 09:54:28.805000 75258 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8584303Z [rank0]:E1204 09:54:28.805000 75258 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8585990Z [rank0]:E1204 09:54:28.805000 75258 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 0. CUDA driver allocated memory was 716111872 and is now 760152064. 
2025-12-04T09:59:13.8586360Z [rank0]:E1204 09:54:28.805000 75258 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8587017Z [rank0]:E1204 09:54:28.805000 75258 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8588164Z [rank0]:E1204 09:54:28.805000 75258 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8588526Z [rank0]:E1204 09:54:28.805000 75258 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8589321Z [rank0]:E1204 09:54:28.805000 75258 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8589810Z [rank0]:E1204 09:54:28.805000 75258 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.8590208Z [rank2]:E1204 09:54:28.805000 75260 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8590684Z [rank2]:E1204 09:54:28.805000 75260 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8591567Z [rank2]:E1204 09:54:28.805000 75260 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8592081Z [rank2]:E1204 09:54:28.805000 75260 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8592962Z [rank2]:E1204 09:54:28.805000 75260 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8593317Z [rank2]:E1204 09:54:28.805000 75260 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8594168Z [rank2]:E1204 09:54:28.805000 75260 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8594623Z [rank2]:E1204 09:54:28.805000 75260 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8595482Z [rank2]:E1204 09:54:28.805000 75260 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8595915Z [rank2]:E1204 09:54:28.805000 75260 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8596772Z [rank2]:E1204 09:54:28.805000 75260 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8597163Z [rank2]:E1204 09:54:28.805000 75260 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8598020Z [rank2]:E1204 09:54:28.805000 75260 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8598455Z [rank2]:E1204 09:54:28.805000 75260 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8599948Z [rank2]:E1204 09:54:28.805000 75260 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 2. CUDA driver allocated memory was 611254272 and is now 651100160. 2025-12-04T09:59:13.8600278Z [rank2]:E1204 09:54:28.805000 75260 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8600861Z [rank2]:E1204 09:54:28.805000 75260 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8601872Z [rank2]:E1204 09:54:28.805000 75260 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8602193Z [rank2]:E1204 09:54:28.805000 75260 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8602837Z [rank2]:E1204 09:54:28.805000 75260 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8603318Z [rank2]:E1204 09:54:28.805000 75260 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.8603719Z [rank1]:E1204 09:54:28.805000 75259 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8604192Z [rank1]:E1204 09:54:28.805000 75259 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8605103Z [rank1]:E1204 09:54:28.805000 75259 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8605584Z [rank1]:E1204 09:54:28.805000 75259 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8606464Z [rank1]:E1204 09:54:28.805000 75259 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8606821Z [rank1]:E1204 09:54:28.805000 75259 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8607696Z [rank1]:E1204 09:54:28.805000 75259 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8608134Z [rank1]:E1204 09:54:28.805000 75259 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8608990Z [rank1]:E1204 09:54:28.805000 75259 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8609418Z [rank1]:E1204 09:54:28.805000 75259 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8610277Z [rank1]:E1204 09:54:28.805000 75259 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8610676Z [rank1]:E1204 09:54:28.805000 75259 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8611542Z [rank1]:E1204 09:54:28.805000 75259 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8612006Z [rank1]:E1204 09:54:28.805000 75259 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8613489Z [rank1]:E1204 09:54:28.805000 75259 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 1. CUDA driver allocated memory was 604962816 and is now 651100160. 2025-12-04T09:59:13.8613821Z [rank1]:E1204 09:54:28.805000 75259 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8614404Z [rank1]:E1204 09:54:28.805000 75259 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8615421Z [rank1]:E1204 09:54:28.805000 75259 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8615744Z [rank1]:E1204 09:54:28.805000 75259 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8616455Z [rank1]:E1204 09:54:28.805000 75259 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8617146Z [rank1]:E1204 09:54:28.805000 75259 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.8617598Z [rank3]:E1204 09:54:28.806000 75261 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8618203Z [rank3]:E1204 09:54:28.806000 75261 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8619205Z [rank3]:E1204 09:54:28.806000 75261 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8619723Z [rank3]:E1204 09:54:28.806000 75261 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8620712Z [rank3]:E1204 09:54:28.806000 75261 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8621356Z [rank3]:E1204 09:54:28.806000 75261 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8622327Z [rank3]:E1204 09:54:28.806000 75261 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8622813Z [rank3]:E1204 09:54:28.806000 75261 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8623783Z [rank3]:E1204 09:54:28.806000 75261 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8624270Z [rank3]:E1204 09:54:28.806000 75261 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8625238Z [rank3]:E1204 09:54:28.806000 75261 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8625690Z [rank3]:E1204 09:54:28.806000 75261 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8626718Z [rank3]:E1204 09:54:28.806000 75261 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8627207Z [rank3]:E1204 09:54:28.806000 75261 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8628871Z [rank3]:E1204 09:54:28.806000 75261 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 3. CUDA driver allocated memory was 609157120 and is now 651100160. 
2025-12-04T09:59:13.8629247Z [rank3]:E1204 09:54:28.806000 75261 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8629903Z [rank3]:E1204 09:54:28.806000 75261 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8631046Z [rank3]:E1204 09:54:28.806000 75261 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8631408Z [rank3]:E1204 09:54:28.806000 75261 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8632132Z [rank3]:E1204 09:54:28.806000 75261 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8632676Z [rank3]:E1204 09:54:28.806000 75261 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.8632960Z dist init r=1, world=4 2025-12-04T09:59:13.8633056Z dist init r=0, world=4 2025-12-04T09:59:13.8633139Z dist init r=3, world=4 2025-12-04T09:59:13.8633221Z dist init r=2, world=4 2025-12-04T09:59:13.8634253Z [rank0]:[W1204 09:54:29.826022596 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.8634345Z FAILED [12.4666s] [100%] 2025-12-04T09:59:13.8634350Z 2025-12-04T09:59:13.8634483Z =================================== FAILURES =================================== 2025-12-04T09:59:13.8634796Z ___ TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda ____ 2025-12-04T09:59:13.8634901Z Traceback (most recent call last): 2025-12-04T09:59:13.8635395Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.8635497Z self._join_processes(fn) 2025-12-04T09:59:13.8636018Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.8636146Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.8636685Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.8636787Z raise RuntimeError(error) 2025-12-04T09:59:13.8636993Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.8637096Z Traceback (most recent call last): 2025-12-04T09:59:13.8637585Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8637682Z getattr(self, test_name)() 2025-12-04T09:59:13.8638167Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8638245Z fn() 2025-12-04T09:59:13.8638694Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8638830Z method(*args, **kwargs) 2025-12-04T09:59:13.8639279Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8639368Z method(*args, **kwargs) 2025-12-04T09:59:13.8639820Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8639907Z with policy(): 2025-12-04T09:59:13.8640365Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8640459Z raise RuntimeError(msg) 2025-12-04T09:59:13.8641539Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 2. CUDA driver allocated memory was 611254272 and is now 651100160. 2025-12-04T09:59:13.8641553Z 2025-12-04T09:59:13.8641743Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8642346Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8642352Z 2025-12-04T09:59:13.8642590Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8642596Z 2025-12-04T09:59:13.8642601Z 2025-12-04T09:59:13.8642796Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.8643033Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.8643795Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1378f62336ac1630.xml - 2025-12-04T09:59:13.8643948Z =========================== short test summary info ============================ 2025-12-04T09:59:13.8644711Z FAILED [12.4666s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_shard_grad_op_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.8644816Z Traceback (most recent call last): 2025-12-04T09:59:13.8645309Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8645434Z getattr(self, test_name)() 2025-12-04T09:59:13.8645908Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8645992Z fn() 2025-12-04T09:59:13.8646442Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8646532Z method(*args, **kwargs) 2025-12-04T09:59:13.8646989Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8647078Z method(*args, **kwargs) 2025-12-04T09:59:13.8647532Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8647618Z with policy(): 2025-12-04T09:59:13.8648065Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8648166Z raise RuntimeError(msg) 2025-12-04T09:59:13.8649251Z RuntimeError: CUDA driver API confirmed a leak in 
__mp_main__.TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda! Caching allocator allocated memory was 512 and is now reported as 227840 on device 2. CUDA driver allocated memory was 611254272 and is now 651100160. 2025-12-04T09:59:13.8649258Z 2025-12-04T09:59:13.8649454Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8650083Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8650089Z 2025-12-04T09:59:13.8650324Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8650488Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.8650645Z ====================== 1 failed, 26 deselected in 12.69s ======================= 2025-12-04T09:59:13.8650739Z Got exit code 1 2025-12-04T09:59:13.8651269Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_shard_grad_op_cuda 2025-12-04T09:59:13.8651628Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.8652188Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8e092965a6aa7362.xml 2025-12-04T09:59:13.8652331Z ============================= test session starts ============================== 2025-12-04T09:59:13.8652646Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.8652740Z cachedir: .pytest_cache 2025-12-04T09:59:13.8653196Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.8653313Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.8653406Z configfile: pytest.ini 2025-12-04T09:59:13.8653879Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.8654101Z collecting ... collected 60 items / 21 deselected / 39 selected 2025-12-04T09:59:13.8654250Z stepcurrent: skipping 21 already run items. 
2025-12-04T09:59:13.8654355Z Running 6 items in this shard 2025-12-04T09:59:13.8654359Z 2025-12-04T09:59:13.8655265Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_no_shard_cuda I1204 09:54:35.503000 75543 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 75595 2025-12-04T09:59:13.8655704Z I1204 09:54:35.504000 75543 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 75596 2025-12-04T09:59:13.8656145Z I1204 09:54:35.505000 75543 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 75597 2025-12-04T09:59:13.8656851Z I1204 09:54:35.506000 75543 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 75598 2025-12-04T09:59:13.8658117Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8658246Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8659483Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8659614Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8660850Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8660983Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8662219Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8662386Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8663355Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8663468Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.8664436Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8664551Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.8666578Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.8666677Z _warn_cpu_init() 2025-12-04T09:59:13.8668696Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8668825Z _warn_cpu_init() 2025-12-04T09:59:13.8669846Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8669954Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.8670812Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8670915Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.8672692Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8672813Z _warn_cpu_init() 2025-12-04T09:59:13.8674587Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8674674Z _warn_cpu_init() 2025-12-04T09:59:13.8675560Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8675652Z fsdp_model = FSDP( 2025-12-04T09:59:13.8676532Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8676646Z fsdp_model = FSDP( 2025-12-04T09:59:13.8677521Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8677615Z fsdp_model = FSDP( 2025-12-04T09:59:13.8678486Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.8678581Z fsdp_model = FSDP( 2025-12-04T09:59:13.8679467Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.8679564Z return func(*args, **kwargs) 2025-12-04T09:59:13.8680249Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8680347Z return func(*args, **kwargs) 2025-12-04T09:59:13.8681035Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8681131Z return func(*args, **kwargs) 2025-12-04T09:59:13.8681807Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8681909Z return func(*args, **kwargs) 2025-12-04T09:59:13.8682640Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8682746Z return func(*args, **kwargs) 2025-12-04T09:59:13.8683417Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8683510Z return func(*args, **kwargs) 2025-12-04T09:59:13.8684187Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8684307Z return func(*args, **kwargs) 2025-12-04T09:59:13.8684989Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8685084Z return func(*args, **kwargs) 2025-12-04T09:59:13.8685755Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 
2025-12-04T09:59:13.8685857Z return func(*args, **kwargs) 2025-12-04T09:59:13.8686267Z [rank0]:E1204 09:54:46.995000 75595 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8686746Z [rank0]:E1204 09:54:46.995000 75595 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8687631Z [rank0]:E1204 09:54:46.995000 75595 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8688083Z [rank0]:E1204 09:54:46.995000 75595 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8688963Z [rank0]:E1204 09:54:46.995000 75595 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8689340Z [rank0]:E1204 09:54:46.995000 75595 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8690199Z [rank0]:E1204 09:54:46.995000 75595 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8690631Z [rank0]:E1204 09:54:46.995000 75595 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8691493Z [rank0]:E1204 09:54:46.995000 75595 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8691925Z [rank0]:E1204 09:54:46.995000 75595 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8692777Z [rank0]:E1204 09:54:46.995000 75595 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8693182Z [rank0]:E1204 09:54:46.995000 75595 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8694031Z [rank0]:E1204 09:54:46.995000 75595 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8694476Z [rank0]:E1204 09:54:46.995000 75595 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8695983Z [rank0]:E1204 09:54:46.995000 75595 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1091072 on device 0. CUDA driver allocated memory was 714014720 and is now 762249216. 
2025-12-04T09:59:13.8696396Z [rank0]:E1204 09:54:46.995000 75595 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8697196Z [rank0]:E1204 09:54:46.995000 75595 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8698341Z [rank0]:E1204 09:54:46.995000 75595 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.8698718Z [rank0]:E1204 09:54:46.995000 75595 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8699434Z [rank0]:E1204 09:54:46.995000 75595 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8699985Z [rank0]:E1204 09:54:46.995000 75595 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.8700435Z [rank1]:E1204 09:54:46.996000 75596 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8700973Z [rank1]:E1204 09:54:46.996000 75596 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8701975Z [rank1]:E1204 09:54:46.996000 75596 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8702486Z [rank1]:E1204 09:54:46.996000 75596 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8703502Z [rank1]:E1204 09:54:46.996000 75596 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8703903Z [rank1]:E1204 09:54:46.996000 75596 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8704871Z [rank1]:E1204 09:54:46.996000 75596 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8705360Z [rank1]:E1204 09:54:46.996000 75596 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8706329Z [rank1]:E1204 09:54:46.996000 75596 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8706817Z [rank1]:E1204 09:54:46.996000 75596 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8707779Z [rank1]:E1204 09:54:46.996000 75596 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8708230Z [rank1]:E1204 09:54:46.996000 75596 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8709546Z [rank1]:E1204 09:54:46.996000 75596 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8710019Z [rank1]:E1204 09:54:46.996000 75596 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8711475Z [rank1]:E1204 09:54:46.996000 75596 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1091072 on device 1. CUDA driver allocated memory was 604962816 and is now 653197312. 2025-12-04T09:59:13.8711809Z [rank1]:E1204 09:54:46.996000 75596 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8712420Z [rank1]:E1204 09:54:46.996000 75596 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8713398Z [rank1]:E1204 09:54:46.996000 75596 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.8713728Z [rank1]:E1204 09:54:46.996000 75596 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8714361Z [rank1]:E1204 09:54:46.996000 75596 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8714850Z [rank1]:E1204 09:54:46.996000 75596 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.8715252Z [rank3]:E1204 09:54:46.996000 75598 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8715736Z [rank3]:E1204 09:54:46.996000 75598 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8716619Z [rank3]:E1204 09:54:46.996000 75598 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8717089Z [rank3]:E1204 09:54:46.996000 75598 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8717970Z [rank3]:E1204 09:54:46.996000 75598 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8718317Z [rank3]:E1204 09:54:46.996000 75598 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8719180Z [rank3]:E1204 09:54:46.996000 75598 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8719613Z [rank3]:E1204 09:54:46.996000 75598 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8720470Z [rank3]:E1204 09:54:46.996000 75598 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8721046Z [rank3]:E1204 09:54:46.996000 75598 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8722162Z [rank3]:E1204 09:54:46.996000 75598 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8722619Z [rank3]:E1204 09:54:46.996000 75598 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8723788Z [rank3]:E1204 09:54:46.996000 75598 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8724287Z [rank3]:E1204 09:54:46.996000 75598 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8725942Z [rank3]:E1204 09:54:46.996000 75598 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1091072 on device 3. CUDA driver allocated memory was 607059968 and is now 653197312. 2025-12-04T09:59:13.8726350Z [rank3]:E1204 09:54:46.996000 75598 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8727010Z [rank3]:E1204 09:54:46.996000 75598 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8728112Z [rank3]:E1204 09:54:46.996000 75598 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.8728484Z [rank3]:E1204 09:54:46.996000 75598 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8729197Z [rank3]:E1204 09:54:46.996000 75598 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8729747Z [rank3]:E1204 09:54:46.996000 75598 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.8730197Z [rank2]:E1204 09:54:46.997000 75597 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8730740Z [rank2]:E1204 09:54:46.997000 75597 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8731774Z [rank2]:E1204 09:54:46.997000 75597 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8732285Z [rank2]:E1204 09:54:46.997000 75597 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8733391Z [rank2]:E1204 09:54:46.997000 75597 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8733889Z [rank2]:E1204 09:54:46.997000 75597 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8734750Z [rank2]:E1204 09:54:46.997000 75597 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8735182Z [rank2]:E1204 09:54:46.997000 75597 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8736037Z [rank2]:E1204 09:54:46.997000 75597 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8736534Z [rank2]:E1204 09:54:46.997000 75597 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8737680Z [rank2]:E1204 09:54:46.997000 75597 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8738161Z [rank2]:E1204 09:54:46.997000 75597 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8739121Z [rank2]:E1204 09:54:46.997000 75597 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8739611Z [rank2]:E1204 09:54:46.997000 75597 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8741246Z [rank2]:E1204 09:54:46.997000 75597 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1091072 on device 2. CUDA driver allocated memory was 602865664 and is now 653197312. 
2025-12-04T09:59:13.8741646Z [rank2]:E1204 09:54:46.997000 75597 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8742303Z [rank2]:E1204 09:54:46.997000 75597 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8743399Z [rank2]:E1204 09:54:46.997000 75597 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.8743765Z [rank2]:E1204 09:54:46.997000 75597 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8744480Z [rank2]:E1204 09:54:46.997000 75597 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8745032Z [rank2]:E1204 09:54:46.997000 75597 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.8745130Z dist init r=1, world=4 2025-12-04T09:59:13.8745225Z dist init r=3, world=4 2025-12-04T09:59:13.8745322Z dist init r=2, world=4 2025-12-04T09:59:13.8745446Z dist init r=0, world=4 2025-12-04T09:59:13.8746609Z [rank0]:[W1204 09:54:47.043765415 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.8746710Z FAILED [13.9911s] [ 16%] 2025-12-04T09:59:13.8746716Z 2025-12-04T09:59:13.8746864Z =================================== FAILURES =================================== 2025-12-04T09:59:13.8747170Z ______ TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda _______ 2025-12-04T09:59:13.8747289Z Traceback (most recent call last): 2025-12-04T09:59:13.8747838Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.8747952Z self._join_processes(fn) 2025-12-04T09:59:13.8748534Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.8748678Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.8749351Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.8749451Z raise RuntimeError(error) 2025-12-04T09:59:13.8749669Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.8749776Z Traceback (most recent call last): 2025-12-04T09:59:13.8750263Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8750389Z getattr(self, test_name)() 2025-12-04T09:59:13.8750890Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8750978Z fn() 2025-12-04T09:59:13.8751432Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8751525Z method(*args, **kwargs) 2025-12-04T09:59:13.8751980Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3329, in wrapper 2025-12-04T09:59:13.8752071Z method(*args, **kwargs) 2025-12-04T09:59:13.8752532Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8752644Z with policy(): 2025-12-04T09:59:13.8753094Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8753195Z raise RuntimeError(msg) 2025-12-04T09:59:13.8754254Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1091072 on device 1. CUDA driver allocated memory was 604962816 and is now 653197312. 2025-12-04T09:59:13.8754260Z 2025-12-04T09:59:13.8754458Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8755031Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.8755037Z 2025-12-04T09:59:13.8755270Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8755277Z 2025-12-04T09:59:13.8755426Z Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.8755532Z Traceback (most recent call last): 2025-12-04T09:59:13.8756023Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8756121Z getattr(self, test_name)() 2025-12-04T09:59:13.8756598Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8756705Z fn() 2025-12-04T09:59:13.8757153Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8757242Z method(*args, **kwargs) 2025-12-04T09:59:13.8757695Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8757787Z method(*args, **kwargs) 2025-12-04T09:59:13.8758235Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8758320Z with policy(): 2025-12-04T09:59:13.8758772Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8758875Z raise RuntimeError(msg) 2025-12-04T09:59:13.8759926Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1091072 on device 3. CUDA driver allocated memory was 607059968 and is now 653197312. 
2025-12-04T09:59:13.8759932Z 2025-12-04T09:59:13.8760129Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8760703Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.8760710Z 2025-12-04T09:59:13.8760943Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8760948Z 2025-12-04T09:59:13.8760961Z 2025-12-04T09:59:13.8761153Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.8761435Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.8762155Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8e092965a6aa7362.xml - 2025-12-04T09:59:13.8762305Z =========================== short test summary info ============================ 2025-12-04T09:59:13.8763031Z FAILED [13.9911s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_no_shard_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.8763143Z Traceback (most recent call last): 2025-12-04T09:59:13.8763655Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8763765Z getattr(self, test_name)() 2025-12-04T09:59:13.8764240Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8764321Z fn() 2025-12-04T09:59:13.8764773Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8764867Z method(*args, **kwargs) 2025-12-04T09:59:13.8765322Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8765411Z method(*args, **kwargs) 2025-12-04T09:59:13.8765857Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8765954Z with policy(): 2025-12-04T09:59:13.8766409Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8766502Z raise RuntimeError(msg) 2025-12-04T09:59:13.8767563Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1091072 on device 1. CUDA driver allocated memory was 604962816 and is now 653197312. 
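The same reproduction can also be launched from Python instead of the shell; a small equivalent using only the standard library, assuming the working directory is the PyTorch repo root (setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 would silence the banner instead):

    # Same reproduction as the command printed above, launched via subprocess.
    # Assumes the current working directory is the PyTorch repo root.
    import os
    import subprocess

    env = dict(os.environ, PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1")
    subprocess.run(
        [
            "python",
            "test/distributed/fsdp/test_fsdp_core.py",
            "TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda",
        ],
        env=env,
        check=False,
    )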
2025-12-04T09:59:13.8767568Z 2025-12-04T09:59:13.8767800Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8768389Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.8768393Z 2025-12-04T09:59:13.8768628Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8768633Z 2025-12-04T09:59:13.8768783Z Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.8768885Z Traceback (most recent call last): 2025-12-04T09:59:13.8769373Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8769478Z getattr(self, test_name)() 2025-12-04T09:59:13.8769951Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8770028Z fn() 2025-12-04T09:59:13.8770480Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8770573Z method(*args, **kwargs) 2025-12-04T09:59:13.8771030Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8771120Z method(*args, **kwargs) 2025-12-04T09:59:13.8771563Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8771655Z with policy(): 2025-12-04T09:59:13.8772101Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8772250Z raise RuntimeError(msg) 2025-12-04T09:59:13.8773315Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1091072 on device 3. CUDA driver allocated memory was 607059968 and is now 653197312. 2025-12-04T09:59:13.8773319Z 2025-12-04T09:59:13.8773508Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8774088Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.8774092Z 2025-12-04T09:59:13.8774347Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8774510Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.8774667Z ====================== 1 failed, 21 deselected in 14.21s ======================= 2025-12-04T09:59:13.8774752Z Got exit code 1 2025-12-04T09:59:13.8774852Z Retrying single test... 
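The ProcessGroupNCCL warning in the run above notes that destroy_process_group() was never called before exit. A minimal teardown pattern that avoids the warning in a standalone distributed script, assuming torchrun or a similar launcher provides RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT (illustrative only, not the test harness itself):

    # Illustrative teardown pattern; assumes torchrun (or similar) sets
    # RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT in the environment.
    import os
    import torch
    import torch.distributed as dist

    def main():
        rank = int(os.environ["RANK"])
        world_size = int(os.environ["WORLD_SIZE"])
        dist.init_process_group("nccl", rank=rank, world_size=world_size)
        try:
            torch.cuda.set_device(rank % torch.cuda.device_count())
            dist.barrier()          # ... distributed work goes here ...
        finally:
            dist.destroy_process_group()  # explicit shutdown releases NCCL resources

    if __name__ == "__main__":
        main()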
2025-12-04T09:59:13.8775399Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-19aef0a0802c58a7.xml 2025-12-04T09:59:13.8775542Z ============================= test session starts ============================== 2025-12-04T09:59:13.8775860Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.8775951Z cachedir: .pytest_cache 2025-12-04T09:59:13.8776510Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.8776623Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.8776893Z configfile: pytest.ini 2025-12-04T09:59:13.8777438Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.8777657Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.8778387Z stepcurrent: skipping 21 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.8778545Z Running 1 items in this shard 2025-12-04T09:59:13.8778551Z 2025-12-04T09:59:13.8779568Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_no_shard_cuda I1204 09:54:53.963000 75880 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 75932 2025-12-04T09:59:13.8780076Z I1204 09:54:53.964000 75880 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 75933 2025-12-04T09:59:13.8780570Z I1204 09:54:53.965000 75880 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 75934 2025-12-04T09:59:13.8781066Z I1204 09:54:53.966000 75880 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 75935 2025-12-04T09:59:13.8782322Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8782449Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8783692Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8783817Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8785081Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8785231Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8786460Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8786578Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8787549Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8787695Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.8789843Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8789940Z _warn_cpu_init() 2025-12-04T09:59:13.8790819Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8790908Z fsdp_model = FSDP( 2025-12-04T09:59:13.8791763Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8791865Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.8792721Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8792822Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.8793698Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8793800Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.8795600Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8795698Z _warn_cpu_init() 2025-12-04T09:59:13.8797474Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8797567Z _warn_cpu_init() 2025-12-04T09:59:13.8799368Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. 
`module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8799485Z _warn_cpu_init() 2025-12-04T09:59:13.8800372Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.8800471Z return func(*args, **kwargs) 2025-12-04T09:59:13.8801358Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8801474Z fsdp_model = FSDP( 2025-12-04T09:59:13.8802358Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8802453Z fsdp_model = FSDP( 2025-12-04T09:59:13.8803320Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8803418Z fsdp_model = FSDP( 2025-12-04T09:59:13.8804103Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8804205Z return func(*args, **kwargs) 2025-12-04T09:59:13.8804884Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8804982Z return func(*args, **kwargs) 2025-12-04T09:59:13.8805664Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8805761Z return func(*args, **kwargs) 2025-12-04T09:59:13.8806468Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8806561Z return func(*args, **kwargs) 2025-12-04T09:59:13.8807235Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8807336Z return func(*args, **kwargs) 2025-12-04T09:59:13.8808004Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8808101Z return func(*args, **kwargs) 2025-12-04T09:59:13.8808773Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8808864Z return func(*args, **kwargs) 2025-12-04T09:59:13.8809539Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.8809630Z return func(*args, **kwargs) 2025-12-04T09:59:13.8810040Z [rank0]:E1204 09:55:05.385000 75932 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8810513Z [rank0]:E1204 09:55:05.385000 75932 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8811403Z [rank0]:E1204 09:55:05.385000 75932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8811913Z [rank0]:E1204 09:55:05.385000 75932 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8812791Z [rank0]:E1204 09:55:05.385000 75932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8813151Z [rank0]:E1204 09:55:05.385000 75932 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8814009Z [rank0]:E1204 09:55:05.385000 75932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8814491Z [rank0]:E1204 09:55:05.385000 75932 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8815346Z [rank0]:E1204 09:55:05.385000 75932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8815775Z [rank0]:E1204 09:55:05.385000 75932 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8816871Z [rank0]:E1204 09:55:05.385000 75932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8817319Z [rank0]:E1204 09:55:05.385000 75932 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8818293Z [rank0]:E1204 09:55:05.385000 75932 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8818787Z [rank0]:E1204 09:55:05.385000 75932 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8820473Z [rank0]:E1204 09:55:05.385000 75932 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1091072 on device 0. CUDA driver allocated memory was 720306176 and is now 762249216. 
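The enable_nested_tensor UserWarnings at the start of this retry fire because the encoder layer was constructed without batch_first=True. A short sketch of the construction the warning asks for, with arbitrary example dimensions rather than the ones used by the test:

    # Construction that satisfies the enable_nested_tensor check;
    # d_model/nhead/sizes are arbitrary example values.
    import torch
    import torch.nn as nn

    layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
    encoder = nn.TransformerEncoder(layer, num_layers=2, enable_nested_tensor=True)

    x = torch.randn(8, 16, 64)   # (batch, seq, d_model) because batch_first=True
    out = encoder(x)
    print(out.shape)             # torch.Size([8, 16, 64])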
2025-12-04T09:59:13.8821042Z [rank0]:E1204 09:55:05.385000 75932 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8821709Z [rank0]:E1204 09:55:05.385000 75932 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8822824Z [rank0]:E1204 09:55:05.385000 75932 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.8823190Z [rank0]:E1204 09:55:05.385000 75932 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8823910Z [rank0]:E1204 09:55:05.385000 75932 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8824452Z [rank0]:E1204 09:55:05.385000 75932 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.8824909Z [rank1]:E1204 09:55:05.388000 75933 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8825438Z [rank1]:E1204 09:55:05.388000 75933 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8826551Z [rank1]:E1204 09:55:05.388000 75933 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8827063Z [rank1]:E1204 09:55:05.388000 75933 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8828047Z [rank1]:E1204 09:55:05.388000 75933 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8828448Z [rank1]:E1204 09:55:05.388000 75933 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8829446Z [rank1]:E1204 09:55:05.388000 75933 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8829935Z [rank1]:E1204 09:55:05.388000 75933 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8830904Z [rank1]:E1204 09:55:05.388000 75933 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8831388Z [rank1]:E1204 09:55:05.388000 75933 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8832349Z [rank1]:E1204 09:55:05.388000 75933 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8833004Z [rank1]:E1204 09:55:05.388000 75933 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8833922Z [rank1]:E1204 09:55:05.388000 75933 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8834601Z [rank1]:E1204 09:55:05.388000 75933 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8836204Z [rank1]:E1204 09:55:05.388000 75933 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1091072 on device 1. CUDA driver allocated memory was 607059968 and is now 653197312. 2025-12-04T09:59:13.8836556Z [rank1]:E1204 09:55:05.388000 75933 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8837192Z [rank1]:E1204 09:55:05.388000 75933 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8838271Z [rank1]:E1204 09:55:05.388000 75933 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.8838626Z [rank1]:E1204 09:55:05.388000 75933 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8839327Z [rank1]:E1204 09:55:05.388000 75933 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8839852Z [rank1]:E1204 09:55:05.388000 75933 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.8840282Z [rank2]:E1204 09:55:05.388000 75934 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8840862Z [rank2]:E1204 09:55:05.388000 75934 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8841843Z [rank2]:E1204 09:55:05.388000 75934 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8842338Z [rank2]:E1204 09:55:05.388000 75934 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8843289Z [rank2]:E1204 09:55:05.388000 75934 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8843710Z [rank2]:E1204 09:55:05.388000 75934 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8844637Z [rank2]:E1204 09:55:05.388000 75934 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8845110Z [rank2]:E1204 09:55:05.388000 75934 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8846042Z [rank2]:E1204 09:55:05.388000 75934 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8846512Z [rank2]:E1204 09:55:05.388000 75934 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8847541Z [rank2]:E1204 09:55:05.388000 75934 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8847969Z [rank2]:E1204 09:55:05.388000 75934 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8848911Z [rank2]:E1204 09:55:05.388000 75934 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8849371Z [rank2]:E1204 09:55:05.388000 75934 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8850916Z [rank2]:E1204 09:55:05.388000 75934 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1091072 on device 2. CUDA driver allocated memory was 609157120 and is now 653197312. 2025-12-04T09:59:13.8851267Z [rank2]:E1204 09:55:05.388000 75934 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8851881Z [rank2]:E1204 09:55:05.388000 75934 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8852917Z [rank2]:E1204 09:55:05.388000 75934 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.8853257Z [rank2]:E1204 09:55:05.388000 75934 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8853936Z [rank2]:E1204 09:55:05.388000 75934 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8854475Z [rank2]:E1204 09:55:05.388000 75934 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.8855107Z [rank3]:E1204 09:55:05.388000 75935 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8855627Z [rank3]:E1204 09:55:05.388000 75935 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8856669Z [rank3]:E1204 09:55:05.388000 75935 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8857360Z [rank3]:E1204 09:55:05.388000 75935 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8858385Z [rank3]:E1204 09:55:05.388000 75935 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8858795Z [rank3]:E1204 09:55:05.388000 75935 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8859751Z [rank3]:E1204 09:55:05.388000 75935 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8860237Z [rank3]:E1204 09:55:05.388000 75935 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8861203Z [rank3]:E1204 09:55:05.388000 75935 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8861690Z [rank3]:E1204 09:55:05.388000 75935 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8862659Z [rank3]:E1204 09:55:05.388000 75935 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8863129Z [rank3]:E1204 09:55:05.388000 75935 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8864098Z [rank3]:E1204 09:55:05.388000 75935 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8864582Z [rank3]:E1204 09:55:05.388000 75935 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8866226Z [rank3]:E1204 09:55:05.388000 75935 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1091072 on device 3. CUDA driver allocated memory was 604962816 and is now 653197312. 
2025-12-04T09:59:13.8866598Z [rank3]:E1204 09:55:05.388000 75935 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8867251Z [rank3]:E1204 09:55:05.388000 75935 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8868359Z [rank3]:E1204 09:55:05.388000 75935 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.8868717Z [rank3]:E1204 09:55:05.388000 75935 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8869566Z [rank3]:E1204 09:55:05.388000 75935 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8870136Z [rank3]:E1204 09:55:05.388000 75935 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.8870239Z dist init r=1, world=4 2025-12-04T09:59:13.8870340Z dist init r=0, world=4 2025-12-04T09:59:13.8870434Z dist init r=3, world=4 2025-12-04T09:59:13.8870524Z dist init r=2, world=4 2025-12-04T09:59:13.8871654Z [rank0]:[W1204 09:55:05.426532359 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.8871781Z FAILED [13.7381s] [100%] 2025-12-04T09:59:13.8871787Z 2025-12-04T09:59:13.8871933Z =================================== FAILURES =================================== 2025-12-04T09:59:13.8872232Z ______ TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda _______ 2025-12-04T09:59:13.8872346Z Traceback (most recent call last): 2025-12-04T09:59:13.8872882Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.8872991Z self._join_processes(fn) 2025-12-04T09:59:13.8873561Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.8873696Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.8874281Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.8874400Z raise RuntimeError(error) 2025-12-04T09:59:13.8874624Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.8874746Z Traceback (most recent call last): 2025-12-04T09:59:13.8875274Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8875378Z getattr(self, test_name)() 2025-12-04T09:59:13.8875929Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8876017Z fn() 2025-12-04T09:59:13.8876506Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8876612Z method(*args, **kwargs) 2025-12-04T09:59:13.8877096Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3329, in wrapper 2025-12-04T09:59:13.8877204Z method(*args, **kwargs) 2025-12-04T09:59:13.8877691Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8877783Z with policy(): 2025-12-04T09:59:13.8878287Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8878388Z raise RuntimeError(msg) 2025-12-04T09:59:13.8879543Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1091072 on device 0. CUDA driver allocated memory was 720306176 and is now 762249216. 2025-12-04T09:59:13.8879555Z 2025-12-04T09:59:13.8879762Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8880391Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.8880400Z 2025-12-04T09:59:13.8880659Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8880664Z 2025-12-04T09:59:13.8880698Z 2025-12-04T09:59:13.8880936Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.8881196Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.8881967Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-19aef0a0802c58a7.xml - 2025-12-04T09:59:13.8882132Z =========================== short test summary info ============================ 2025-12-04T09:59:13.8882927Z FAILED [13.7381s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_no_shard_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.8883075Z Traceback (most recent call last): 2025-12-04T09:59:13.8883609Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8883714Z getattr(self, test_name)() 2025-12-04T09:59:13.8884237Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8884328Z fn() 2025-12-04T09:59:13.8884814Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8884914Z method(*args, **kwargs) 2025-12-04T09:59:13.8885409Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8885506Z method(*args, **kwargs) 2025-12-04T09:59:13.8886001Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8886093Z with policy(): 2025-12-04T09:59:13.8886583Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8886695Z raise RuntimeError(msg) 2025-12-04T09:59:13.8887849Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1091072 on device 0. 
CUDA driver allocated memory was 720306176 and is now 762249216. 2025-12-04T09:59:13.8887883Z 2025-12-04T09:59:13.8888099Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8888729Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.8888735Z 2025-12-04T09:59:13.8888989Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8889279Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.8889452Z ====================== 1 failed, 26 deselected in 13.96s ======================= 2025-12-04T09:59:13.8889551Z Got exit code 1 2025-12-04T09:59:13.8889649Z Retrying single test... 2025-12-04T09:59:13.8890233Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5e8c70689f4db333.xml 2025-12-04T09:59:13.8890390Z ============================= test session starts ============================== 2025-12-04T09:59:13.8890716Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.8890817Z cachedir: .pytest_cache 2025-12-04T09:59:13.8891304Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.8891484Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.8896854Z configfile: pytest.ini 2025-12-04T09:59:13.8897463Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.8897685Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.8898562Z stepcurrent: skipping 21 already run items. 
Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.8898677Z Running 1 items in this shard 2025-12-04T09:59:13.8898685Z 2025-12-04T09:59:13.8899707Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_no_shard_cuda I1204 09:55:12.404000 76217 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 76269 2025-12-04T09:59:13.8900213Z I1204 09:55:12.405000 76217 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 76270 2025-12-04T09:59:13.8900736Z I1204 09:55:12.406000 76217 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 76271 2025-12-04T09:59:13.8901226Z I1204 09:55:12.406000 76217 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 76272 2025-12-04T09:59:13.8902483Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8902610Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8903853Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8903979Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8905215Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8905338Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8906600Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.8906719Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.8907685Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8907807Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.8908878Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8909107Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.8909956Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8910056Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.8910902Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py:91: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. 
If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8910998Z return fsdp_fn(module, **kwargs) 2025-12-04T09:59:13.8912830Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8912944Z _warn_cpu_init() 2025-12-04T09:59:13.8914740Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8914825Z _warn_cpu_init() 2025-12-04T09:59:13.8916640Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8916731Z _warn_cpu_init() 2025-12-04T09:59:13.8918502Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.8918598Z _warn_cpu_init() 2025-12-04T09:59:13.8919475Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8919576Z fsdp_model = FSDP( 2025-12-04T09:59:13.8920476Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8920563Z fsdp_model = FSDP( 2025-12-04T09:59:13.8921824Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 2025-12-04T09:59:13.8921924Z fsdp_model = FSDP( 2025-12-04T09:59:13.8922917Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_fsdp.py:395: FutureWarning: The `NO_SHARD` sharding strategy is deprecated. If having issues, please use `DistributedDataParallel` instead. 
2025-12-04T09:59:13.8923013Z fsdp_model = FSDP( 2025-12-04T09:59:13.8924013Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.8924128Z return func(*args, **kwargs) 2025-12-04T09:59:13.8924901Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8925013Z return func(*args, **kwargs) 2025-12-04T09:59:13.8925776Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8925885Z return func(*args, **kwargs) 2025-12-04T09:59:13.8926651Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8926944Z return func(*args, **kwargs) 2025-12-04T09:59:13.8927711Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8927825Z return func(*args, **kwargs) 2025-12-04T09:59:13.8928580Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8928693Z return func(*args, **kwargs) 2025-12-04T09:59:13.8929446Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8929599Z return func(*args, **kwargs) 2025-12-04T09:59:13.8930361Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned. 2025-12-04T09:59:13.8930475Z return func(*args, **kwargs) 2025-12-04T09:59:13.8931235Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py:124: UserWarning: When using ``NO_SHARD`` for ``ShardingStrategy``, full_state_dict will be returned.
2025-12-04T09:59:13.8931340Z return func(*args, **kwargs) 2025-12-04T09:59:13.8931808Z [rank0]:E1204 09:55:23.866000 76269 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8932337Z [rank0]:E1204 09:55:23.866000 76269 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8933337Z [rank0]:E1204 09:55:23.866000 76269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8933938Z [rank0]:E1204 09:55:23.866000 76269 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8934853Z [rank0]:E1204 09:55:23.866000 76269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8935215Z [rank0]:E1204 09:55:23.866000 76269 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8936066Z [rank0]:E1204 09:55:23.866000 76269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8936581Z [rank0]:E1204 09:55:23.866000 76269 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8937709Z [rank0]:E1204 09:55:23.866000 76269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8938200Z [rank0]:E1204 09:55:23.866000 76269 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8939167Z [rank0]:E1204 09:55:23.866000 76269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8939610Z [rank0]:E1204 09:55:23.866000 76269 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8940577Z [rank0]:E1204 09:55:23.866000 76269 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8941141Z [rank0]:E1204 09:55:23.866000 76269 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8942801Z [rank0]:E1204 09:55:23.866000 76269 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1091072 on device 0. CUDA driver allocated memory was 716111872 and is now 762249216. 
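Both UserWarnings repeated above point at the same fix: the CPU-init warning from fsdp/_init_utils.py asks for a device_id on the FSDP constructor, and the barrier() warning from c10d_logger.py asks for a device_id on init_process_group. A combined sketch, assuming torchrun sets LOCAL_RANK plus the usual rendezvous variables and that the wrapped module is a placeholder:

    # Sketch of the two `device_id` hints from the warnings above; assumes a
    # launcher such as torchrun sets LOCAL_RANK, MASTER_ADDR and MASTER_PORT.
    import os
    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    local_rank = int(os.environ["LOCAL_RANK"])
    device = torch.device("cuda", local_rank)
    torch.cuda.set_device(device)

    # Binding the process group to a device silences the barrier() warning.
    dist.init_process_group("nccl", device_id=device)

    # device_id moves the CPU-built module to the GPU before FSDP initializes
    # sharding, silencing _warn_cpu_init().
    model = FSDP(nn.Linear(32, 32), device_id=device)

    dist.barrier()
    dist.destroy_process_group()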
2025-12-04T09:59:13.8943166Z [rank0]:E1204 09:55:23.866000 76269 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8943827Z [rank0]:E1204 09:55:23.866000 76269 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8944966Z [rank0]:E1204 09:55:23.866000 76269 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.8945328Z [rank0]:E1204 09:55:23.866000 76269 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8946052Z [rank0]:E1204 09:55:23.866000 76269 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8946595Z [rank0]:E1204 09:55:23.866000 76269 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.8947051Z [rank2]:E1204 09:55:23.866000 76271 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8947579Z [rank2]:E1204 09:55:23.866000 76271 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8948683Z [rank2]:E1204 09:55:23.866000 76271 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8949319Z [rank2]:E1204 09:55:23.866000 76271 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8950196Z [rank2]:E1204 09:55:23.866000 76271 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8950550Z [rank2]:E1204 09:55:23.866000 76271 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8951396Z [rank2]:E1204 09:55:23.866000 76271 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8951838Z [rank2]:E1204 09:55:23.866000 76271 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8952683Z [rank2]:E1204 09:55:23.866000 76271 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8953117Z [rank2]:E1204 09:55:23.866000 76271 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8953973Z [rank2]:E1204 09:55:23.866000 76271 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8954366Z [rank2]:E1204 09:55:23.866000 76271 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8955252Z [rank2]:E1204 09:55:23.866000 76271 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8955713Z [rank2]:E1204 09:55:23.866000 76271 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8957173Z [rank2]:E1204 09:55:23.866000 76271 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1091072 on device 2. CUDA driver allocated memory was 609157120 and is now 653197312. 2025-12-04T09:59:13.8957522Z [rank2]:E1204 09:55:23.866000 76271 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8958099Z [rank2]:E1204 09:55:23.866000 76271 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8959082Z [rank2]:E1204 09:55:23.866000 76271 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.8959404Z [rank2]:E1204 09:55:23.866000 76271 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8960043Z [rank2]:E1204 09:55:23.866000 76271 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8960523Z [rank2]:E1204 09:55:23.866000 76271 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.8960930Z [rank1]:E1204 09:55:23.866000 76270 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8961400Z [rank1]:E1204 09:55:23.866000 76270 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8962307Z [rank1]:E1204 09:55:23.866000 76270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8962764Z [rank1]:E1204 09:55:23.866000 76270 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8963640Z [rank1]:E1204 09:55:23.866000 76270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8963995Z [rank1]:E1204 09:55:23.866000 76270 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8964845Z [rank1]:E1204 09:55:23.866000 76270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8965282Z [rank1]:E1204 09:55:23.866000 76270 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8966136Z [rank1]:E1204 09:55:23.866000 76270 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8966564Z [rank1]:E1204 09:55:23.866000 76270 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8967420Z [rank1]:E1204 09:55:23.866000 76270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8967883Z [rank1]:E1204 09:55:23.866000 76270 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8968745Z [rank1]:E1204 09:55:23.866000 76270 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8969175Z [rank1]:E1204 09:55:23.866000 76270 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8970631Z [rank1]:E1204 09:55:23.866000 76270 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1091072 on device 1. CUDA driver allocated memory was 609157120 and is now 653197312. 2025-12-04T09:59:13.8971146Z [rank1]:E1204 09:55:23.866000 76270 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8971777Z [rank1]:E1204 09:55:23.866000 76270 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8972817Z [rank1]:E1204 09:55:23.866000 76270 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.8973159Z [rank1]:E1204 09:55:23.866000 76270 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8973839Z [rank1]:E1204 09:55:23.866000 76270 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8974348Z [rank1]:E1204 09:55:23.866000 76270 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.8974772Z [rank3]:E1204 09:55:23.868000 76272 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.8975308Z [rank3]:E1204 09:55:23.868000 76272 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.8976248Z [rank3]:E1204 09:55:23.868000 76272 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8976978Z [rank3]:E1204 09:55:23.868000 76272 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.8977968Z [rank3]:E1204 09:55:23.868000 76272 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8978373Z [rank3]:E1204 09:55:23.868000 76272 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.8979332Z [rank3]:E1204 09:55:23.868000 76272 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8979819Z [rank3]:E1204 09:55:23.868000 76272 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8980787Z [rank3]:E1204 09:55:23.868000 76272 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8981272Z [rank3]:E1204 09:55:23.868000 76272 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.8982279Z [rank3]:E1204 09:55:23.868000 76272 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8982755Z [rank3]:E1204 09:55:23.868000 76272 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.8983722Z [rank3]:E1204 09:55:23.868000 76272 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8984211Z [rank3]:E1204 09:55:23.868000 76272 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.8985890Z [rank3]:E1204 09:55:23.868000 76272 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1091072 on device 3. CUDA driver allocated memory was 604962816 and is now 653197312. 
2025-12-04T09:59:13.8986266Z [rank3]:E1204 09:55:23.868000 76272 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8986928Z [rank3]:E1204 09:55:23.868000 76272 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8988037Z [rank3]:E1204 09:55:23.868000 76272 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.8988398Z [rank3]:E1204 09:55:23.868000 76272 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.8989324Z [rank3]:E1204 09:55:23.868000 76272 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8989839Z [rank3]:E1204 09:55:23.868000 76272 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.8989930Z dist init r=3, world=4 2025-12-04T09:59:13.8990056Z dist init r=2, world=4 2025-12-04T09:59:13.8990147Z dist init r=0, world=4 2025-12-04T09:59:13.8990242Z dist init r=1, world=4 2025-12-04T09:59:13.8991418Z [rank0]:[W1204 09:55:24.910910865 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.8991511Z FAILED [14.0231s] [100%] 2025-12-04T09:59:13.8991517Z 2025-12-04T09:59:13.8991656Z =================================== FAILURES =================================== 2025-12-04T09:59:13.8991923Z ______ TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda _______ 2025-12-04T09:59:13.8992031Z Traceback (most recent call last): 2025-12-04T09:59:13.8992522Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.8992617Z self._join_processes(fn) 2025-12-04T09:59:13.8993141Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.8993262Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.8993799Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.8993905Z raise RuntimeError(error) 2025-12-04T09:59:13.8994117Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.8994227Z Traceback (most recent call last): 2025-12-04T09:59:13.8994702Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.8994858Z getattr(self, test_name)() 2025-12-04T09:59:13.8995327Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.8995413Z fn() 2025-12-04T09:59:13.8995857Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.8995947Z method(*args, **kwargs) 2025-12-04T09:59:13.8996394Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", 
line 3329, in wrapper 2025-12-04T09:59:13.8996485Z method(*args, **kwargs) 2025-12-04T09:59:13.8996966Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.8997051Z with policy(): 2025-12-04T09:59:13.8997501Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.8997605Z raise RuntimeError(msg) 2025-12-04T09:59:13.8998665Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1091072 on device 0. CUDA driver allocated memory was 716111872 and is now 762249216. 2025-12-04T09:59:13.8998671Z 2025-12-04T09:59:13.8998865Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.8999447Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.8999454Z 2025-12-04T09:59:13.8999690Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.8999695Z 2025-12-04T09:59:13.8999842Z Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.8999948Z Traceback (most recent call last): 2025-12-04T09:59:13.9000435Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9000532Z getattr(self, test_name)() 2025-12-04T09:59:13.9001031Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9001116Z fn() 2025-12-04T09:59:13.9001561Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9001652Z method(*args, **kwargs) 2025-12-04T09:59:13.9002098Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9002191Z method(*args, **kwargs) 2025-12-04T09:59:13.9002643Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9002729Z with policy(): 2025-12-04T09:59:13.9003179Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9003280Z raise RuntimeError(msg) 2025-12-04T09:59:13.9004338Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1091072 on device 3. CUDA driver allocated memory was 604962816 and is now 653197312. 
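The repro command the harness prints (PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda) can also be driven from a small script, which is sometimes convenient when re-running or bisecting. The snippet below is a sketch under the assumption that it runs from a PyTorch checkout — the cwd value is a placeholder, and subprocess.run plus os.environ are standard-library calls.

import os
import subprocess

env = dict(os.environ, PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1")
cmd = [
    "python",
    "test/distributed/fsdp/test_fsdp_core.py",
    "TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda",
]
# cwd must be the base repo dir, as the message says; the path below is a placeholder.
result = subprocess.run(cmd, env=env, cwd="/path/to/pytorch")
print("exit code:", result.returncode)  # non-zero when the test fails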
2025-12-04T09:59:13.9004343Z 2025-12-04T09:59:13.9004536Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9005105Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.9005112Z 2025-12-04T09:59:13.9005344Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9005380Z 2025-12-04T09:59:13.9005384Z 2025-12-04T09:59:13.9005605Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.9005836Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.9006553Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5e8c70689f4db333.xml - 2025-12-04T09:59:13.9006707Z =========================== short test summary info ============================ 2025-12-04T09:59:13.9007447Z FAILED [14.0231s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_no_shard_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.9007579Z Traceback (most recent call last): 2025-12-04T09:59:13.9008060Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9008164Z getattr(self, test_name)() 2025-12-04T09:59:13.9008638Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9008716Z fn() 2025-12-04T09:59:13.9009173Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9009263Z method(*args, **kwargs) 2025-12-04T09:59:13.9009710Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9009799Z method(*args, **kwargs) 2025-12-04T09:59:13.9010248Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9010339Z with policy(): 2025-12-04T09:59:13.9010787Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9010883Z raise RuntimeError(msg) 2025-12-04T09:59:13.9011975Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1091072 on device 0. CUDA driver allocated memory was 716111872 and is now 762249216. 
2025-12-04T09:59:13.9011981Z 2025-12-04T09:59:13.9012169Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9012750Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.9012756Z 2025-12-04T09:59:13.9012985Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9012992Z 2025-12-04T09:59:13.9013142Z Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.9013247Z Traceback (most recent call last): 2025-12-04T09:59:13.9013730Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9013834Z getattr(self, test_name)() 2025-12-04T09:59:13.9014311Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9014387Z fn() 2025-12-04T09:59:13.9014836Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9014927Z method(*args, **kwargs) 2025-12-04T09:59:13.9015377Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9015467Z method(*args, **kwargs) 2025-12-04T09:59:13.9015912Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9016000Z with policy(): 2025-12-04T09:59:13.9016609Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9016883Z raise RuntimeError(msg) 2025-12-04T09:59:13.9018101Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda! Caching allocator allocated memory was 512 and is now reported as 1091072 on device 3. CUDA driver allocated memory was 604962816 and is now 653197312. 2025-12-04T09:59:13.9018106Z 2025-12-04T09:59:13.9018315Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9018966Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.9019005Z 2025-12-04T09:59:13.9019270Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9019454Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
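The ProcessGroupNCCL warning earlier in this run ("destroy_process_group() was not called before program exit, which can leak resources") points at the shutdown pattern the linked documentation recommends. A minimal sketch of that pattern follows, assuming the usual env:// rendezvous set up by the launcher; init_process_group and destroy_process_group are real torch.distributed calls, and the run() wrapper is only for illustration.

import torch.distributed as dist

def run(rank: int, world_size: int) -> None:
    # Assumes MASTER_ADDR/MASTER_PORT are provided by the launcher (env:// rendezvous).
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    try:
        ...  # test or training body goes here
    finally:
        dist.destroy_process_group()  # explicit shutdown; avoids the ProcessGroupNCCL warning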
2025-12-04T09:59:13.9019641Z ====================== 1 failed, 26 deselected in 14.24s ======================= 2025-12-04T09:59:13.9019734Z Got exit code 1 2025-12-04T09:59:13.9020313Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_no_shard_cuda 2025-12-04T09:59:13.9020715Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.9021547Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-389219a70e101b44.xml 2025-12-04T09:59:13.9021714Z ============================= test session starts ============================== 2025-12-04T09:59:13.9022064Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.9022178Z cachedir: .pytest_cache 2025-12-04T09:59:13.9022693Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.9022819Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.9022929Z configfile: pytest.ini 2025-12-04T09:59:13.9023531Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.9023755Z collecting ... collected 60 items / 22 deselected / 38 selected 2025-12-04T09:59:13.9023892Z stepcurrent: skipping 22 already run items. 2025-12-04T09:59:13.9023999Z Running 5 items in this shard 2025-12-04T09:59:13.9024004Z 2025-12-04T09:59:13.9025014Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_none_cuda I1204 09:55:30.994000 76554 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 76606 2025-12-04T09:59:13.9025511Z I1204 09:55:30.994000 76554 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 76607 2025-12-04T09:59:13.9026012Z I1204 09:55:30.995000 76554 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 76608 2025-12-04T09:59:13.9026503Z I1204 09:55:30.996000 76554 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 76609 2025-12-04T09:59:13.9027747Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9027882Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9029112Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9029241Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9030548Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9030671Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9031902Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: 
UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9032023Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9034253Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.9034347Z _warn_cpu_init() 2025-12-04T09:59:13.9036265Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.9036354Z _warn_cpu_init() 2025-12-04T09:59:13.9038407Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.9038536Z _warn_cpu_init() 2025-12-04T09:59:13.9040428Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.9040518Z _warn_cpu_init() 2025-12-04T09:59:13.9041453Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T09:59:13.9041562Z return func(*args, **kwargs) 2025-12-04T09:59:13.9041992Z [rank0]:E1204 09:55:43.816000 76606 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9042499Z [rank0]:E1204 09:55:43.816000 76606 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9043440Z [rank0]:E1204 09:55:43.816000 76606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9043918Z [rank0]:E1204 09:55:43.816000 76606 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9044878Z [rank0]:E1204 09:55:43.816000 76606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9045280Z [rank0]:E1204 09:55:43.816000 76606 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9046185Z [rank0]:E1204 09:55:43.816000 76606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9046638Z [rank0]:E1204 09:55:43.816000 76606 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9047571Z [rank0]:E1204 09:55:43.816000 76606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9048029Z [rank0]:E1204 09:55:43.816000 76606 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9048932Z [rank0]:E1204 09:55:43.816000 76606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9049355Z [rank0]:E1204 09:55:43.816000 76606 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9050255Z [rank0]:E1204 09:55:43.816000 76606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9050725Z [rank0]:E1204 09:55:43.816000 76606 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9052581Z [rank0]:E1204 09:55:43.816000 76606 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 716111872 and is now 737083392. 
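The _warn_cpu_init UserWarning repeated above recommends passing device_id so that FSDP's sharding initialization runs on the GPU instead of on the CPU-resident module, and the failing tests here are the offload_true variants. Below is a sketch of a wrapper that follows that suggestion; the wrap() helper is illustrative, while FullyShardedDataParallel's device_id, cpu_offload and sync_module_states arguments and the CPUOffload config are part of the public FSDP API.

import torch
from torch.distributed.fsdp import CPUOffload, FullyShardedDataParallel as FSDP

def wrap(module: torch.nn.Module, rank: int) -> FSDP:
    # device_id moves the CPU-resident module to the right GPU for sharding init,
    # which also satisfies the sync_module_states=True requirement from the warning;
    # offload_params=True mirrors the "offload_true" test variants.
    return FSDP(
        module,
        device_id=torch.device("cuda", rank),
        cpu_offload=CPUOffload(offload_params=True),
        sync_module_states=True,
    )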
2025-12-04T09:59:13.9052937Z [rank0]:E1204 09:55:43.816000 76606 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9053635Z [rank0]:E1204 09:55:43.816000 76606 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9054596Z [rank0]:E1204 09:55:43.816000 76606 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T09:59:13.9054919Z [rank0]:E1204 09:55:43.816000 76606 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9055553Z [rank0]:E1204 09:55:43.816000 76606 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9056041Z [rank0]:E1204 09:55:43.816000 76606 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.9056519Z [rank2]:E1204 09:55:43.817000 76608 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9057206Z [rank2]:E1204 09:55:43.817000 76608 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9058213Z [rank2]:E1204 09:55:43.817000 76608 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9058786Z [rank2]:E1204 09:55:43.817000 76608 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9059784Z [rank2]:E1204 09:55:43.817000 76608 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9060175Z [rank2]:E1204 09:55:43.817000 76608 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9061141Z [rank2]:E1204 09:55:43.817000 76608 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9061654Z [rank2]:E1204 09:55:43.817000 76608 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9062621Z [rank2]:E1204 09:55:43.817000 76608 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9063106Z [rank2]:E1204 09:55:43.817000 76608 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9064065Z [rank2]:E1204 09:55:43.817000 76608 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9064515Z [rank2]:E1204 09:55:43.817000 76608 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9065476Z [rank2]:E1204 09:55:43.817000 76608 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9065975Z [rank2]:E1204 09:55:43.817000 76608 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9067636Z [rank2]:E1204 09:55:43.817000 76608 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 607059968 and is now 628031488. 2025-12-04T09:59:13.9068007Z [rank2]:E1204 09:55:43.817000 76608 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9068660Z [rank2]:E1204 09:55:43.817000 76608 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9069767Z [rank2]:E1204 09:55:43.817000 76608 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T09:59:13.9070099Z [rank2]:E1204 09:55:43.817000 76608 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9070735Z [rank2]:E1204 09:55:43.817000 76608 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9071223Z [rank2]:E1204 09:55:43.817000 76608 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.9071621Z [rank1]:E1204 09:55:43.817000 76607 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9072105Z [rank1]:E1204 09:55:43.817000 76607 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9073017Z [rank1]:E1204 09:55:43.817000 76607 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9073491Z [rank1]:E1204 09:55:43.817000 76607 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9074372Z [rank1]:E1204 09:55:43.817000 76607 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9074723Z [rank1]:E1204 09:55:43.817000 76607 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9075576Z [rank1]:E1204 09:55:43.817000 76607 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9076034Z [rank1]:E1204 09:55:43.817000 76607 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9076886Z [rank1]:E1204 09:55:43.817000 76607 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9077315Z [rank1]:E1204 09:55:43.817000 76607 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9078164Z [rank1]:E1204 09:55:43.817000 76607 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9078564Z [rank1]:E1204 09:55:43.817000 76607 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9079417Z [rank1]:E1204 09:55:43.817000 76607 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9079856Z [rank1]:E1204 09:55:43.817000 76607 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9081315Z [rank1]:E1204 09:55:43.817000 76607 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 607059968 and is now 628031488. 2025-12-04T09:59:13.9081640Z [rank1]:E1204 09:55:43.817000 76607 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9082235Z [rank1]:E1204 09:55:43.817000 76607 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9083189Z [rank1]:E1204 09:55:43.817000 76607 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T09:59:13.9083513Z [rank1]:E1204 09:55:43.817000 76607 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9084144Z [rank1]:E1204 09:55:43.817000 76607 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9084623Z [rank1]:E1204 09:55:43.817000 76607 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.9085024Z [rank3]:E1204 09:55:43.819000 76609 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9085517Z [rank3]:E1204 09:55:43.819000 76609 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9086436Z [rank3]:E1204 09:55:43.819000 76609 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9086882Z [rank3]:E1204 09:55:43.819000 76609 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9087760Z [rank3]:E1204 09:55:43.819000 76609 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 
772, in wrapper 2025-12-04T09:59:13.9088137Z [rank3]:E1204 09:55:43.819000 76609 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9088985Z [rank3]:E1204 09:55:43.819000 76609 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9089422Z [rank3]:E1204 09:55:43.819000 76609 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9090269Z [rank3]:E1204 09:55:43.819000 76609 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9090705Z [rank3]:E1204 09:55:43.819000 76609 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9091551Z [rank3]:E1204 09:55:43.819000 76609 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9091948Z [rank3]:E1204 09:55:43.819000 76609 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9092838Z [rank3]:E1204 09:55:43.819000 76609 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9093272Z [rank3]:E1204 09:55:43.819000 76609 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9094709Z [rank3]:E1204 09:55:43.819000 76609 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 604962816 and is now 628031488. 
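The enable_nested_tensor UserWarning emitted while the test model is built says the nested-tensor fast path is disabled because the encoder layer was not constructed with batch_first=True. A short sketch of the construction that avoids the warning is below; d_model, nhead and num_layers are placeholder values, and TransformerEncoderLayer/TransformerEncoder are the standard torch.nn modules quoted in the warning.

import torch.nn as nn

# batch_first=True on the layer lets enable_nested_tensor take effect on the encoder.
layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2, enable_nested_tensor=True)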
2025-12-04T09:59:13.9095033Z [rank3]:E1204 09:55:43.819000 76609 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9095626Z [rank3]:E1204 09:55:43.819000 76609 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9096651Z [rank3]:E1204 09:55:43.819000 76609 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T09:59:13.9097190Z [rank3]:E1204 09:55:43.819000 76609 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9097905Z [rank3]:E1204 09:55:43.819000 76609 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9098452Z [rank3]:E1204 09:55:43.819000 76609 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.9098596Z dist init r=3, world=4 2025-12-04T09:59:13.9098692Z dist init r=1, world=4 2025-12-04T09:59:13.9098815Z dist init r=0, world=4 2025-12-04T09:59:13.9098917Z dist init r=2, world=4 2025-12-04T09:59:13.9100069Z [rank0]:[W1204 09:55:44.851114529 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.9100174Z FAILED [14.3450s] [ 20%] 2025-12-04T09:59:13.9100180Z 2025-12-04T09:59:13.9100325Z =================================== FAILURES =================================== 2025-12-04T09:59:13.9100619Z ________ TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda _________ 2025-12-04T09:59:13.9100770Z Traceback (most recent call last): 2025-12-04T09:59:13.9101321Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.9101435Z self._join_processes(fn) 2025-12-04T09:59:13.9102019Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.9102161Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.9102772Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.9102881Z raise RuntimeError(error) 2025-12-04T09:59:13.9103114Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.9103235Z Traceback (most recent call last): 2025-12-04T09:59:13.9103773Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9103887Z getattr(self, test_name)() 2025-12-04T09:59:13.9104416Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9104502Z fn() 2025-12-04T09:59:13.9105013Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9105112Z method(*args, **kwargs) 2025-12-04T09:59:13.9105639Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T09:59:13.9105743Z method(*args, **kwargs) 2025-12-04T09:59:13.9106247Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9106344Z with policy(): 2025-12-04T09:59:13.9106851Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9106958Z raise RuntimeError(msg) 2025-12-04T09:59:13.9108135Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 604962816 and is now 628031488. 2025-12-04T09:59:13.9108144Z 2025-12-04T09:59:13.9108355Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9109190Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T09:59:13.9109197Z 2025-12-04T09:59:13.9109432Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9109437Z 2025-12-04T09:59:13.9109441Z 2025-12-04T09:59:13.9109639Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.9109868Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.9110567Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-389219a70e101b44.xml - 2025-12-04T09:59:13.9110774Z =========================== short test summary info ============================ 2025-12-04T09:59:13.9111486Z FAILED [14.3450s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_none_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.9111596Z Traceback (most recent call last): 2025-12-04T09:59:13.9112081Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9112178Z getattr(self, test_name)() 2025-12-04T09:59:13.9112656Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9112770Z fn() 2025-12-04T09:59:13.9113214Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9113312Z method(*args, **kwargs) 2025-12-04T09:59:13.9113759Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9113855Z method(*args, **kwargs) 2025-12-04T09:59:13.9114298Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9114379Z with policy(): 2025-12-04T09:59:13.9114830Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9114922Z raise RuntimeError(msg) 2025-12-04T09:59:13.9115955Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. 
CUDA driver allocated memory was 604962816 and is now 628031488. 2025-12-04T09:59:13.9115970Z 2025-12-04T09:59:13.9116155Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9116714Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T09:59:13.9116719Z 2025-12-04T09:59:13.9116980Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9117136Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.9117297Z ====================== 1 failed, 22 deselected in 14.56s ======================= 2025-12-04T09:59:13.9117377Z Got exit code 1 2025-12-04T09:59:13.9117466Z Retrying single test... 2025-12-04T09:59:13.9118019Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-22aad73f608511a0.xml 2025-12-04T09:59:13.9118160Z ============================= test session starts ============================== 2025-12-04T09:59:13.9118470Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.9118567Z cachedir: .pytest_cache 2025-12-04T09:59:13.9119018Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.9119132Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.9119222Z configfile: pytest.ini 2025-12-04T09:59:13.9119695Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.9119889Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.9120513Z stepcurrent: skipping 22 already run items. 
Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_none_cuda 2025-12-04T09:59:13.9120613Z Running 1 items in this shard 2025-12-04T09:59:13.9120622Z 2025-12-04T09:59:13.9121976Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_none_cuda I1204 09:55:50.154000 76891 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 76943 2025-12-04T09:59:13.9122521Z I1204 09:55:50.154000 76891 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 76944 2025-12-04T09:59:13.9123019Z I1204 09:55:50.155000 76891 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 76945 2025-12-04T09:59:13.9123505Z I1204 09:55:50.156000 76891 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 76946 2025-12-04T09:59:13.9124754Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9124916Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9126151Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9126282Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9127511Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9127639Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9128858Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9128989Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9131050Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.9131146Z _warn_cpu_init() 2025-12-04T09:59:13.9133170Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 
2025-12-04T09:59:13.9133269Z _warn_cpu_init() 2025-12-04T09:59:13.9135206Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.9135293Z _warn_cpu_init() 2025-12-04T09:59:13.9137472Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.9137598Z _warn_cpu_init() 2025-12-04T09:59:13.9138602Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:13.9138710Z return func(*args, **kwargs) 2025-12-04T09:59:13.9139165Z [rank0]:E1204 09:56:02.194000 76943 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9139729Z [rank0]:E1204 09:56:02.194000 76943 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9140730Z [rank0]:E1204 09:56:02.194000 76943 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9141242Z [rank0]:E1204 09:56:02.194000 76943 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9142225Z [rank0]:E1204 09:56:02.194000 76943 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9142625Z [rank0]:E1204 09:56:02.194000 76943 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9143585Z [rank0]:E1204 09:56:02.194000 76943 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9144073Z [rank0]:E1204 09:56:02.194000 76943 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9145063Z [rank0]:E1204 09:56:02.194000 76943 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9145549Z [rank0]:E1204 09:56:02.194000 76943 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9146508Z [rank0]:E1204 09:56:02.194000 76943 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9146955Z [rank0]:E1204 09:56:02.194000 76943 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9147928Z [rank0]:E1204 09:56:02.194000 76943 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9148424Z [rank0]:E1204 09:56:02.194000 76943 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9150033Z [rank0]:E1204 09:56:02.194000 76943 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 718209024 and is now 737083392. 2025-12-04T09:59:13.9150363Z [rank0]:E1204 09:56:02.194000 76943 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9150944Z [rank0]:E1204 09:56:02.194000 76943 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9151979Z [rank0]:E1204 09:56:02.194000 76943 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T09:59:13.9152302Z [rank0]:E1204 09:56:02.194000 76943 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9152938Z [rank0]:E1204 09:56:02.194000 76943 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9153420Z [rank0]:E1204 09:56:02.194000 76943 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.9153839Z [rank1]:E1204 09:56:02.197000 76944 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9154311Z [rank1]:E1204 09:56:02.197000 76944 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9155197Z [rank1]:E1204 09:56:02.197000 76944 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9155649Z [rank1]:E1204 09:56:02.197000 76944 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9156519Z [rank1]:E1204 09:56:02.197000 76944 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9156873Z [rank1]:E1204 09:56:02.197000 76944 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9157725Z [rank1]:E1204 09:56:02.197000 76944 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T09:59:13.9158155Z [rank1]:E1204 09:56:02.197000 76944 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9159035Z [rank1]:E1204 09:56:02.197000 76944 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9159464Z [rank1]:E1204 09:56:02.197000 76944 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9160312Z [rank1]:E1204 09:56:02.197000 76944 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9160704Z [rank1]:E1204 09:56:02.197000 76944 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9161556Z [rank1]:E1204 09:56:02.197000 76944 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9161996Z [rank1]:E1204 09:56:02.197000 76944 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9163431Z [rank1]:E1204 09:56:02.197000 76944 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 609157120 and is now 628031488. 2025-12-04T09:59:13.9163758Z [rank1]:E1204 09:56:02.197000 76944 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9164402Z [rank1]:E1204 09:56:02.197000 76944 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9165369Z [rank1]:E1204 09:56:02.197000 76944 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T09:59:13.9165690Z [rank1]:E1204 09:56:02.197000 76944 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9166327Z [rank1]:E1204 09:56:02.197000 76944 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9166834Z [rank1]:E1204 09:56:02.197000 76944 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.9167235Z [rank3]:E1204 09:56:02.197000 76946 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9167710Z [rank3]:E1204 09:56:02.197000 76946 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9168592Z [rank3]:E1204 09:56:02.197000 76946 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9169044Z [rank3]:E1204 09:56:02.197000 76946 
site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9169916Z [rank3]:E1204 09:56:02.197000 76946 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9170265Z [rank3]:E1204 09:56:02.197000 76946 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9171129Z [rank3]:E1204 09:56:02.197000 76946 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9171585Z [rank3]:E1204 09:56:02.197000 76946 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9172443Z [rank3]:E1204 09:56:02.197000 76946 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9172870Z [rank3]:E1204 09:56:02.197000 76946 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9173726Z [rank3]:E1204 09:56:02.197000 76946 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9174120Z [rank3]:E1204 09:56:02.197000 76946 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9174969Z [rank3]:E1204 09:56:02.197000 76946 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9175407Z [rank3]:E1204 09:56:02.197000 76946 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9177119Z [rank3]:E1204 09:56:02.197000 76946 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 604962816 and is now 628031488. 
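The leak numbers in these RuntimeErrors come from comparing per-device memory before and after the test body. Below is a minimal sketch of that kind of before/after comparison using public torch.cuda APIs; the check_for_leak helper and its pass/fail condition are illustrative assumptions, not the checker implemented in common_utils.py.

    import torch

    def check_for_leak(test_fn, device: int = 0) -> None:
        # Snapshot caching-allocator bytes and driver-level free memory up front.
        torch.cuda.synchronize(device)
        alloc_before = torch.cuda.memory_allocated(device)
        free_before, _ = torch.cuda.mem_get_info(device)

        test_fn()

        # Release cached blocks so only live allocations remain, then re-measure.
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_after = torch.cuda.memory_allocated(device)
        free_after, _ = torch.cuda.mem_get_info(device)

        if alloc_after > alloc_before or free_after < free_before:
            raise RuntimeError(
                f"possible leak on device {device}: allocator "
                f"{alloc_before} -> {alloc_after} bytes, driver free "
                f"{free_before} -> {free_after} bytes"
            )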
2025-12-04T09:59:13.9177565Z [rank3]:E1204 09:56:02.197000 76946 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9178224Z [rank3]:E1204 09:56:02.197000 76946 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9179308Z [rank3]:E1204 09:56:02.197000 76946 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T09:59:13.9179671Z [rank3]:E1204 09:56:02.197000 76946 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9180425Z [rank3]:E1204 09:56:02.197000 76946 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9180969Z [rank3]:E1204 09:56:02.197000 76946 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.9181419Z [rank2]:E1204 09:56:02.198000 76945 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9181952Z [rank2]:E1204 09:56:02.198000 76945 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9182946Z [rank2]:E1204 09:56:02.198000 76945 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9183456Z [rank2]:E1204 09:56:02.198000 76945 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9184442Z [rank2]:E1204 09:56:02.198000 76945 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9184835Z [rank2]:E1204 09:56:02.198000 76945 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9185827Z [rank2]:E1204 09:56:02.198000 76945 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9186312Z [rank2]:E1204 09:56:02.198000 76945 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9187272Z [rank2]:E1204 09:56:02.198000 76945 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9187757Z [rank2]:E1204 09:56:02.198000 76945 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9188725Z [rank2]:E1204 09:56:02.198000 76945 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9189252Z [rank2]:E1204 09:56:02.198000 76945 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9190101Z [rank2]:E1204 09:56:02.198000 76945 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9190540Z [rank2]:E1204 09:56:02.198000 76945 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9192009Z [rank2]:E1204 09:56:02.198000 76945 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 607059968 and is now 628031488. 2025-12-04T09:59:13.9192550Z [rank2]:E1204 09:56:02.198000 76945 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9193168Z [rank2]:E1204 09:56:02.198000 76945 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9194190Z [rank2]:E1204 09:56:02.198000 76945 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T09:59:13.9194558Z [rank2]:E1204 09:56:02.198000 76945 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9195236Z [rank2]:E1204 09:56:02.198000 76945 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9195754Z [rank2]:E1204 09:56:02.198000 76945 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.9195848Z dist init r=0, world=4 2025-12-04T09:59:13.9195944Z dist init r=1, world=4 2025-12-04T09:59:13.9196031Z dist init r=2, world=4 2025-12-04T09:59:13.9196118Z dist init r=3, world=4 2025-12-04T09:59:13.9197214Z [rank0]:[W1204 09:56:02.216891347 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.9197308Z FAILED [14.4971s] [100%] 2025-12-04T09:59:13.9197313Z 2025-12-04T09:59:13.9197454Z =================================== FAILURES =================================== 2025-12-04T09:59:13.9197734Z ________ TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda _________ 2025-12-04T09:59:13.9197844Z Traceback (most recent call last): 2025-12-04T09:59:13.9198393Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.9198495Z self._join_processes(fn) 2025-12-04T09:59:13.9199040Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.9199177Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.9199739Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.9199849Z raise RuntimeError(error) 2025-12-04T09:59:13.9200065Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.9200179Z Traceback (most recent call last): 2025-12-04T09:59:13.9200878Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9200988Z getattr(self, test_name)() 2025-12-04T09:59:13.9201508Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9201594Z fn() 2025-12-04T09:59:13.9202082Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9202186Z method(*args, **kwargs) 2025-12-04T09:59:13.9202675Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9202773Z method(*args, **kwargs) 2025-12-04T09:59:13.9203264Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9203385Z with policy(): 2025-12-04T09:59:13.9203907Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9204008Z raise RuntimeError(msg) 2025-12-04T09:59:13.9205142Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 718209024 and is now 737083392. 
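The destroy_process_group() warning just above, and the earlier barrier() warning about using the device under the current context, both go away when the process group is bound to an explicit device and torn down explicitly. A minimal sketch, assuming a PyTorch build whose init_process_group accepts device_id and the usual MASTER_ADDR/MASTER_PORT rendezvous; run is a hypothetical per-rank entry point.

    import os
    import torch
    import torch.distributed as dist

    def run(rank: int, world_size: int) -> None:
        os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
        os.environ.setdefault("MASTER_PORT", "29500")
        # Binding the group to an explicit device silences the barrier() warning.
        dist.init_process_group(
            "nccl",
            rank=rank,
            world_size=world_size,
            device_id=torch.device("cuda", rank),
        )
        try:
            dist.barrier()
        finally:
            # Explicit shutdown avoids the destroy_process_group() warning at exit.
            dist.destroy_process_group()

Each test process would call run once, for example via torch.multiprocessing.spawn(run, args=(world_size,), nprocs=world_size).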
2025-12-04T09:59:13.9205149Z 2025-12-04T09:59:13.9205358Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9205971Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T09:59:13.9206004Z 2025-12-04T09:59:13.9206263Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9206268Z 2025-12-04T09:59:13.9206424Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.9206538Z Traceback (most recent call last): 2025-12-04T09:59:13.9207073Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9207178Z getattr(self, test_name)() 2025-12-04T09:59:13.9207699Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9207781Z fn() 2025-12-04T09:59:13.9208269Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9208369Z method(*args, **kwargs) 2025-12-04T09:59:13.9208855Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9208951Z method(*args, **kwargs) 2025-12-04T09:59:13.9209442Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9209536Z with policy(): 2025-12-04T09:59:13.9210035Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9210163Z raise RuntimeError(msg) 2025-12-04T09:59:13.9211298Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 607059968 and is now 628031488. 2025-12-04T09:59:13.9211304Z 2025-12-04T09:59:13.9211515Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9212122Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T09:59:13.9212127Z 2025-12-04T09:59:13.9212389Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9212396Z 2025-12-04T09:59:13.9212400Z 2025-12-04T09:59:13.9212608Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.9212860Z Process 0 terminated with exit code 10, terminating remaining processes. 
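The _warn_cpu_init UserWarnings near the top of this test's output recommend passing device_id so FSDP's sharding initialization runs on the GPU rather than on the CPU copy of the module. A minimal, single-process sketch of that usage; the Linear module and the world size of 1 are placeholders for the real test setup.

    import os
    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29501")
    dist.init_process_group("nccl", rank=0, world_size=1)
    torch.cuda.set_device(0)

    model = nn.Linear(1024, 1024)  # placeholder module, still on CPU here
    # device_id moves the module to GPU 0 before sharding initialization and
    # satisfies the GPU requirement that sync_module_states=True imposes.
    fsdp_model = FSDP(model, device_id=0, sync_module_states=True)

    dist.destroy_process_group()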
2025-12-04T09:59:13.9213634Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-22aad73f608511a0.xml - 2025-12-04T09:59:13.9213795Z =========================== short test summary info ============================ 2025-12-04T09:59:13.9214577Z FAILED [14.4971s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.9214693Z Traceback (most recent call last): 2025-12-04T09:59:13.9215223Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9215401Z getattr(self, test_name)() 2025-12-04T09:59:13.9215919Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9216008Z fn() 2025-12-04T09:59:13.9216589Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9216861Z method(*args, **kwargs) 2025-12-04T09:59:13.9217375Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9217525Z method(*args, **kwargs) 2025-12-04T09:59:13.9218067Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9218159Z with policy(): 2025-12-04T09:59:13.9218665Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9218782Z raise RuntimeError(msg) 2025-12-04T09:59:13.9219954Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 718209024 and is now 737083392. 
2025-12-04T09:59:13.9219960Z 2025-12-04T09:59:13.9220179Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9221014Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T09:59:13.9221027Z 2025-12-04T09:59:13.9221299Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9221304Z 2025-12-04T09:59:13.9221473Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.9221594Z Traceback (most recent call last): 2025-12-04T09:59:13.9222149Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9222258Z getattr(self, test_name)() 2025-12-04T09:59:13.9222854Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9222947Z fn() 2025-12-04T09:59:13.9223458Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9223560Z method(*args, **kwargs) 2025-12-04T09:59:13.9224075Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9224179Z method(*args, **kwargs) 2025-12-04T09:59:13.9224687Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9224784Z with policy(): 2025-12-04T09:59:13.9225293Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9225406Z raise RuntimeError(msg) 2025-12-04T09:59:13.9226575Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 607059968 and is now 628031488. 2025-12-04T09:59:13.9226581Z 2025-12-04T09:59:13.9226802Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9227432Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T09:59:13.9227440Z 2025-12-04T09:59:13.9227699Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9227920Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.9228130Z ====================== 1 failed, 26 deselected in 14.72s ======================= 2025-12-04T09:59:13.9228230Z Got exit code 1 2025-12-04T09:59:13.9228332Z Retrying single test... 
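The retry below re-runs the single failing test in a fresh session. The same repro that the error message prints can be driven from a small Python launcher, with the leak check enabled through the environment; paths are relative to the PyTorch repo root, as the log states.

    import os
    import subprocess
    import sys

    env = dict(os.environ, PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1")
    # Add PYTORCH_PRINT_REPRO_ON_FAILURE="0" to env to suppress the repro hint.
    subprocess.run(
        [
            sys.executable,
            "test/distributed/fsdp/test_fsdp_core.py",
            "TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda",
        ],
        env=env,
        check=True,
    )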
2025-12-04T09:59:13.9228957Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-22bb81621d944803.xml 2025-12-04T09:59:13.9229122Z ============================= test session starts ============================== 2025-12-04T09:59:13.9229468Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.9229574Z cachedir: .pytest_cache 2025-12-04T09:59:13.9230133Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.9230252Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.9230360Z configfile: pytest.ini 2025-12-04T09:59:13.9230902Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.9231115Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.9231838Z stepcurrent: skipping 22 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_none_cuda 2025-12-04T09:59:13.9231948Z Running 1 items in this shard 2025-12-04T09:59:13.9231953Z 2025-12-04T09:59:13.9233163Z distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_none_cuda I1204 09:56:09.104000 77228 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 77280 2025-12-04T09:59:13.9233634Z I1204 09:56:09.105000 77228 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 77281 2025-12-04T09:59:13.9234096Z I1204 09:56:09.105000 77228 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 77282 2025-12-04T09:59:13.9234566Z I1204 09:56:09.106000 77228 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 77283 2025-12-04T09:59:13.9235774Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9235901Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9237072Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9237188Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9238349Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9238466Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9239629Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9239742Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9241676Z 
/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.9241794Z _warn_cpu_init() 2025-12-04T09:59:13.9243889Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.9244013Z _warn_cpu_init() 2025-12-04T09:59:13.9245981Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.9246082Z _warn_cpu_init() 2025-12-04T09:59:13.9248018Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:1014: UserWarning: The passed-in `module` is on CPU and will thus have FSDP's sharding initialization run on CPU, which may be slower than on GPU. We recommend passing in the `device_id` argument for FSDP to move `module` to GPU for the sharding initialization. `module` must also be on GPU device to work with the `sync_module_states=True` flag since that requires GPU communication. 2025-12-04T09:59:13.9248115Z _warn_cpu_init() 2025-12-04T09:59:13.9249074Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 
2025-12-04T09:59:13.9249187Z return func(*args, **kwargs) 2025-12-04T09:59:13.9249630Z [rank0]:E1204 09:56:21.828000 77280 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9250175Z [rank0]:E1204 09:56:21.828000 77280 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9251144Z [rank0]:E1204 09:56:21.828000 77280 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9251636Z [rank0]:E1204 09:56:21.828000 77280 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9252599Z [rank0]:E1204 09:56:21.828000 77280 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9252982Z [rank0]:E1204 09:56:21.828000 77280 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9253915Z [rank0]:E1204 09:56:21.828000 77280 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9254488Z [rank0]:E1204 09:56:21.828000 77280 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9255392Z [rank0]:E1204 09:56:21.828000 77280 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9255850Z [rank0]:E1204 09:56:21.828000 77280 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9257076Z [rank0]:E1204 09:56:21.828000 77280 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9257532Z [rank0]:E1204 09:56:21.828000 77280 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9258493Z [rank0]:E1204 09:56:21.828000 77280 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9258986Z [rank0]:E1204 09:56:21.828000 77280 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9260643Z [rank0]:E1204 09:56:21.828000 77280 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 718209024 and is now 737083392. 
2025-12-04T09:59:13.9261010Z [rank0]:E1204 09:56:21.828000 77280 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9261672Z [rank0]:E1204 09:56:21.828000 77280 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9262745Z [rank0]:E1204 09:56:21.828000 77280 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T09:59:13.9263112Z [rank0]:E1204 09:56:21.828000 77280 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9263833Z [rank0]:E1204 09:56:21.828000 77280 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9264421Z [rank0]:E1204 09:56:21.828000 77280 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.9264870Z [rank2]:E1204 09:56:21.829000 77282 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9265397Z [rank2]:E1204 09:56:21.829000 77282 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9266400Z [rank2]:E1204 09:56:21.829000 77282 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9266904Z [rank2]:E1204 09:56:21.829000 77282 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9267898Z [rank2]:E1204 09:56:21.829000 77282 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9268292Z [rank2]:E1204 09:56:21.829000 77282 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9269319Z [rank2]:E1204 09:56:21.829000 77282 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9269747Z [rank2]:E1204 09:56:21.829000 77282 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9270623Z [rank2]:E1204 09:56:21.829000 77282 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9271084Z [rank2]:E1204 09:56:21.829000 77282 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9271932Z [rank2]:E1204 09:56:21.829000 77282 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9272330Z [rank2]:E1204 09:56:21.829000 77282 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9273181Z [rank2]:E1204 09:56:21.829000 77282 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9273648Z [rank2]:E1204 09:56:21.829000 77282 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9275093Z [rank2]:E1204 09:56:21.829000 77282 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 2. CUDA driver allocated memory was 611254272 and is now 628031488. 2025-12-04T09:59:13.9275414Z [rank2]:E1204 09:56:21.829000 77282 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9276005Z [rank2]:E1204 09:56:21.829000 77282 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9276959Z [rank2]:E1204 09:56:21.829000 77282 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T09:59:13.9277287Z [rank2]:E1204 09:56:21.829000 77282 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9277944Z [rank2]:E1204 09:56:21.829000 77282 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9278428Z [rank2]:E1204 09:56:21.829000 77282 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.9278824Z [rank1]:E1204 09:56:21.830000 77281 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9279287Z [rank1]:E1204 09:56:21.830000 77281 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9280184Z [rank1]:E1204 09:56:21.830000 77281 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9280633Z [rank1]:E1204 09:56:21.830000 77281 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9281510Z [rank1]:E1204 09:56:21.830000 77281 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9281858Z [rank1]:E1204 09:56:21.830000 77281 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9282708Z [rank1]:E1204 09:56:21.830000 77281 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9283150Z [rank1]:E1204 09:56:21.830000 77281 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9284047Z [rank1]:E1204 09:56:21.830000 77281 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9284487Z [rank1]:E1204 09:56:21.830000 77281 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9285341Z [rank1]:E1204 09:56:21.830000 77281 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9285738Z [rank1]:E1204 09:56:21.830000 77281 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9286625Z [rank1]:E1204 09:56:21.830000 77281 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9287064Z [rank1]:E1204 09:56:21.830000 77281 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9288505Z [rank1]:E1204 09:56:21.830000 77281 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 1. CUDA driver allocated memory was 604962816 and is now 628031488. 2025-12-04T09:59:13.9288830Z [rank1]:E1204 09:56:21.830000 77281 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9289414Z [rank1]:E1204 09:56:21.830000 77281 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9290369Z [rank1]:E1204 09:56:21.830000 77281 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T09:59:13.9290724Z [rank1]:E1204 09:56:21.830000 77281 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9291359Z [rank1]:E1204 09:56:21.830000 77281 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9291842Z [rank1]:E1204 09:56:21.830000 77281 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.9292241Z [rank3]:E1204 09:56:21.830000 77283 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9292709Z [rank3]:E1204 09:56:21.830000 77283 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9293605Z [rank3]:E1204 09:56:21.830000 77283 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9294053Z [rank3]:E1204 09:56:21.830000 77283 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9294934Z [rank3]:E1204 09:56:21.830000 77283 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 
772, in wrapper 2025-12-04T09:59:13.9295281Z [rank3]:E1204 09:56:21.830000 77283 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9296132Z [rank3]:E1204 09:56:21.830000 77283 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9296862Z [rank3]:E1204 09:56:21.830000 77283 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9297839Z [rank3]:E1204 09:56:21.830000 77283 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9298329Z [rank3]:E1204 09:56:21.830000 77283 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9299290Z [rank3]:E1204 09:56:21.830000 77283 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9299766Z [rank3]:E1204 09:56:21.830000 77283 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9300731Z [rank3]:E1204 09:56:21.830000 77283 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9301222Z [rank3]:E1204 09:56:21.830000 77283 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9302844Z [rank3]:E1204 09:56:21.830000 77283 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 3. CUDA driver allocated memory was 604962816 and is now 628031488. 
2025-12-04T09:59:13.9303204Z [rank3]:E1204 09:56:21.830000 77283 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9303867Z [rank3]:E1204 09:56:21.830000 77283 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9304973Z [rank3]:E1204 09:56:21.830000 77283 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T09:59:13.9305341Z [rank3]:E1204 09:56:21.830000 77283 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9306050Z [rank3]:E1204 09:56:21.830000 77283 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9306593Z [rank3]:E1204 09:56:21.830000 77283 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.9306697Z dist init r=3, world=4 2025-12-04T09:59:13.9306796Z dist init r=0, world=4 2025-12-04T09:59:13.9306896Z dist init r=1, world=4 2025-12-04T09:59:13.9306990Z dist init r=2, world=4 2025-12-04T09:59:13.9308145Z [rank0]:[W1204 09:56:22.856180214 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:13.9308252Z FAILED [14.4992s] [100%] 2025-12-04T09:59:13.9308257Z 2025-12-04T09:59:13.9308402Z =================================== FAILURES =================================== 2025-12-04T09:59:13.9308816Z ________ TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda _________ 2025-12-04T09:59:13.9308932Z Traceback (most recent call last): 2025-12-04T09:59:13.9309538Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.9309640Z self._join_processes(fn) 2025-12-04T09:59:13.9310231Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.9310355Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.9310897Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.9310993Z raise RuntimeError(error) 2025-12-04T09:59:13.9311203Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.9311305Z Traceback (most recent call last): 2025-12-04T09:59:13.9311783Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9311928Z getattr(self, test_name)() 2025-12-04T09:59:13.9312399Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9312477Z fn() 2025-12-04T09:59:13.9312926Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9313016Z method(*args, **kwargs) 2025-12-04T09:59:13.9313474Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T09:59:13.9313565Z method(*args, **kwargs) 2025-12-04T09:59:13.9314008Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9314096Z with policy(): 2025-12-04T09:59:13.9314545Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9314640Z raise RuntimeError(msg) 2025-12-04T09:59:13.9315683Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. CUDA driver allocated memory was 718209024 and is now 737083392. 2025-12-04T09:59:13.9315691Z 2025-12-04T09:59:13.9315879Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9316466Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T09:59:13.9316471Z 2025-12-04T09:59:13.9316704Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9316708Z 2025-12-04T09:59:13.9316713Z 2025-12-04T09:59:13.9316908Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.9317136Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.9317839Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-22bb81621d944803.xml - 2025-12-04T09:59:13.9317996Z =========================== short test summary info ============================ 2025-12-04T09:59:13.9318708Z FAILED [14.4992s] distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_none_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.9318825Z Traceback (most recent call last): 2025-12-04T09:59:13.9319309Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9319406Z getattr(self, test_name)() 2025-12-04T09:59:13.9319888Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9319967Z fn() 2025-12-04T09:59:13.9320420Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9320511Z method(*args, **kwargs) 2025-12-04T09:59:13.9321331Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9321451Z method(*args, **kwargs) 2025-12-04T09:59:13.9321966Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9322061Z with policy(): 2025-12-04T09:59:13.9322579Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9322687Z raise RuntimeError(msg) 2025-12-04T09:59:13.9323865Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda! Caching allocator allocated memory was 512 and is now reported as 80384 on device 0. 
CUDA driver allocated memory was 718209024 and is now 737083392. 2025-12-04T09:59:13.9323925Z 2025-12-04T09:59:13.9324137Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9324765Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParityWithDDPCUDA.test_transformer_offload_true_none_cuda 2025-12-04T09:59:13.9324776Z 2025-12-04T09:59:13.9325043Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9325220Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.9325407Z ====================== 1 failed, 26 deselected in 14.71s ======================= 2025-12-04T09:59:13.9325500Z Got exit code 1 2025-12-04T09:59:13.9326049Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_none_cuda 2025-12-04T09:59:13.9326465Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.9327089Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e70588b2995dc7c5.xml 2025-12-04T09:59:13.9327261Z ============================= test session starts ============================== 2025-12-04T09:59:13.9327605Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.9327708Z cachedir: .pytest_cache 2025-12-04T09:59:13.9328264Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.9328385Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.9328486Z configfile: pytest.ini 2025-12-04T09:59:13.9329032Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.9329246Z collecting ... collected 60 items / 23 deselected / 37 selected 2025-12-04T09:59:13.9329393Z stepcurrent: skipping 23 already run items. 2025-12-04T09:59:13.9329503Z Running 4 items in this shard 2025-12-04T09:59:13.9329511Z 2025-12-04T09:59:13.9330526Z distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_False_cuda I1204 09:56:28.384000 77565 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 77617 2025-12-04T09:59:13.9331037Z I1204 09:56:28.385000 77565 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 77618 2025-12-04T09:59:13.9331526Z I1204 09:56:28.386000 77565 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 77619 2025-12-04T09:59:13.9332017Z I1204 09:56:28.386000 77565 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 77620 2025-12-04T09:59:13.9333254Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9333426Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9335195Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. 
FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.9335355Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9336605Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9336940Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9338663Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.9338828Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9340064Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9340185Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9341898Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.9342072Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9343391Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9343519Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9345226Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
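Editor's note: the UserWarning repeated above is advisory. FSDP received `device_id` as a bare "cuda" device with no index, so it falls back to the rank's current device. A minimal sketch of the two fixes the warning itself suggests, assuming the usual one-process-per-GPU setup where each worker knows its rank (the `model` and `rank` names here are placeholders, not taken from the test code):

import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_model(model: torch.nn.Module, rank: int) -> FSDP:
    # Option 1: make this rank's GPU the current device before FSDP init,
    # so a bare "cuda" device (or no device_id at all) resolves unambiguously.
    torch.cuda.set_device(rank)
    # Option 2: pass an explicit device index instead of the bare "cuda" string.
    return FSDP(model, device_id=rank)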
2025-12-04T09:59:13.9345398Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9345855Z [rank0]:E1204 09:56:35.337000 77617 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9346391Z [rank0]:E1204 09:56:35.337000 77617 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9347396Z [rank0]:E1204 09:56:35.337000 77617 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9347903Z [rank0]:E1204 09:56:35.337000 77617 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9349145Z [rank0]:E1204 09:56:35.337000 77617 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9349550Z [rank0]:E1204 09:56:35.337000 77617 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9350462Z [rank0]:E1204 09:56:35.337000 77617 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9350917Z [rank0]:E1204 09:56:35.337000 77617 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9351819Z [rank0]:E1204 09:56:35.337000 77617 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9352306Z [rank0]:E1204 09:56:35.337000 77617 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9353208Z [rank0]:E1204 09:56:35.337000 77617 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9353635Z [rank0]:E1204 09:56:35.337000 77617 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9354537Z [rank0]:E1204 09:56:35.337000 77617 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9355095Z [rank0]:E1204 09:56:35.337000 77617 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9356558Z [rank0]:E1204 09:56:35.337000 77617 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 0. CUDA driver allocated memory was 718209024 and is now 745472000. 
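Editor's note: the RuntimeError above comes from the harness's CUDA memory-leak check, which compares allocator counters taken before and after the test body (the message also reports driver-level allocation). A rough sketch of that comparison using only public torch.cuda APIs; this illustrates the idea, not the harness's actual implementation:

import gc
import torch

def assert_no_cuda_leak(fn, device: int = 0) -> None:
    # Snapshot caching-allocator usage before the test body runs.
    torch.cuda.synchronize(device)
    before = torch.cuda.memory_allocated(device)
    fn()
    # Drop lingering Python references, then re-read the counter; anything
    # still allocated on the device is a candidate leak.
    gc.collect()
    torch.cuda.synchronize(device)
    after = torch.cuda.memory_allocated(device)
    if after > before:
        raise RuntimeError(
            f"possible CUDA leak on device {device}: "
            f"allocated {before} bytes before the test, {after} after"
        )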
2025-12-04T09:59:13.9356905Z [rank0]:E1204 09:56:35.337000 77617 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9357491Z [rank0]:E1204 09:56:35.337000 77617 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9358457Z [rank0]:E1204 09:56:35.337000 77617 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T09:59:13.9358786Z [rank0]:E1204 09:56:35.337000 77617 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9359423Z [rank0]:E1204 09:56:35.337000 77617 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9359913Z [rank0]:E1204 09:56:35.337000 77617 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.9360310Z [rank1]:E1204 09:56:35.338000 77618 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9360780Z [rank1]:E1204 09:56:35.338000 77618 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9361670Z [rank1]:E1204 09:56:35.338000 77618 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9362117Z [rank1]:E1204 09:56:35.338000 77618 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9363065Z [rank1]:E1204 09:56:35.338000 77618 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9363415Z [rank1]:E1204 09:56:35.338000 77618 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9364271Z [rank1]:E1204 09:56:35.338000 77618 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9364701Z [rank1]:E1204 09:56:35.338000 77618 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9365575Z [rank1]:E1204 09:56:35.338000 77618 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9366009Z [rank1]:E1204 09:56:35.338000 77618 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9366856Z [rank1]:E1204 09:56:35.338000 77618 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9367255Z [rank1]:E1204 09:56:35.338000 77618 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9368109Z [rank1]:E1204 09:56:35.338000 77618 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9368548Z [rank1]:E1204 09:56:35.338000 77618 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9370010Z [rank1]:E1204 09:56:35.338000 77618 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 1. CUDA driver allocated memory was 611254272 and is now 636420096. 2025-12-04T09:59:13.9370335Z [rank1]:E1204 09:56:35.338000 77618 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9370924Z [rank1]:E1204 09:56:35.338000 77618 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9371899Z [rank1]:E1204 09:56:35.338000 77618 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T09:59:13.9372225Z [rank1]:E1204 09:56:35.338000 77618 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9372857Z [rank1]:E1204 09:56:35.338000 77618 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9373347Z [rank1]:E1204 09:56:35.338000 77618 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.9373744Z [rank2]:E1204 09:56:35.338000 77619 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9374213Z [rank2]:E1204 09:56:35.338000 77619 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9375126Z [rank2]:E1204 09:56:35.338000 77619 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9375597Z [rank2]:E1204 09:56:35.338000 77619 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9376569Z [rank2]:E1204 09:56:35.338000 77619 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9377125Z [rank2]:E1204 09:56:35.338000 77619 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9378104Z [rank2]:E1204 09:56:35.338000 77619 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9378627Z [rank2]:E1204 09:56:35.338000 77619 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9379584Z [rank2]:E1204 09:56:35.338000 77619 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9380079Z [rank2]:E1204 09:56:35.338000 77619 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9381030Z [rank2]:E1204 09:56:35.338000 77619 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9381478Z [rank2]:E1204 09:56:35.338000 77619 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9382444Z [rank2]:E1204 09:56:35.338000 77619 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9382942Z [rank2]:E1204 09:56:35.338000 77619 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9384596Z [rank2]:E1204 09:56:35.338000 77619 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 2. CUDA driver allocated memory was 607059968 and is now 636420096. 2025-12-04T09:59:13.9384959Z [rank2]:E1204 09:56:35.338000 77619 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9385624Z [rank2]:E1204 09:56:35.338000 77619 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9386725Z [rank2]:E1204 09:56:35.338000 77619 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T09:59:13.9387091Z [rank2]:E1204 09:56:35.338000 77619 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9387803Z [rank2]:E1204 09:56:35.338000 77619 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9388346Z [rank2]:E1204 09:56:35.338000 77619 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.9388906Z [rank3]:E1204 09:56:35.342000 77620 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9389584Z [rank3]:E1204 09:56:35.342000 77620 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9390589Z [rank3]:E1204 09:56:35.342000 77620 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9391065Z [rank3]:E1204 09:56:35.342000 77620 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9391993Z [rank3]:E1204 09:56:35.342000 77620 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9392694Z [rank3]:E1204 09:56:35.342000 77620 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9393609Z [rank3]:E1204 09:56:35.342000 77620 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9394067Z [rank3]:E1204 09:56:35.342000 77620 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9394965Z [rank3]:E1204 09:56:35.342000 77620 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9395426Z [rank3]:E1204 09:56:35.342000 77620 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9396325Z [rank3]:E1204 09:56:35.342000 77620 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9396753Z [rank3]:E1204 09:56:35.342000 77620 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9397657Z [rank3]:E1204 09:56:35.342000 77620 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9398156Z [rank3]:E1204 09:56:35.342000 77620 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9399679Z [rank3]:E1204 09:56:35.342000 77620 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 3. CUDA driver allocated memory was 604962816 and is now 636420096. 
2025-12-04T09:59:13.9400020Z [rank3]:E1204 09:56:35.342000 77620 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9400646Z [rank3]:E1204 09:56:35.342000 77620 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9401746Z [rank3]:E1204 09:56:35.342000 77620 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T09:59:13.9402077Z [rank3]:E1204 09:56:35.342000 77620 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9402708Z [rank3]:E1204 09:56:35.342000 77620 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9403191Z [rank3]:E1204 09:56:35.342000 77620 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.9403281Z dist init r=1, world=4 2025-12-04T09:59:13.9403396Z dist init r=0, world=4 2025-12-04T09:59:13.9403511Z dist init r=3, world=4 2025-12-04T09:59:13.9403593Z dist init r=2, world=4 2025-12-04T09:59:13.9403675Z FAILED [8.7935s] [ 25%] 2025-12-04T09:59:13.9403680Z 2025-12-04T09:59:13.9403817Z =================================== FAILURES =================================== 2025-12-04T09:59:13.9404075Z ______ TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda ______ 2025-12-04T09:59:13.9404181Z Traceback (most recent call last): 2025-12-04T09:59:13.9404673Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.9404772Z self._join_processes(fn) 2025-12-04T09:59:13.9405341Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.9405467Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.9406005Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.9406112Z raise RuntimeError(error) 2025-12-04T09:59:13.9406316Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.9406429Z Traceback (most recent call last): 2025-12-04T09:59:13.9406908Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9407007Z getattr(self, test_name)() 2025-12-04T09:59:13.9407482Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9407560Z fn() 2025-12-04T09:59:13.9408010Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9408107Z method(*args, **kwargs) 2025-12-04T09:59:13.9408552Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9408648Z method(*args, **kwargs) 2025-12-04T09:59:13.9409097Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9409204Z with policy(): 2025-12-04T09:59:13.9409662Z 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9409755Z raise RuntimeError(msg) 2025-12-04T09:59:13.9410798Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 0. CUDA driver allocated memory was 718209024 and is now 745472000. 2025-12-04T09:59:13.9410811Z 2025-12-04T09:59:13.9411002Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9411572Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T09:59:13.9411577Z 2025-12-04T09:59:13.9411814Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9411821Z 2025-12-04T09:59:13.9411825Z 2025-12-04T09:59:13.9412017Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.9412251Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.9412958Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e70588b2995dc7c5.xml - 2025-12-04T09:59:13.9413107Z =========================== short test summary info ============================ 2025-12-04T09:59:13.9413828Z FAILED [8.7935s] distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_False_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.9413996Z Traceback (most recent call last): 2025-12-04T09:59:13.9414496Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9414594Z getattr(self, test_name)() 2025-12-04T09:59:13.9415069Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9415149Z fn() 2025-12-04T09:59:13.9415597Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9415693Z method(*args, **kwargs) 2025-12-04T09:59:13.9416167Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9416256Z method(*args, **kwargs) 2025-12-04T09:59:13.9416967Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9417068Z with policy(): 2025-12-04T09:59:13.9417576Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9417693Z raise RuntimeError(msg) 2025-12-04T09:59:13.9418869Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 0. CUDA driver allocated memory was 718209024 and is now 745472000. 
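Editor's note: each failing rank prints the same repro instruction. To reproduce outside CI, the environment variable enables the leak check and the single test is run from the repo root; a small hypothetical helper wrapping the printed command with subprocess would look like this:

import os
import subprocess

def run_leak_check_repro(test_id: str) -> int:
    # Mirror the command printed in the log: enable the CUDA mem-leak check
    # and run one test from the base repo dir (assumes cwd is the repo root).
    env = dict(os.environ, PYTORCH_TEST_CUDA_MEM_LEAK_CHECK="1")
    cmd = ["python", "test/distributed/fsdp/test_fsdp_core.py", test_id]
    return subprocess.run(cmd, env=env).returncode

# Example, using the test named in the failure above:
# run_leak_check_repro("TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda")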
2025-12-04T09:59:13.9418875Z 2025-12-04T09:59:13.9419098Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9419751Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T09:59:13.9419757Z 2025-12-04T09:59:13.9420023Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9420205Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.9420376Z ======================= 1 failed, 23 deselected in 9.01s ======================= 2025-12-04T09:59:13.9420516Z Got exit code 1 2025-12-04T09:59:13.9420618Z Retrying single test... 2025-12-04T09:59:13.9421456Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-b456a18c8ca9135a.xml 2025-12-04T09:59:13.9421629Z ============================= test session starts ============================== 2025-12-04T09:59:13.9421976Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.9422093Z cachedir: .pytest_cache 2025-12-04T09:59:13.9422610Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.9422738Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.9422850Z configfile: pytest.ini 2025-12-04T09:59:13.9423381Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.9423598Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.9424330Z stepcurrent: skipping 23 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T09:59:13.9424440Z Running 1 items in this shard 2025-12-04T09:59:13.9424446Z 2025-12-04T09:59:13.9425466Z distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_False_cuda I1204 09:56:41.904000 77894 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 77946 2025-12-04T09:59:13.9425967Z I1204 09:56:41.905000 77894 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 77947 2025-12-04T09:59:13.9426567Z I1204 09:56:41.906000 77894 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 77948 2025-12-04T09:59:13.9427071Z I1204 09:56:41.906000 77894 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 77949 2025-12-04T09:59:13.9428327Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9428459Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9430224Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:59:13.9430398Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9431629Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9431755Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9433512Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.9433667Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9434799Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9434908Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9436437Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.9436581Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9437675Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9437785Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9439299Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
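Editor's note: the repeated enable_nested_tensor warning means the test's transformer is not built with batch_first=True, so the encoder cannot take the nested-tensor fast path. A minimal sketch of the construction the warning asks for (the dimensions are illustrative, not the test's):

import torch.nn as nn

# With batch_first=True on the layer, TransformerEncoder can keep
# enable_nested_tensor=True and use the faster nested-tensor path.
encoder_layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2, enable_nested_tensor=True)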
2025-12-04T09:59:13.9439451Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9439858Z [rank0]:E1204 09:56:48.846000 77946 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9440373Z [rank0]:E1204 09:56:48.846000 77946 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9441297Z [rank0]:E1204 09:56:48.846000 77946 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9441746Z [rank0]:E1204 09:56:48.846000 77946 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9442623Z [rank0]:E1204 09:56:48.846000 77946 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9443001Z [rank0]:E1204 09:56:48.846000 77946 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9443860Z [rank0]:E1204 09:56:48.846000 77946 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9444291Z [rank0]:E1204 09:56:48.846000 77946 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9445152Z [rank0]:E1204 09:56:48.846000 77946 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9445578Z [rank0]:E1204 09:56:48.846000 77946 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9446423Z [rank0]:E1204 09:56:48.846000 77946 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9446821Z [rank0]:E1204 09:56:48.846000 77946 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9447693Z [rank0]:E1204 09:56:48.846000 77946 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9448130Z [rank0]:E1204 09:56:48.846000 77946 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9449582Z [rank0]:E1204 09:56:48.846000 77946 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 0. CUDA driver allocated memory was 720306176 and is now 745472000. 
2025-12-04T09:59:13.9449914Z [rank0]:E1204 09:56:48.846000 77946 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9450498Z [rank0]:E1204 09:56:48.846000 77946 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9451473Z [rank0]:E1204 09:56:48.846000 77946 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T09:59:13.9451793Z [rank0]:E1204 09:56:48.846000 77946 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9452424Z [rank0]:E1204 09:56:48.846000 77946 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9452909Z [rank0]:E1204 09:56:48.846000 77946 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.9453364Z [rank1]:E1204 09:56:48.848000 77947 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9453836Z [rank1]:E1204 09:56:48.848000 77947 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9454720Z [rank1]:E1204 09:56:48.848000 77947 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9455165Z [rank1]:E1204 09:56:48.848000 77947 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9456077Z [rank1]:E1204 09:56:48.848000 77947 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9456502Z [rank1]:E1204 09:56:48.848000 77947 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9457625Z [rank1]:E1204 09:56:48.848000 77947 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9458113Z [rank1]:E1204 09:56:48.848000 77947 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9459081Z [rank1]:E1204 09:56:48.848000 77947 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9459566Z [rank1]:E1204 09:56:48.848000 77947 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9460523Z [rank1]:E1204 09:56:48.848000 77947 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9460971Z [rank1]:E1204 09:56:48.848000 77947 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9461987Z [rank1]:E1204 09:56:48.848000 77947 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9462483Z [rank1]:E1204 09:56:48.848000 77947 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9464116Z [rank1]:E1204 09:56:48.848000 77947 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 1. CUDA driver allocated memory was 609157120 and is now 636420096. 2025-12-04T09:59:13.9464488Z [rank1]:E1204 09:56:48.848000 77947 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9465148Z [rank1]:E1204 09:56:48.848000 77947 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9466245Z [rank1]:E1204 09:56:48.848000 77947 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T09:59:13.9466600Z [rank1]:E1204 09:56:48.848000 77947 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9467314Z [rank1]:E1204 09:56:48.848000 77947 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9467916Z [rank1]:E1204 09:56:48.848000 77947 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.9468372Z [rank2]:E1204 09:56:48.848000 77948 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9469015Z [rank2]:E1204 09:56:48.848000 77948 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9469905Z [rank2]:E1204 09:56:48.848000 77948 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9470381Z [rank2]:E1204 09:56:48.848000 77948 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9471263Z [rank2]:E1204 09:56:48.848000 77948 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9471611Z [rank2]:E1204 09:56:48.848000 77948 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9472467Z [rank2]:E1204 09:56:48.848000 77948 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9472900Z [rank2]:E1204 09:56:48.848000 77948 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9473756Z [rank2]:E1204 09:56:48.848000 77948 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9474184Z [rank2]:E1204 09:56:48.848000 77948 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9475056Z [rank2]:E1204 09:56:48.848000 77948 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9475457Z [rank2]:E1204 09:56:48.848000 77948 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9476306Z [rank2]:E1204 09:56:48.848000 77948 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9476747Z [rank2]:E1204 09:56:48.848000 77948 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9478194Z [rank2]:E1204 09:56:48.848000 77948 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 2. CUDA driver allocated memory was 604962816 and is now 636420096. 2025-12-04T09:59:13.9478526Z [rank2]:E1204 09:56:48.848000 77948 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9479111Z [rank2]:E1204 09:56:48.848000 77948 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9480086Z [rank2]:E1204 09:56:48.848000 77948 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T09:59:13.9480632Z [rank2]:E1204 09:56:48.848000 77948 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9481361Z [rank2]:E1204 09:56:48.848000 77948 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9481880Z [rank2]:E1204 09:56:48.848000 77948 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.9482304Z [rank3]:E1204 09:56:48.848000 77949 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9482811Z [rank3]:E1204 09:56:48.848000 77949 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9483753Z [rank3]:E1204 09:56:48.848000 77949 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9484258Z [rank3]:E1204 09:56:48.848000 77949 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9485201Z [rank3]:E1204 09:56:48.848000 77949 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9485573Z [rank3]:E1204 09:56:48.848000 77949 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9486479Z [rank3]:E1204 09:56:48.848000 77949 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9487122Z [rank3]:E1204 09:56:48.848000 77949 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9488055Z [rank3]:E1204 09:56:48.848000 77949 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9488527Z [rank3]:E1204 09:56:48.848000 77949 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9489500Z [rank3]:E1204 09:56:48.848000 77949 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9489933Z [rank3]:E1204 09:56:48.848000 77949 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9490875Z [rank3]:E1204 09:56:48.848000 77949 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9491356Z [rank3]:E1204 09:56:48.848000 77949 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9493024Z [rank3]:E1204 09:56:48.848000 77949 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 3. CUDA driver allocated memory was 607059968 and is now 636420096. 
2025-12-04T09:59:13.9493370Z [rank3]:E1204 09:56:48.848000 77949 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9493990Z [rank3]:E1204 09:56:48.848000 77949 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9495058Z [rank3]:E1204 09:56:48.848000 77949 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T09:59:13.9495432Z [rank3]:E1204 09:56:48.848000 77949 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9496101Z [rank3]:E1204 09:56:48.848000 77949 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9496860Z [rank3]:E1204 09:56:48.848000 77949 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.9496970Z dist init r=1, world=4 2025-12-04T09:59:13.9497070Z dist init r=3, world=4 2025-12-04T09:59:13.9497176Z dist init r=2, world=4 2025-12-04T09:59:13.9497308Z dist init r=0, world=4 2025-12-04T09:59:13.9497407Z FAILED [8.6697s] [100%] 2025-12-04T09:59:13.9497413Z 2025-12-04T09:59:13.9497558Z =================================== FAILURES =================================== 2025-12-04T09:59:13.9497858Z ______ TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda ______ 2025-12-04T09:59:13.9497987Z Traceback (most recent call last): 2025-12-04T09:59:13.9498534Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.9498642Z self._join_processes(fn) 2025-12-04T09:59:13.9499229Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.9499365Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.9499974Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.9500086Z raise RuntimeError(error) 2025-12-04T09:59:13.9500320Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.9500442Z Traceback (most recent call last): 2025-12-04T09:59:13.9500983Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9501090Z getattr(self, test_name)() 2025-12-04T09:59:13.9501655Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9501740Z fn() 2025-12-04T09:59:13.9502253Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9502354Z method(*args, **kwargs) 2025-12-04T09:59:13.9502855Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9502962Z method(*args, **kwargs) 2025-12-04T09:59:13.9503460Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9503561Z with policy(): 2025-12-04T09:59:13.9504071Z 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9504176Z raise RuntimeError(msg) 2025-12-04T09:59:13.9505363Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 2. CUDA driver allocated memory was 604962816 and is now 636420096. 2025-12-04T09:59:13.9505369Z 2025-12-04T09:59:13.9505583Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9506237Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T09:59:13.9506245Z 2025-12-04T09:59:13.9506509Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9506516Z 2025-12-04T09:59:13.9506549Z 2025-12-04T09:59:13.9506793Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.9507060Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.9507872Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-b456a18c8ca9135a.xml - 2025-12-04T09:59:13.9508044Z =========================== short test summary info ============================ 2025-12-04T09:59:13.9508935Z FAILED [8.6697s] distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_False_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.9509088Z Traceback (most recent call last): 2025-12-04T09:59:13.9509579Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9509677Z getattr(self, test_name)() 2025-12-04T09:59:13.9510500Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9510577Z fn() 2025-12-04T09:59:13.9511023Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9511122Z method(*args, **kwargs) 2025-12-04T09:59:13.9511566Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9511662Z method(*args, **kwargs) 2025-12-04T09:59:13.9512105Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9512192Z with policy(): 2025-12-04T09:59:13.9512656Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9512750Z raise RuntimeError(msg) 2025-12-04T09:59:13.9513792Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 2. CUDA driver allocated memory was 604962816 and is now 636420096. 
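Editor's note: the "Started process N with pid ..." and "dist init r=..., world=4" lines come from the test harness spawning one worker per GPU and initializing a process group in each. A stripped-down sketch of that pattern, assuming four GPUs and a localhost rendezvous; the MASTER_ADDR/MASTER_PORT values and the worker body are placeholders, not the harness's real code:

import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def worker(rank: int, world_size: int) -> None:
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    torch.cuda.set_device(rank)
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    print(f"dist init r={rank}, world={world_size}")
    # ... run the test body on this rank ...
    dist.destroy_process_group()

if __name__ == "__main__":
    mp.spawn(worker, args=(4,), nprocs=4)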
2025-12-04T09:59:13.9513996Z 2025-12-04T09:59:13.9514205Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9514810Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T09:59:13.9514815Z 2025-12-04T09:59:13.9515067Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9515235Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.9515396Z ======================= 1 failed, 26 deselected in 8.89s ======================= 2025-12-04T09:59:13.9515491Z Got exit code 1 2025-12-04T09:59:13.9515589Z Retrying single test... 2025-12-04T09:59:13.9516182Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-aedba904eee3ba73.xml 2025-12-04T09:59:13.9516335Z ============================= test session starts ============================== 2025-12-04T09:59:13.9516660Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.9516761Z cachedir: .pytest_cache 2025-12-04T09:59:13.9517246Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.9517364Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.9517461Z configfile: pytest.ini 2025-12-04T09:59:13.9517964Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.9518172Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.9518910Z stepcurrent: skipping 23 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T09:59:13.9519013Z Running 1 items in this shard 2025-12-04T09:59:13.9519018Z 2025-12-04T09:59:13.9519967Z distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_False_cuda I1204 09:56:55.434000 78223 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 78275 2025-12-04T09:59:13.9520436Z I1204 09:56:55.435000 78223 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 78276 2025-12-04T09:59:13.9521053Z I1204 09:56:55.435000 78223 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 78277 2025-12-04T09:59:13.9521808Z I1204 09:56:55.436000 78223 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 78278 2025-12-04T09:59:13.9523079Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9523207Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9524931Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:59:13.9525106Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9526343Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9526479Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9527781Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9527911Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9529140Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9529264Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9530992Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.9531160Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9532871Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.9533033Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9534909Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:59:13.9535102Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9535549Z [rank2]:E1204 09:57:02.344000 78277 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9536076Z [rank2]:E1204 09:57:02.344000 78277 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9537336Z [rank2]:E1204 09:57:02.344000 78277 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9537859Z [rank2]:E1204 09:57:02.344000 78277 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9538847Z [rank2]:E1204 09:57:02.344000 78277 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9539257Z [rank2]:E1204 09:57:02.344000 78277 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9540224Z [rank2]:E1204 09:57:02.344000 78277 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9540710Z [rank2]:E1204 09:57:02.344000 78277 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9541685Z [rank2]:E1204 09:57:02.344000 78277 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9542204Z [rank2]:E1204 09:57:02.344000 78277 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9543176Z [rank2]:E1204 09:57:02.344000 78277 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9543618Z [rank2]:E1204 09:57:02.344000 78277 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9544584Z [rank2]:E1204 09:57:02.344000 78277 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9545071Z [rank2]:E1204 09:57:02.344000 78277 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9546703Z [rank2]:E1204 09:57:02.344000 78277 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 2. CUDA driver allocated memory was 607059968 and is now 636420096. 
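Reading the leak report above: with PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 the test wrapper snapshots CUDA memory on each device before the test body runs and compares it again afterwards; here the caching allocator on device 2 went from 512 to 22528 bytes and driver-allocated memory grew from 607059968 to 636420096 bytes, so the check raises. The sketch below only illustrates that before/after comparison with public torch.cuda calls; it is not the torch.testing._internal implementation, and both the zero-growth tolerance and the whole-device mem_get_info() query are assumptions made for illustration.

    # Rough sketch of the before/after CUDA memory comparison described in the
    # leak report above. NOT torch.testing._internal's implementation:
    # mem_get_info() is a whole-device (all processes) view, and the zero-growth
    # tolerance is an assumption.
    import torch

    def check_for_cuda_leak(run_test, device=0, tolerance_bytes=0):
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_before = torch.cuda.memory_allocated(device)      # caching allocator bytes
        free_before, total = torch.cuda.mem_get_info(device)    # driver-level free/total
        driver_before = total - free_before

        run_test()

        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_after = torch.cuda.memory_allocated(device)
        free_after, _ = torch.cuda.mem_get_info(device)
        driver_after = total - free_after

        if alloc_after > alloc_before + tolerance_bytes:
            raise RuntimeError(
                f"Possible CUDA leak: caching allocator was {alloc_before} bytes "
                f"and is now {alloc_after} on device {device}; driver-allocated "
                f"memory went from {driver_before} to {driver_after}."
            )

In this run the same message fires on all four ranks, which is why every worker process exits with code 10 and the parent then reports the failure it saw from process 1.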
2025-12-04T09:59:13.9547072Z [rank2]:E1204 09:57:02.344000 78277 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9547729Z [rank2]:E1204 09:57:02.344000 78277 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9549148Z [rank2]:E1204 09:57:02.344000 78277 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T09:59:13.9549514Z [rank2]:E1204 09:57:02.344000 78277 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9550195Z [rank2]:E1204 09:57:02.344000 78277 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9550703Z [rank2]:E1204 09:57:02.344000 78277 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.9551123Z [rank0]:E1204 09:57:02.344000 78275 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9551726Z [rank0]:E1204 09:57:02.344000 78275 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9552757Z [rank0]:E1204 09:57:02.344000 78275 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9553218Z [rank0]:E1204 09:57:02.344000 78275 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9554092Z [rank0]:E1204 09:57:02.344000 78275 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9554447Z [rank0]:E1204 09:57:02.344000 78275 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9555296Z [rank0]:E1204 09:57:02.344000 78275 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9555729Z [rank0]:E1204 09:57:02.344000 78275 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9556617Z [rank0]:E1204 09:57:02.344000 78275 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9557048Z [rank0]:E1204 09:57:02.344000 78275 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9557899Z [rank0]:E1204 09:57:02.344000 78275 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9558295Z [rank0]:E1204 09:57:02.344000 78275 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9559158Z [rank0]:E1204 09:57:02.344000 78275 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9559594Z [rank0]:E1204 09:57:02.344000 78275 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9561036Z [rank0]:E1204 09:57:02.344000 78275 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 0. CUDA driver allocated memory was 720306176 and is now 745472000. 2025-12-04T09:59:13.9561365Z [rank0]:E1204 09:57:02.344000 78275 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9561973Z [rank0]:E1204 09:57:02.344000 78275 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9562991Z [rank0]:E1204 09:57:02.344000 78275 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T09:59:13.9563308Z [rank0]:E1204 09:57:02.344000 78275 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9563951Z [rank0]:E1204 09:57:02.344000 78275 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9564432Z [rank0]:E1204 09:57:02.344000 78275 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.9564854Z [rank1]:E1204 09:57:02.344000 78276 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9565332Z [rank1]:E1204 09:57:02.344000 78276 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9566217Z [rank1]:E1204 09:57:02.344000 78276 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9566671Z [rank1]:E1204 09:57:02.344000 78276 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9567544Z [rank1]:E1204 09:57:02.344000 78276 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9567901Z [rank1]:E1204 09:57:02.344000 78276 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9568758Z [rank1]:E1204 09:57:02.344000 78276 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9569210Z [rank1]:E1204 09:57:02.344000 78276 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9570073Z [rank1]:E1204 09:57:02.344000 78276 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9570504Z [rank1]:E1204 09:57:02.344000 78276 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9571364Z [rank1]:E1204 09:57:02.344000 78276 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9571762Z [rank1]:E1204 09:57:02.344000 78276 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9572622Z [rank1]:E1204 09:57:02.344000 78276 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9573052Z [rank1]:E1204 09:57:02.344000 78276 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9574502Z [rank1]:E1204 09:57:02.344000 78276 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 1. CUDA driver allocated memory was 607059968 and is now 636420096. 2025-12-04T09:59:13.9574896Z [rank1]:E1204 09:57:02.344000 78276 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9575480Z [rank1]:E1204 09:57:02.344000 78276 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9576526Z [rank1]:E1204 09:57:02.344000 78276 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T09:59:13.9577046Z [rank1]:E1204 09:57:02.344000 78276 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9577770Z [rank1]:E1204 09:57:02.344000 78276 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9578348Z [rank1]:E1204 09:57:02.344000 78276 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.9578801Z [rank3]:E1204 09:57:02.346000 78278 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9579339Z [rank3]:E1204 09:57:02.346000 78278 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9580334Z [rank3]:E1204 09:57:02.346000 78278 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9580842Z [rank3]:E1204 09:57:02.346000 78278 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9581832Z [rank3]:E1204 09:57:02.346000 78278 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9582238Z [rank3]:E1204 09:57:02.346000 78278 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9583225Z [rank3]:E1204 09:57:02.346000 78278 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9583707Z [rank3]:E1204 09:57:02.346000 78278 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9584670Z [rank3]:E1204 09:57:02.346000 78278 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9585156Z [rank3]:E1204 09:57:02.346000 78278 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9586120Z [rank3]:E1204 09:57:02.346000 78278 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9586563Z [rank3]:E1204 09:57:02.346000 78278 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9587525Z [rank3]:E1204 09:57:02.346000 78278 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9588010Z [rank3]:E1204 09:57:02.346000 78278 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9589697Z [rank3]:E1204 09:57:02.346000 78278 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 3. CUDA driver allocated memory was 604962816 and is now 636420096. 
2025-12-04T09:59:13.9590055Z [rank3]:E1204 09:57:02.346000 78278 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9590638Z [rank3]:E1204 09:57:02.346000 78278 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9591611Z [rank3]:E1204 09:57:02.346000 78278 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T09:59:13.9591957Z [rank3]:E1204 09:57:02.346000 78278 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9592604Z [rank3]:E1204 09:57:02.346000 78278 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9593088Z [rank3]:E1204 09:57:02.346000 78278 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.9593176Z dist init r=3, world=4 2025-12-04T09:59:13.9593274Z dist init r=2, world=4 2025-12-04T09:59:13.9593360Z dist init r=1, world=4 2025-12-04T09:59:13.9593445Z dist init r=0, world=4 2025-12-04T09:59:13.9593539Z FAILED [8.8629s] [100%] 2025-12-04T09:59:13.9593544Z 2025-12-04T09:59:13.9593673Z =================================== FAILURES =================================== 2025-12-04T09:59:13.9593942Z ______ TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda ______ 2025-12-04T09:59:13.9594053Z Traceback (most recent call last): 2025-12-04T09:59:13.9594535Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.9594644Z self._join_processes(fn) 2025-12-04T09:59:13.9595161Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.9595282Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.9595849Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.9595947Z raise RuntimeError(error) 2025-12-04T09:59:13.9596161Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.9596265Z Traceback (most recent call last): 2025-12-04T09:59:13.9596744Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9596851Z getattr(self, test_name)() 2025-12-04T09:59:13.9597318Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9597398Z fn() 2025-12-04T09:59:13.9597853Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9597944Z method(*args, **kwargs) 2025-12-04T09:59:13.9598392Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9598482Z method(*args, **kwargs) 2025-12-04T09:59:13.9598927Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9599018Z with policy(): 2025-12-04T09:59:13.9599467Z 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9599570Z raise RuntimeError(msg) 2025-12-04T09:59:13.9600632Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 1. CUDA driver allocated memory was 607059968 and is now 636420096. 2025-12-04T09:59:13.9600663Z 2025-12-04T09:59:13.9600855Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9601430Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T09:59:13.9601435Z 2025-12-04T09:59:13.9601666Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9601670Z 2025-12-04T09:59:13.9601674Z 2025-12-04T09:59:13.9601875Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.9602129Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.9602848Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-aedba904eee3ba73.xml - 2025-12-04T09:59:13.9603006Z =========================== short test summary info ============================ 2025-12-04T09:59:13.9603722Z FAILED [8.8629s] distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_False_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.9603832Z Traceback (most recent call last): 2025-12-04T09:59:13.9604314Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9604411Z getattr(self, test_name)() 2025-12-04T09:59:13.9604891Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9604971Z fn() 2025-12-04T09:59:13.9605423Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9605516Z method(*args, **kwargs) 2025-12-04T09:59:13.9605967Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9606063Z method(*args, **kwargs) 2025-12-04T09:59:13.9606531Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9606616Z with policy(): 2025-12-04T09:59:13.9607069Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9607163Z raise RuntimeError(msg) 2025-12-04T09:59:13.9608210Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 22528 on device 1. CUDA driver allocated memory was 607059968 and is now 636420096. 
2025-12-04T09:59:13.9608219Z 2025-12-04T09:59:13.9608410Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9608977Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestNoGradCUDA.test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T09:59:13.9608990Z 2025-12-04T09:59:13.9609223Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9609378Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.9609538Z ======================= 1 failed, 26 deselected in 9.08s ======================= 2025-12-04T09:59:13.9609621Z Got exit code 1 2025-12-04T09:59:13.9610120Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_False_cuda 2025-12-04T09:59:13.9610486Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.9611073Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2d3d36f137cb39b5.xml 2025-12-04T09:59:13.9611251Z ============================= test session starts ============================== 2025-12-04T09:59:13.9611557Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.9611650Z cachedir: .pytest_cache 2025-12-04T09:59:13.9612107Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.9612213Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.9612309Z configfile: pytest.ini 2025-12-04T09:59:13.9612783Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.9612996Z collecting ... collected 60 items / 24 deselected / 36 selected 2025-12-04T09:59:13.9613124Z stepcurrent: skipping 24 already run items. 2025-12-04T09:59:13.9613223Z Running 3 items in this shard 2025-12-04T09:59:13.9613227Z 2025-12-04T09:59:13.9614161Z distributed/fsdp/test_fsdp_core.py::TestParamInitCUDA::test_param_change_after_init_mixed_precision_False_cuda I1204 09:57:09.044000 78552 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 78604 2025-12-04T09:59:13.9614610Z I1204 09:57:09.045000 78552 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 78605 2025-12-04T09:59:13.9615044Z I1204 09:57:09.046000 78552 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 78606 2025-12-04T09:59:13.9615480Z I1204 09:57:09.046000 78552 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 78607 2025-12-04T09:59:13.9616659Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9616965Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9618731Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.9618901Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9620147Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9620275Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9622226Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.9622400Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9623635Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9623762Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9625526Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.9625737Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9626969Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9627101Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9628858Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:59:13.9629030Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9629489Z [rank0]:E1204 09:57:15.906000 78604 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9630024Z [rank0]:E1204 09:57:15.906000 78604 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9631037Z [rank0]:E1204 09:57:15.906000 78604 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9631544Z [rank0]:E1204 09:57:15.906000 78604 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9632552Z [rank0]:E1204 09:57:15.906000 78604 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9633088Z [rank0]:E1204 09:57:15.906000 78604 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9633957Z [rank0]:E1204 09:57:15.906000 78604 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9634386Z [rank0]:E1204 09:57:15.906000 78604 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9635238Z [rank0]:E1204 09:57:15.906000 78604 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9635678Z [rank0]:E1204 09:57:15.906000 78604 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9636524Z [rank0]:E1204 09:57:15.906000 78604 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9636923Z [rank0]:E1204 09:57:15.906000 78604 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9637777Z [rank0]:E1204 09:57:15.906000 78604 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9638218Z [rank0]:E1204 09:57:15.906000 78604 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9639724Z [rank0]:E1204 09:57:15.906000 78604 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 27136 on device 0. CUDA driver allocated memory was 711917568 and is now 734986240. 
2025-12-04T09:59:13.9640076Z [rank0]:E1204 09:57:15.906000 78604 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9640662Z [rank0]:E1204 09:57:15.906000 78604 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9641674Z [rank0]:E1204 09:57:15.906000 78604 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda 2025-12-04T09:59:13.9642026Z [rank0]:E1204 09:57:15.906000 78604 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9642664Z [rank0]:E1204 09:57:15.906000 78604 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9643158Z [rank0]:E1204 09:57:15.906000 78604 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.9643558Z [rank1]:E1204 09:57:15.907000 78605 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9644026Z [rank1]:E1204 09:57:15.907000 78605 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9644926Z [rank1]:E1204 09:57:15.907000 78605 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9645385Z [rank1]:E1204 09:57:15.907000 78605 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9646287Z [rank1]:E1204 09:57:15.907000 78605 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9646639Z [rank1]:E1204 09:57:15.907000 78605 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9647493Z [rank1]:E1204 09:57:15.907000 78605 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9647922Z [rank1]:E1204 09:57:15.907000 78605 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9648777Z [rank1]:E1204 09:57:15.907000 78605 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9649216Z [rank1]:E1204 09:57:15.907000 78605 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9650059Z [rank1]:E1204 09:57:15.907000 78605 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9650457Z [rank1]:E1204 09:57:15.907000 78605 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9651309Z [rank1]:E1204 09:57:15.907000 78605 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9651798Z [rank1]:E1204 09:57:15.907000 78605 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9653273Z [rank1]:E1204 09:57:15.907000 78605 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 27136 on device 1. CUDA driver allocated memory was 609157120 and is now 625934336. 2025-12-04T09:59:13.9653601Z [rank1]:E1204 09:57:15.907000 78605 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9654182Z [rank1]:E1204 09:57:15.907000 78605 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9655207Z [rank1]:E1204 09:57:15.907000 78605 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda 2025-12-04T09:59:13.9655540Z [rank1]:E1204 09:57:15.907000 78605 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9656171Z [rank1]:E1204 09:57:15.907000 78605 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9656901Z [rank1]:E1204 09:57:15.907000 78605 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.9657355Z [rank3]:E1204 09:57:15.908000 78607 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9657889Z [rank3]:E1204 09:57:15.908000 78607 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9658892Z [rank3]:E1204 09:57:15.908000 78607 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9659446Z [rank3]:E1204 09:57:15.908000 78607 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9660445Z [rank3]:E1204 09:57:15.908000 78607 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9660840Z [rank3]:E1204 09:57:15.908000 78607 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9661804Z [rank3]:E1204 09:57:15.908000 78607 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9662293Z [rank3]:E1204 09:57:15.908000 78607 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9663247Z [rank3]:E1204 09:57:15.908000 78607 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9663737Z [rank3]:E1204 09:57:15.908000 78607 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9664691Z [rank3]:E1204 09:57:15.908000 78607 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9665140Z [rank3]:E1204 09:57:15.908000 78607 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9666160Z [rank3]:E1204 09:57:15.908000 78607 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9666659Z [rank3]:E1204 09:57:15.908000 78607 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9668317Z [rank3]:E1204 09:57:15.908000 78607 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 27136 on device 3. CUDA driver allocated memory was 607059968 and is now 625934336. 2025-12-04T09:59:13.9668825Z [rank3]:E1204 09:57:15.908000 78607 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9669451Z [rank3]:E1204 09:57:15.908000 78607 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9670515Z [rank3]:E1204 09:57:15.908000 78607 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda 2025-12-04T09:59:13.9670858Z [rank3]:E1204 09:57:15.908000 78607 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9671533Z [rank3]:E1204 09:57:15.908000 78607 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9672047Z [rank3]:E1204 09:57:15.908000 78607 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.9672470Z [rank2]:E1204 09:57:15.910000 78606 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9672966Z [rank2]:E1204 09:57:15.910000 78606 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9673939Z [rank2]:E1204 09:57:15.910000 78606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9674415Z [rank2]:E1204 09:57:15.910000 78606 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9675344Z [rank2]:E1204 09:57:15.910000 78606 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9675715Z [rank2]:E1204 09:57:15.910000 78606 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9676757Z [rank2]:E1204 09:57:15.910000 78606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9677193Z [rank2]:E1204 09:57:15.910000 78606 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9678039Z [rank2]:E1204 09:57:15.910000 78606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9678477Z [rank2]:E1204 09:57:15.910000 78606 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9679326Z [rank2]:E1204 09:57:15.910000 78606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9679779Z [rank2]:E1204 09:57:15.910000 78606 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9680634Z [rank2]:E1204 09:57:15.910000 78606 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9681071Z [rank2]:E1204 09:57:15.910000 78606 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9682547Z [rank2]:E1204 09:57:15.910000 78606 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 27136 on device 2. CUDA driver allocated memory was 604962816 and is now 625934336. 
2025-12-04T09:59:13.9682899Z [rank2]:E1204 09:57:15.910000 78606 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9683491Z [rank2]:E1204 09:57:15.910000 78606 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9684491Z [rank2]:E1204 09:57:15.910000 78606 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda 2025-12-04T09:59:13.9684818Z [rank2]:E1204 09:57:15.910000 78606 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9685453Z [rank2]:E1204 09:57:15.910000 78606 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9685938Z [rank2]:E1204 09:57:15.910000 78606 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.9686033Z dist init r=1, world=4 2025-12-04T09:59:13.9686119Z dist init r=0, world=4 2025-12-04T09:59:13.9686208Z dist init r=3, world=4 2025-12-04T09:59:13.9686290Z dist init r=2, world=4 2025-12-04T09:59:13.9686399Z FAILED [8.7592s] [ 33%] 2025-12-04T09:59:13.9686404Z 2025-12-04T09:59:13.9686541Z =================================== FAILURES =================================== 2025-12-04T09:59:13.9686814Z __ TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda ___ 2025-12-04T09:59:13.9686930Z Traceback (most recent call last): 2025-12-04T09:59:13.9687415Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.9687513Z self._join_processes(fn) 2025-12-04T09:59:13.9688032Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.9688159Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.9688694Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.9688800Z raise RuntimeError(error) 2025-12-04T09:59:13.9689006Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.9689116Z Traceback (most recent call last): 2025-12-04T09:59:13.9689594Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9689689Z getattr(self, test_name)() 2025-12-04T09:59:13.9690166Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9690242Z fn() 2025-12-04T09:59:13.9690689Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9690838Z method(*args, **kwargs) 2025-12-04T09:59:13.9691284Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9691381Z method(*args, **kwargs) 2025-12-04T09:59:13.9691824Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9691906Z with policy(): 
2025-12-04T09:59:13.9692357Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9692449Z raise RuntimeError(msg) 2025-12-04T09:59:13.9693529Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 27136 on device 1. CUDA driver allocated memory was 609157120 and is now 625934336. 2025-12-04T09:59:13.9693563Z 2025-12-04T09:59:13.9693753Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9694355Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda 2025-12-04T09:59:13.9694362Z 2025-12-04T09:59:13.9694602Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9694607Z 2025-12-04T09:59:13.9694611Z 2025-12-04T09:59:13.9694802Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.9695034Z Process 1 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.9695743Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2d3d36f137cb39b5.xml - 2025-12-04T09:59:13.9695891Z =========================== short test summary info ============================ 2025-12-04T09:59:13.9696878Z FAILED [8.7592s] distributed/fsdp/test_fsdp_core.py::TestParamInitCUDA::test_param_change_after_init_mixed_precision_False_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:13.9697006Z Traceback (most recent call last): 2025-12-04T09:59:13.9697595Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9697705Z getattr(self, test_name)() 2025-12-04T09:59:13.9698236Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9698327Z fn() 2025-12-04T09:59:13.9698832Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9698941Z method(*args, **kwargs) 2025-12-04T09:59:13.9699440Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9699548Z method(*args, **kwargs) 2025-12-04T09:59:13.9700052Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9700144Z with policy(): 2025-12-04T09:59:13.9700650Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9700759Z raise RuntimeError(msg) 2025-12-04T09:59:13.9701966Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 27136 on device 1. CUDA driver allocated memory was 609157120 and is now 625934336. 
2025-12-04T09:59:13.9701974Z 2025-12-04T09:59:13.9702190Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9702899Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda 2025-12-04T09:59:13.9702933Z 2025-12-04T09:59:13.9703199Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9703377Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.9703549Z ======================= 1 failed, 24 deselected in 8.98s ======================= 2025-12-04T09:59:13.9703649Z Got exit code 1 2025-12-04T09:59:13.9703750Z Retrying single test... 2025-12-04T09:59:13.9704372Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-973a0dc84b27de93.xml 2025-12-04T09:59:13.9704581Z ============================= test session starts ============================== 2025-12-04T09:59:13.9704927Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.9705037Z cachedir: .pytest_cache 2025-12-04T09:59:13.9705551Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.9705671Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.9705787Z configfile: pytest.ini 2025-12-04T09:59:13.9706323Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.9706535Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.9707299Z stepcurrent: skipping 24 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParamInitCUDA::test_param_change_after_init_mixed_precision_False_cuda 2025-12-04T09:59:13.9707410Z Running 1 items in this shard 2025-12-04T09:59:13.9707415Z 2025-12-04T09:59:13.9708459Z distributed/fsdp/test_fsdp_core.py::TestParamInitCUDA::test_param_change_after_init_mixed_precision_False_cuda I1204 09:57:22.414000 78865 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 78917 2025-12-04T09:59:13.9709165Z I1204 09:57:22.415000 78865 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 78918 2025-12-04T09:59:13.9709642Z I1204 09:57:22.416000 78865 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 78919 2025-12-04T09:59:13.9710073Z I1204 09:57:22.416000 78865 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 78920 2025-12-04T09:59:13.9711177Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9711294Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9712818Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.9712973Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9714068Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9714187Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9715278Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9715441Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9716962Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.9717107Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9718626Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.9718803Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9719902Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9720009Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9721890Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:59:13.9722068Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9722533Z [rank0]:E1204 09:57:29.236000 78917 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9723078Z [rank0]:E1204 09:57:29.236000 78917 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9724136Z [rank0]:E1204 09:57:29.236000 78917 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9724655Z [rank0]:E1204 09:57:29.236000 78917 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9725646Z [rank0]:E1204 09:57:29.236000 78917 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9726045Z [rank0]:E1204 09:57:29.236000 78917 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9727011Z [rank0]:E1204 09:57:29.236000 78917 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9727494Z [rank0]:E1204 09:57:29.236000 78917 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9728460Z [rank0]:E1204 09:57:29.236000 78917 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9728948Z [rank0]:E1204 09:57:29.236000 78917 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9729951Z [rank0]:E1204 09:57:29.236000 78917 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9730700Z [rank0]:E1204 09:57:29.236000 78917 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9731670Z [rank0]:E1204 09:57:29.236000 78917 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9732165Z [rank0]:E1204 09:57:29.236000 78917 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9733920Z [rank0]:E1204 09:57:29.236000 78917 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 27136 on device 0. CUDA driver allocated memory was 716111872 and is now 734986240. 
2025-12-04T09:59:13.9734293Z [rank0]:E1204 09:57:29.236000 78917 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9734876Z [rank0]:E1204 09:57:29.236000 78917 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9735888Z [rank0]:E1204 09:57:29.236000 78917 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda 2025-12-04T09:59:13.9736205Z [rank0]:E1204 09:57:29.236000 78917 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9737123Z [rank0]:E1204 09:57:29.236000 78917 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9737685Z [rank0]:E1204 09:57:29.236000 78917 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.9738168Z [rank3]:E1204 09:57:29.237000 78920 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9738706Z [rank3]:E1204 09:57:29.237000 78920 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9739701Z [rank3]:E1204 09:57:29.237000 78920 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9740215Z [rank3]:E1204 09:57:29.237000 78920 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9741200Z [rank3]:E1204 09:57:29.237000 78920 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9741593Z [rank3]:E1204 09:57:29.237000 78920 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9742556Z [rank3]:E1204 09:57:29.237000 78920 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9743037Z [rank3]:E1204 09:57:29.237000 78920 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9743997Z [rank3]:E1204 09:57:29.237000 78920 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9744502Z [rank3]:E1204 09:57:29.237000 78920 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9745502Z [rank3]:E1204 09:57:29.237000 78920 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9745945Z [rank3]:E1204 09:57:29.237000 78920 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9746902Z [rank3]:E1204 09:57:29.237000 78920 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9747429Z [rank3]:E1204 09:57:29.237000 78920 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9749176Z [rank3]:E1204 09:57:29.237000 78920 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 27136 on device 3. CUDA driver allocated memory was 611254272 and is now 625934336. 2025-12-04T09:59:13.9749504Z [rank3]:E1204 09:57:29.237000 78920 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9750081Z [rank3]:E1204 09:57:29.237000 78920 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9751093Z [rank3]:E1204 09:57:29.237000 78920 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda 2025-12-04T09:59:13.9751411Z [rank3]:E1204 09:57:29.237000 78920 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9752047Z [rank3]:E1204 09:57:29.237000 78920 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9752559Z [rank3]:E1204 09:57:29.237000 78920 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.9752958Z [rank1]:E1204 09:57:29.240000 78918 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9753436Z [rank1]:E1204 09:57:29.240000 78918 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9754329Z [rank1]:E1204 09:57:29.240000 78918 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9754785Z [rank1]:E1204 09:57:29.240000 78918 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9755669Z [rank1]:E1204 09:57:29.240000 78918 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9756016Z [rank1]:E1204 09:57:29.240000 78918 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9756879Z [rank1]:E1204 09:57:29.240000 78918 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9757307Z [rank1]:E1204 09:57:29.240000 78918 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9758188Z [rank1]:E1204 09:57:29.240000 78918 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9758719Z [rank1]:E1204 09:57:29.240000 78918 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9759569Z [rank1]:E1204 09:57:29.240000 78918 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9759965Z [rank1]:E1204 09:57:29.240000 78918 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9760846Z [rank1]:E1204 09:57:29.240000 78918 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9761294Z [rank1]:E1204 09:57:29.240000 78918 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9762766Z [rank1]:E1204 09:57:29.240000 78918 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 27136 on device 1. CUDA driver allocated memory was 609157120 and is now 625934336. 2025-12-04T09:59:13.9763097Z [rank1]:E1204 09:57:29.240000 78918 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9763677Z [rank1]:E1204 09:57:29.240000 78918 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9764686Z [rank1]:E1204 09:57:29.240000 78918 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda 2025-12-04T09:59:13.9765008Z [rank1]:E1204 09:57:29.240000 78918 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9765669Z [rank1]:E1204 09:57:29.240000 78918 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9766161Z [rank1]:E1204 09:57:29.240000 78918 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.9766559Z [rank2]:E1204 09:57:29.241000 78919 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9767033Z [rank2]:E1204 09:57:29.241000 78919 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9767917Z [rank2]:E1204 09:57:29.241000 78919 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9768374Z [rank2]:E1204 09:57:29.241000 78919 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9769248Z [rank2]:E1204 09:57:29.241000 78919 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9769597Z [rank2]:E1204 09:57:29.241000 78919 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9770454Z [rank2]:E1204 09:57:29.241000 78919 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9770935Z [rank2]:E1204 09:57:29.241000 78919 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9771786Z [rank2]:E1204 09:57:29.241000 78919 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9772218Z [rank2]:E1204 09:57:29.241000 78919 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9773063Z [rank2]:E1204 09:57:29.241000 78919 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9773482Z [rank2]:E1204 09:57:29.241000 78919 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9774335Z [rank2]:E1204 09:57:29.241000 78919 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9774776Z [rank2]:E1204 09:57:29.241000 78919 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9776244Z [rank2]:E1204 09:57:29.241000 78919 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 27136 on device 2. CUDA driver allocated memory was 604962816 and is now 625934336. 
2025-12-04T09:59:13.9776655Z [rank2]:E1204 09:57:29.241000 78919 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9777481Z [rank2]:E1204 09:57:29.241000 78919 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9778658Z [rank2]:E1204 09:57:29.241000 78919 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda 2025-12-04T09:59:13.9779020Z [rank2]:E1204 09:57:29.241000 78919 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9779736Z [rank2]:E1204 09:57:29.241000 78919 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9785700Z [rank2]:E1204 09:57:29.241000 78919 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.9785853Z dist init r=3, world=4 2025-12-04T09:59:13.9785951Z dist init r=1, world=4 2025-12-04T09:59:13.9786052Z dist init r=2, world=4 2025-12-04T09:59:13.9786150Z dist init r=0, world=4 2025-12-04T09:59:13.9786247Z FAILED [8.6879s] [100%] 2025-12-04T09:59:13.9786255Z 2025-12-04T09:59:13.9786414Z =================================== FAILURES =================================== 2025-12-04T09:59:13.9786731Z __ TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda ___ 2025-12-04T09:59:13.9786850Z Traceback (most recent call last): 2025-12-04T09:59:13.9787415Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.9787524Z self._join_processes(fn) 2025-12-04T09:59:13.9788115Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.9788256Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.9788859Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.9789159Z raise RuntimeError(error) 2025-12-04T09:59:13.9789399Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.9789504Z Traceback (most recent call last): 2025-12-04T09:59:13.9789997Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9790095Z getattr(self, test_name)() 2025-12-04T09:59:13.9790575Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9790651Z fn() 2025-12-04T09:59:13.9791100Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9791226Z method(*args, **kwargs) 2025-12-04T09:59:13.9791672Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9791763Z method(*args, **kwargs) 2025-12-04T09:59:13.9792209Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9792299Z with policy(): 
2025-12-04T09:59:13.9792758Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9792849Z raise RuntimeError(msg) 2025-12-04T09:59:13.9793918Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 27136 on device 3. CUDA driver allocated memory was 611254272 and is now 625934336. 2025-12-04T09:59:13.9793927Z 2025-12-04T09:59:13.9794122Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9794726Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda 2025-12-04T09:59:13.9794735Z 2025-12-04T09:59:13.9794974Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9794979Z 2025-12-04T09:59:13.9794983Z 2025-12-04T09:59:13.9795209Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.9795446Z Process 3 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.9796151Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-973a0dc84b27de93.xml - 2025-12-04T09:59:13.9796300Z =========================== short test summary info ============================ 2025-12-04T09:59:13.9797056Z FAILED [8.6879s] distributed/fsdp/test_fsdp_core.py::TestParamInitCUDA::test_param_change_after_init_mixed_precision_False_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:13.9797165Z Traceback (most recent call last): 2025-12-04T09:59:13.9797659Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9797757Z getattr(self, test_name)() 2025-12-04T09:59:13.9798233Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9798315Z fn() 2025-12-04T09:59:13.9798761Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9798850Z method(*args, **kwargs) 2025-12-04T09:59:13.9799298Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9799392Z method(*args, **kwargs) 2025-12-04T09:59:13.9799840Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9799953Z with policy(): 2025-12-04T09:59:13.9800427Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9800528Z raise RuntimeError(msg) 2025-12-04T09:59:13.9801600Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 27136 on device 3. CUDA driver allocated memory was 611254272 and is now 625934336. 
2025-12-04T09:59:13.9801605Z 2025-12-04T09:59:13.9801800Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9802407Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda 2025-12-04T09:59:13.9802441Z 2025-12-04T09:59:13.9802675Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9802842Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.9802995Z ======================= 1 failed, 26 deselected in 8.90s ======================= 2025-12-04T09:59:13.9803084Z Got exit code 1 2025-12-04T09:59:13.9803180Z Retrying single test... 2025-12-04T09:59:13.9803730Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1e9342b39aaf3792.xml 2025-12-04T09:59:13.9803876Z ============================= test session starts ============================== 2025-12-04T09:59:13.9804180Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.9804273Z cachedir: .pytest_cache 2025-12-04T09:59:13.9804732Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.9804837Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.9804935Z configfile: pytest.ini 2025-12-04T09:59:13.9805408Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.9805597Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:13.9806323Z stepcurrent: skipping 24 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParamInitCUDA::test_param_change_after_init_mixed_precision_False_cuda 2025-12-04T09:59:13.9806422Z Running 1 items in this shard 2025-12-04T09:59:13.9806427Z 2025-12-04T09:59:13.9807361Z distributed/fsdp/test_fsdp_core.py::TestParamInitCUDA::test_param_change_after_init_mixed_precision_False_cuda I1204 09:57:35.884000 79178 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 79230 2025-12-04T09:59:13.9807804Z I1204 09:57:35.885000 79178 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 79231 2025-12-04T09:59:13.9808242Z I1204 09:57:35.886000 79178 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 79232 2025-12-04T09:59:13.9808677Z I1204 09:57:35.886000 79178 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 79233 2025-12-04T09:59:13.9809782Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9809904Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9811432Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.9811644Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9812738Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9812846Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9813951Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9814087Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9815619Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.9815768Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9817676Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.9817846Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9819077Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9819211Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9821211Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
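The repeated UserWarning above also names its remediation: call torch.cuda.set_device() before FSDP initialization, or pass a device with an explicit index as device_id rather than the bare "cuda" device. A minimal sketch under those assumptions (the function and rank handling are illustrative, and a default process group is assumed to be initialized already):

```python
# Minimal sketch of the remediation suggested by the FSDP `device_id` warning;
# illustrative only. Assumes torch.distributed.init_process_group() was called.
import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_on_rank(model: nn.Module, rank: int) -> FSDP:
    torch.cuda.set_device(rank)                                # option 1: make the rank's device current
    return FSDP(model, device_id=torch.device("cuda", rank))   # option 2: pass an explicitly indexed device
```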
2025-12-04T09:59:13.9821393Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9821857Z [rank0]:E1204 09:57:42.752000 79230 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9822401Z [rank0]:E1204 09:57:42.752000 79230 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9823406Z [rank0]:E1204 09:57:42.752000 79230 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9823913Z [rank0]:E1204 09:57:42.752000 79230 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9824910Z [rank0]:E1204 09:57:42.752000 79230 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9825312Z [rank0]:E1204 09:57:42.752000 79230 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9826316Z [rank0]:E1204 09:57:42.752000 79230 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9826845Z [rank0]:E1204 09:57:42.752000 79230 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9827814Z [rank0]:E1204 09:57:42.752000 79230 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9828298Z [rank0]:E1204 09:57:42.752000 79230 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9829250Z [rank0]:E1204 09:57:42.752000 79230 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9829745Z [rank0]:E1204 09:57:42.752000 79230 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9830712Z [rank0]:E1204 09:57:42.752000 79230 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9831205Z [rank0]:E1204 09:57:42.752000 79230 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9832992Z [rank0]:E1204 09:57:42.752000 79230 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 27136 on device 0. CUDA driver allocated memory was 707723264 and is now 734986240. 
2025-12-04T09:59:13.9833349Z [rank0]:E1204 09:57:42.752000 79230 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9833973Z [rank0]:E1204 09:57:42.752000 79230 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9835065Z [rank0]:E1204 09:57:42.752000 79230 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda 2025-12-04T09:59:13.9835418Z [rank0]:E1204 09:57:42.752000 79230 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9836090Z [rank0]:E1204 09:57:42.752000 79230 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9836605Z [rank0]:E1204 09:57:42.752000 79230 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.9837030Z [rank2]:E1204 09:57:42.754000 79232 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9837541Z [rank2]:E1204 09:57:42.754000 79232 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9838675Z [rank2]:E1204 09:57:42.754000 79232 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9839166Z [rank2]:E1204 09:57:42.754000 79232 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9840135Z [rank2]:E1204 09:57:42.754000 79232 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9840517Z [rank2]:E1204 09:57:42.754000 79232 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9841621Z [rank2]:E1204 09:57:42.754000 79232 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9842080Z [rank2]:E1204 09:57:42.754000 79232 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9842991Z [rank2]:E1204 09:57:42.754000 79232 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9843471Z [rank2]:E1204 09:57:42.754000 79232 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9844373Z [rank2]:E1204 09:57:42.754000 79232 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9844798Z [rank2]:E1204 09:57:42.754000 79232 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9845706Z [rank2]:E1204 09:57:42.754000 79232 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9846172Z [rank2]:E1204 09:57:42.754000 79232 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9847740Z [rank2]:E1204 09:57:42.754000 79232 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 27136 on device 2. CUDA driver allocated memory was 609157120 and is now 625934336. 2025-12-04T09:59:13.9848093Z [rank2]:E1204 09:57:42.754000 79232 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9848737Z [rank2]:E1204 09:57:42.754000 79232 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9849905Z [rank2]:E1204 09:57:42.754000 79232 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda 2025-12-04T09:59:13.9850233Z [rank2]:E1204 09:57:42.754000 79232 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9850866Z [rank2]:E1204 09:57:42.754000 79232 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9851357Z [rank2]:E1204 09:57:42.754000 79232 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.9851752Z [rank1]:E1204 09:57:42.755000 79231 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9852226Z [rank1]:E1204 09:57:42.755000 79231 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9853111Z [rank1]:E1204 09:57:42.755000 79231 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9853561Z [rank1]:E1204 09:57:42.755000 79231 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9854473Z [rank1]:E1204 09:57:42.755000 79231 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9854846Z [rank1]:E1204 09:57:42.755000 79231 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9855705Z [rank1]:E1204 09:57:42.755000 79231 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9856133Z [rank1]:E1204 09:57:42.755000 79231 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9857281Z [rank1]:E1204 09:57:42.755000 79231 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9857822Z [rank1]:E1204 09:57:42.755000 79231 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9858783Z [rank1]:E1204 09:57:42.755000 79231 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9859233Z [rank1]:E1204 09:57:42.755000 79231 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9860195Z [rank1]:E1204 09:57:42.755000 79231 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9860694Z [rank1]:E1204 09:57:42.755000 79231 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9862361Z [rank1]:E1204 09:57:42.755000 79231 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 27136 on device 1. CUDA driver allocated memory was 607059968 and is now 625934336. 2025-12-04T09:59:13.9862761Z [rank1]:E1204 09:57:42.755000 79231 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9863422Z [rank1]:E1204 09:57:42.755000 79231 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9864549Z [rank1]:E1204 09:57:42.755000 79231 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda 2025-12-04T09:59:13.9864916Z [rank1]:E1204 09:57:42.755000 79231 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9865635Z [rank1]:E1204 09:57:42.755000 79231 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9866184Z [rank1]:E1204 09:57:42.755000 79231 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.9866631Z [rank3]:E1204 09:57:42.757000 79233 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9867168Z [rank3]:E1204 09:57:42.757000 79233 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9868168Z [rank3]:E1204 09:57:42.757000 79233 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9868824Z [rank3]:E1204 09:57:42.757000 79233 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9869827Z [rank3]:E1204 09:57:42.757000 79233 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9870208Z [rank3]:E1204 09:57:42.757000 79233 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9871134Z [rank3]:E1204 09:57:42.757000 79233 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9871637Z [rank3]:E1204 09:57:42.757000 79233 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9872565Z [rank3]:E1204 09:57:42.757000 79233 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9873035Z [rank3]:E1204 09:57:42.757000 79233 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9873962Z [rank3]:E1204 09:57:42.757000 79233 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9874396Z [rank3]:E1204 09:57:42.757000 79233 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9875325Z [rank3]:E1204 09:57:42.757000 79233 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9875800Z [rank3]:E1204 09:57:42.757000 79233 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9877452Z [rank3]:E1204 09:57:42.757000 79233 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 27136 on device 3. CUDA driver allocated memory was 604962816 and is now 625934336. 
2025-12-04T09:59:13.9877804Z [rank3]:E1204 09:57:42.757000 79233 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9878443Z [rank3]:E1204 09:57:42.757000 79233 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9879548Z [rank3]:E1204 09:57:42.757000 79233 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda 2025-12-04T09:59:13.9879902Z [rank3]:E1204 09:57:42.757000 79233 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9880695Z [rank3]:E1204 09:57:42.757000 79233 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9881386Z [rank3]:E1204 09:57:42.757000 79233 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.9881491Z dist init r=2, world=4 2025-12-04T09:59:13.9881583Z dist init r=1, world=4 2025-12-04T09:59:13.9881680Z dist init r=0, world=4 2025-12-04T09:59:13.9881773Z dist init r=3, world=4 2025-12-04T09:59:13.9881861Z FAILED [8.7156s] [100%] 2025-12-04T09:59:13.9881867Z 2025-12-04T09:59:13.9882014Z =================================== FAILURES =================================== 2025-12-04T09:59:13.9882370Z __ TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda ___ 2025-12-04T09:59:13.9882483Z Traceback (most recent call last): 2025-12-04T09:59:13.9883017Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.9883126Z self._join_processes(fn) 2025-12-04T09:59:13.9883704Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.9883842Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.9884429Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.9884576Z raise RuntimeError(error) 2025-12-04T09:59:13.9884806Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.9884918Z Traceback (most recent call last): 2025-12-04T09:59:13.9885451Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9885555Z getattr(self, test_name)() 2025-12-04T09:59:13.9886080Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9886164Z fn() 2025-12-04T09:59:13.9886656Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9886761Z method(*args, **kwargs) 2025-12-04T09:59:13.9887246Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9887346Z method(*args, **kwargs) 2025-12-04T09:59:13.9887836Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9887925Z with policy(): 
2025-12-04T09:59:13.9888433Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9888534Z raise RuntimeError(msg) 2025-12-04T09:59:13.9889736Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 27136 on device 0. CUDA driver allocated memory was 707723264 and is now 734986240. 2025-12-04T09:59:13.9889747Z 2025-12-04T09:59:13.9889953Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9890607Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda 2025-12-04T09:59:13.9890615Z 2025-12-04T09:59:13.9890880Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9890885Z 2025-12-04T09:59:13.9890892Z 2025-12-04T09:59:13.9891107Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.9891365Z Process 0 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.9892145Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1e9342b39aaf3792.xml - 2025-12-04T09:59:13.9892307Z =========================== short test summary info ============================ 2025-12-04T09:59:13.9893124Z FAILED [8.7156s] distributed/fsdp/test_fsdp_core.py::TestParamInitCUDA::test_param_change_after_init_mixed_precision_False_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:13.9893240Z Traceback (most recent call last): 2025-12-04T09:59:13.9893779Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9893887Z getattr(self, test_name)() 2025-12-04T09:59:13.9894462Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9894553Z fn() 2025-12-04T09:59:13.9895043Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9895140Z method(*args, **kwargs) 2025-12-04T09:59:13.9895630Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9895727Z method(*args, **kwargs) 2025-12-04T09:59:13.9896217Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9896412Z with policy(): 2025-12-04T09:59:13.9897089Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9897207Z raise RuntimeError(msg) 2025-12-04T09:59:13.9898422Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda! Caching allocator allocated memory was 512 and is now reported as 27136 on device 0. CUDA driver allocated memory was 707723264 and is now 734986240. 
2025-12-04T09:59:13.9898428Z 2025-12-04T09:59:13.9898651Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9899332Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_False_cuda 2025-12-04T09:59:13.9899338Z 2025-12-04T09:59:13.9899599Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9899787Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:13.9899960Z ======================= 1 failed, 26 deselected in 8.93s ======================= 2025-12-04T09:59:13.9900062Z Got exit code 1 2025-12-04T09:59:13.9900666Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParamInitCUDA::test_param_change_after_init_mixed_precision_False_cuda 2025-12-04T09:59:13.9901109Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:13.9901745Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-15b775a41cf5a439.xml 2025-12-04T09:59:13.9901904Z ============================= test session starts ============================== 2025-12-04T09:59:13.9902255Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:13.9902362Z cachedir: .pytest_cache 2025-12-04T09:59:13.9902871Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:13.9902997Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:13.9903103Z configfile: pytest.ini 2025-12-04T09:59:13.9903636Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:13.9903858Z collecting ... collected 60 items / 25 deselected / 35 selected 2025-12-04T09:59:13.9903999Z stepcurrent: skipping 25 already run items. 
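Each pytest session header above prints the active hypothesis profile, 'pytorch_ci', with database=None, max_examples=50, derandomize=True, and HealthCheck.too_slow suppressed. The sketch below shows how such a profile would be registered with the hypothesis library; the values come from the log line, but where PyTorch actually registers the profile is not shown here, so the registration site is an assumption.

```python
# Sketch of registering the settings profile named in the session header;
# illustrative placement, values taken from the log line itself.
from hypothesis import HealthCheck, settings

settings.register_profile(
    "pytorch_ci",
    database=None,
    max_examples=50,
    derandomize=True,
    suppress_health_check=[HealthCheck.too_slow],
)
settings.load_profile("pytorch_ci")
```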
2025-12-04T09:59:13.9904110Z Running 2 items in this shard 2025-12-04T09:59:13.9904121Z 2025-12-04T09:59:13.9905181Z distributed/fsdp/test_fsdp_core.py::TestParamInitCUDA::test_param_change_after_init_mixed_precision_True_cuda I1204 09:57:49.294000 79491 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 79543 2025-12-04T09:59:13.9905677Z I1204 09:57:49.295000 79491 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 79544 2025-12-04T09:59:13.9906174Z I1204 09:57:49.295000 79491 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 79545 2025-12-04T09:59:13.9906731Z I1204 09:57:49.296000 79491 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 79546 2025-12-04T09:59:13.9907991Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9908117Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9909288Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:13.9909480Z {} 2025-12-04T09:59:13.9909760Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T09:59:13.9909961Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:13.9911490Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.9911642Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9912738Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9912850Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9913737Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:13.9913890Z {} 2025-12-04T09:59:13.9914201Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T09:59:13.9914392Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:13.9915907Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:59:13.9916062Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9917159Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9917281Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9918159Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:13.9918308Z {} 2025-12-04T09:59:13.9918591Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T09:59:13.9918778Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:13.9920327Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:13.9920497Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9922060Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:13.9922186Z self.encoder = TransformerEncoder( 2025-12-04T09:59:13.9923244Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:13.9923421Z {} 2025-12-04T09:59:13.9923739Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T09:59:13.9923954Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:13.9925669Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:59:13.9925831Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:13.9926295Z [rank0]:E1204 09:57:56.108000 79543 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9926827Z [rank0]:E1204 09:57:56.108000 79543 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9927839Z [rank0]:E1204 09:57:56.108000 79543 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9928380Z [rank0]:E1204 09:57:56.108000 79543 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9929372Z [rank0]:E1204 09:57:56.108000 79543 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9929769Z [rank0]:E1204 09:57:56.108000 79543 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9930731Z [rank0]:E1204 09:57:56.108000 79543 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9931224Z [rank0]:E1204 09:57:56.108000 79543 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9932180Z [rank0]:E1204 09:57:56.108000 79543 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9932668Z [rank0]:E1204 09:57:56.108000 79543 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9933825Z [rank0]:E1204 09:57:56.108000 79543 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9934253Z [rank0]:E1204 09:57:56.108000 79543 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9935236Z [rank0]:E1204 09:57:56.108000 79543 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9935709Z [rank0]:E1204 09:57:56.108000 79543 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9937557Z [rank0]:E1204 09:57:56.108000 79543 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 28160 on device 0. CUDA driver allocated memory was 711917568 and is now 734986240. 
2025-12-04T09:59:13.9937965Z [rank0]:E1204 09:57:56.108000 79543 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9938630Z [rank0]:E1204 09:57:56.108000 79543 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9939761Z [rank0]:E1204 09:57:56.108000 79543 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda 2025-12-04T09:59:13.9940126Z [rank0]:E1204 09:57:56.108000 79543 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9940845Z [rank0]:E1204 09:57:56.108000 79543 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9941390Z [rank0]:E1204 09:57:56.108000 79543 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:13.9941847Z [rank2]:E1204 09:57:56.110000 79545 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9942374Z [rank2]:E1204 09:57:56.110000 79545 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9943412Z [rank2]:E1204 09:57:56.110000 79545 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9943915Z [rank2]:E1204 09:57:56.110000 79545 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9944911Z [rank2]:E1204 09:57:56.110000 79545 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9945305Z [rank2]:E1204 09:57:56.110000 79545 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9946264Z [rank2]:E1204 09:57:56.110000 79545 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9946756Z [rank2]:E1204 09:57:56.110000 79545 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9947717Z [rank2]:E1204 09:57:56.110000 79545 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9948208Z [rank2]:E1204 09:57:56.110000 79545 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9949393Z [rank2]:E1204 09:57:56.110000 79545 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9949849Z [rank2]:E1204 09:57:56.110000 79545 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9950751Z [rank2]:E1204 09:57:56.110000 79545 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9951208Z [rank2]:E1204 09:57:56.110000 79545 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9952800Z [rank2]:E1204 09:57:56.110000 79545 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 28160 on device 2. CUDA driver allocated memory was 579796992 and is now 625934336. 2025-12-04T09:59:13.9953160Z [rank2]:E1204 09:57:56.110000 79545 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9953750Z [rank2]:E1204 09:57:56.110000 79545 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9954747Z [rank2]:E1204 09:57:56.110000 79545 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda 2025-12-04T09:59:13.9955072Z [rank2]:E1204 09:57:56.110000 79545 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9955705Z [rank2]:E1204 09:57:56.110000 79545 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9956191Z [rank2]:E1204 09:57:56.110000 79545 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:13.9956593Z [rank3]:E1204 09:57:56.110000 79546 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9957085Z [rank3]:E1204 09:57:56.110000 79546 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9957975Z [rank3]:E1204 09:57:56.110000 79546 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9958428Z [rank3]:E1204 09:57:56.110000 79546 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9959309Z [rank3]:E1204 09:57:56.110000 79546 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9959665Z [rank3]:E1204 09:57:56.110000 79546 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9960520Z [rank3]:E1204 09:57:56.110000 79546 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9960950Z [rank3]:E1204 09:57:56.110000 79546 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9961809Z [rank3]:E1204 09:57:56.110000 79546 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9962276Z [rank3]:E1204 09:57:56.110000 79546 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9963163Z [rank3]:E1204 09:57:56.110000 79546 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9963554Z [rank3]:E1204 09:57:56.110000 79546 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9964406Z [rank3]:E1204 09:57:56.110000 79546 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9964864Z [rank3]:E1204 09:57:56.110000 79546 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9966338Z [rank3]:E1204 09:57:56.110000 79546 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 28160 on device 3. CUDA driver allocated memory was 607059968 and is now 625934336. 2025-12-04T09:59:13.9966660Z [rank3]:E1204 09:57:56.110000 79546 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9967242Z [rank3]:E1204 09:57:56.110000 79546 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9968243Z [rank3]:E1204 09:57:56.110000 79546 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda 2025-12-04T09:59:13.9968563Z [rank3]:E1204 09:57:56.110000 79546 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9969205Z [rank3]:E1204 09:57:56.110000 79546 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9969769Z [rank3]:E1204 09:57:56.110000 79546 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:13.9970165Z [rank1]:E1204 09:57:56.114000 79544 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:13.9970637Z [rank1]:E1204 09:57:56.114000 79544 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:13.9971524Z [rank1]:E1204 09:57:56.114000 79544 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9971982Z [rank1]:E1204 09:57:56.114000 79544 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:13.9972858Z [rank1]:E1204 09:57:56.114000 79544 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9973217Z [rank1]:E1204 09:57:56.114000 79544 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:13.9974074Z [rank1]:E1204 09:57:56.114000 79544 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9974504Z [rank1]:E1204 09:57:56.114000 79544 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9975380Z [rank1]:E1204 09:57:56.114000 79544 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9975834Z [rank1]:E1204 09:57:56.114000 79544 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:13.9976955Z [rank1]:E1204 09:57:56.114000 79544 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9977404Z [rank1]:E1204 09:57:56.114000 79544 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:13.9978368Z [rank1]:E1204 09:57:56.114000 79544 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9978895Z [rank1]:E1204 09:57:56.114000 79544 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:13.9980552Z [rank1]:E1204 09:57:56.114000 79544 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 28160 on device 1. CUDA driver allocated memory was 604962816 and is now 625934336. 
2025-12-04T09:59:13.9980920Z [rank1]:E1204 09:57:56.114000 79544 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9981573Z [rank1]:E1204 09:57:56.114000 79544 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9982709Z [rank1]:E1204 09:57:56.114000 79544 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda 2025-12-04T09:59:13.9983071Z [rank1]:E1204 09:57:56.114000 79544 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:13.9983816Z [rank1]:E1204 09:57:56.114000 79544 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9984360Z [rank1]:E1204 09:57:56.114000 79544 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:13.9984457Z dist init r=0, world=4 2025-12-04T09:59:13.9984558Z dist init r=2, world=4 2025-12-04T09:59:13.9984651Z dist init r=3, world=4 2025-12-04T09:59:13.9984745Z dist init r=1, world=4 2025-12-04T09:59:13.9984845Z FAILED [8.5663s] [ 50%] 2025-12-04T09:59:13.9984851Z 2025-12-04T09:59:13.9985000Z =================================== FAILURES =================================== 2025-12-04T09:59:13.9985314Z ___ TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda ___ 2025-12-04T09:59:13.9985430Z Traceback (most recent call last): 2025-12-04T09:59:13.9985978Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:13.9986093Z self._join_processes(fn) 2025-12-04T09:59:13.9986678Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:13.9986820Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:13.9987421Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:13.9987532Z raise RuntimeError(error) 2025-12-04T09:59:13.9987767Z RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.9987882Z Traceback (most recent call last): 2025-12-04T09:59:13.9988474Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9988591Z getattr(self, test_name)() 2025-12-04T09:59:13.9989213Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9989296Z fn() 2025-12-04T09:59:13.9989745Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9989835Z method(*args, **kwargs) 2025-12-04T09:59:13.9990283Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9990401Z method(*args, **kwargs) 2025-12-04T09:59:13.9990846Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9990934Z with policy(): 
2025-12-04T09:59:13.9991552Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9991661Z raise RuntimeError(msg) 2025-12-04T09:59:13.9992791Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 28160 on device 2. CUDA driver allocated memory was 579796992 and is now 625934336. 2025-12-04T09:59:13.9992797Z 2025-12-04T09:59:13.9992996Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:13.9993634Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda 2025-12-04T09:59:13.9993641Z 2025-12-04T09:59:13.9993888Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:13.9993893Z 2025-12-04T09:59:13.9993900Z 2025-12-04T09:59:13.9994108Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:13.9994352Z Process 2 terminated with exit code 10, terminating remaining processes. 2025-12-04T09:59:13.9995135Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-15b775a41cf5a439.xml - 2025-12-04T09:59:13.9995294Z =========================== short test summary info ============================ 2025-12-04T09:59:13.9996078Z FAILED [8.5663s] distributed/fsdp/test_fsdp_core.py::TestParamInitCUDA::test_param_change_after_init_mixed_precision_True_cuda - RuntimeError: Process 2 exited with error code 10 and exception: 2025-12-04T09:59:13.9996195Z Traceback (most recent call last): 2025-12-04T09:59:13.9996706Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:13.9996812Z getattr(self, test_name)() 2025-12-04T09:59:13.9997317Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:13.9997397Z fn() 2025-12-04T09:59:13.9997876Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9997971Z method(*args, **kwargs) 2025-12-04T09:59:13.9998441Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:13.9998537Z method(*args, **kwargs) 2025-12-04T09:59:13.9999007Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:13.9999100Z with policy(): 2025-12-04T09:59:13.9999574Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:13.9999698Z raise RuntimeError(msg) 2025-12-04T09:59:14.0000860Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 28160 on device 2. CUDA driver allocated memory was 579796992 and is now 625934336. 
2025-12-04T09:59:14.0000867Z 2025-12-04T09:59:14.0001068Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0001883Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda 2025-12-04T09:59:14.0001890Z 2025-12-04T09:59:14.0002144Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0002349Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:14.0002520Z ======================= 1 failed, 25 deselected in 8.78s ======================= 2025-12-04T09:59:14.0002728Z Got exit code 1 2025-12-04T09:59:14.0002832Z Retrying single test... 2025-12-04T09:59:14.0003415Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-56374ffd8bd068de.xml 2025-12-04T09:59:14.0003568Z ============================= test session starts ============================== 2025-12-04T09:59:14.0003896Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:14.0003996Z cachedir: .pytest_cache 2025-12-04T09:59:14.0004482Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:14.0004600Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:14.0004697Z configfile: pytest.ini 2025-12-04T09:59:14.0005206Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:14.0005406Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:14.0006120Z stepcurrent: skipping 25 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParamInitCUDA::test_param_change_after_init_mixed_precision_True_cuda 2025-12-04T09:59:14.0006236Z Running 1 items in this shard 2025-12-04T09:59:14.0006288Z 2025-12-04T09:59:14.0007264Z distributed/fsdp/test_fsdp_core.py::TestParamInitCUDA::test_param_change_after_init_mixed_precision_True_cuda I1204 09:58:02.594000 79804 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 79856 2025-12-04T09:59:14.0007739Z I1204 09:58:02.595000 79804 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 79857 2025-12-04T09:59:14.0008203Z I1204 09:58:02.596000 79804 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 79858 2025-12-04T09:59:14.0008661Z I1204 09:58:02.596000 79804 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 79859 2025-12-04T09:59:14.0009840Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:14.0009957Z self.encoder = TransformerEncoder( 2025-12-04T09:59:14.0010892Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:14.0011054Z {} 2025-12-04T09:59:14.0011354Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 
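The UserWarning above concerns combining an FSDP MixedPrecision policy with an auto_wrap_policy: submodule classes listed in the warning are wrapped as separate FSDP units with mixed precision disabled. A rough single-process sketch of that combination follows; the model shape, port, and single-rank process group are placeholder assumptions for illustration, not the test's actual setup.

import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, MixedPrecision
from torch.distributed.fsdp.wrap import ModuleWrapPolicy

def wrap_with_mixed_precision(model: nn.Module) -> FSDP:
    # float16 params, gradients, and buffers inside the wrapped units...
    mp = MixedPrecision(
        param_dtype=torch.float16,
        reduce_dtype=torch.float16,
        buffer_dtype=torch.float16,
    )
    # ...and wrap each TransformerEncoderLayer as its own FSDP unit.
    policy = ModuleWrapPolicy({nn.TransformerEncoderLayer})
    return FSDP(
        model,
        auto_wrap_policy=policy,
        mixed_precision=mp,
        device_id=torch.cuda.current_device(),  # explicit index, see the device_id warning
    )

if __name__ == "__main__" and torch.cuda.is_available():
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=0, world_size=1)
    torch.cuda.set_device(0)
    encoder_layer = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
    model = nn.TransformerEncoder(encoder_layer, num_layers=2)
    fsdp_model = wrap_with_mixed_precision(model)
    dist.destroy_process_group()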
2025-12-04T09:59:14.0011560Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:14.0013509Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:14.0013700Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:14.0014857Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:14.0015005Z self.encoder = TransformerEncoder( 2025-12-04T09:59:14.0015936Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:14.0016099Z {} 2025-12-04T09:59:14.0016481Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T09:59:14.0016690Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:14.0018577Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:14.0018744Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:14.0019982Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:14.0020112Z self.encoder = TransformerEncoder( 2025-12-04T09:59:14.0021617Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:14.0021749Z self.encoder = TransformerEncoder( 2025-12-04T09:59:14.0022735Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:14.0022916Z {} 2025-12-04T09:59:14.0023235Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T09:59:14.0023450Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:14.0025160Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. 
If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:14.0025322Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:14.0026315Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:14.0026485Z {} 2025-12-04T09:59:14.0026798Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T09:59:14.0027089Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:14.0028799Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:14.0028966Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:14.0029426Z [rank1]:E1204 09:58:09.452000 79857 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:14.0030001Z [rank1]:E1204 09:58:09.452000 79857 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:14.0031010Z [rank1]:E1204 09:58:09.452000 79857 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0031516Z [rank1]:E1204 09:58:09.452000 79857 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:14.0032508Z [rank1]:E1204 09:58:09.452000 79857 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0033011Z [rank1]:E1204 09:58:09.452000 79857 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:14.0034063Z [rank1]:E1204 09:58:09.452000 79857 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0034634Z [rank1]:E1204 09:58:09.452000 79857 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0035507Z [rank1]:E1204 09:58:09.452000 79857 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0035942Z [rank1]:E1204 09:58:09.452000 79857 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0036785Z [rank1]:E1204 09:58:09.452000 79857 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0037182Z [rank1]:E1204 09:58:09.452000 79857 
site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:14.0038035Z [rank1]:E1204 09:58:09.452000 79857 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0038476Z [rank1]:E1204 09:58:09.452000 79857 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:14.0039949Z [rank1]:E1204 09:58:09.452000 79857 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 28160 on device 1. CUDA driver allocated memory was 611254272 and is now 625934336. 2025-12-04T09:59:14.0040274Z [rank1]:E1204 09:58:09.452000 79857 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0040857Z [rank1]:E1204 09:58:09.452000 79857 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0041907Z [rank1]:E1204 09:58:09.452000 79857 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda 2025-12-04T09:59:14.0042231Z [rank1]:E1204 09:58:09.452000 79857 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0042861Z [rank1]:E1204 09:58:09.452000 79857 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0043348Z [rank1]:E1204 09:58:09.452000 79857 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:14.0043773Z [rank0]:E1204 09:58:09.452000 79856 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:14.0044239Z [rank0]:E1204 09:58:09.452000 79856 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:14.0045130Z [rank0]:E1204 09:58:09.452000 79856 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0045575Z [rank0]:E1204 09:58:09.452000 79856 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:14.0046457Z [rank0]:E1204 09:58:09.452000 79856 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0046805Z [rank0]:E1204 09:58:09.452000 79856 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:14.0047659Z [rank0]:E1204 09:58:09.452000 79856 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0048117Z [rank0]:E1204 09:58:09.452000 79856 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 
2025-12-04T09:59:14.0048968Z [rank0]:E1204 09:58:09.452000 79856 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0049404Z [rank0]:E1204 09:58:09.452000 79856 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0050251Z [rank0]:E1204 09:58:09.452000 79856 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0050649Z [rank0]:E1204 09:58:09.452000 79856 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:14.0051506Z [rank0]:E1204 09:58:09.452000 79856 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0051939Z [rank0]:E1204 09:58:09.452000 79856 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:14.0053405Z [rank0]:E1204 09:58:09.452000 79856 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 28160 on device 0. CUDA driver allocated memory was 716111872 and is now 734986240. 2025-12-04T09:59:14.0053732Z [rank0]:E1204 09:58:09.452000 79856 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0054368Z [rank0]:E1204 09:58:09.452000 79856 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0055365Z [rank0]:E1204 09:58:09.452000 79856 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda 2025-12-04T09:59:14.0055688Z [rank0]:E1204 09:58:09.452000 79856 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0056383Z [rank0]:E1204 09:58:09.452000 79856 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0057106Z [rank0]:E1204 09:58:09.452000 79856 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:14.0057563Z [rank2]:E1204 09:58:09.453000 79858 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:14.0058092Z [rank2]:E1204 09:58:09.453000 79858 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:14.0059093Z [rank2]:E1204 09:58:09.453000 79858 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0059596Z [rank2]:E1204 09:58:09.453000 79858 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:14.0060588Z [rank2]:E1204 09:58:09.453000 79858 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0060984Z [rank2]:E1204 09:58:09.453000 79858 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:14.0061985Z [rank2]:E1204 09:58:09.453000 79858 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0062468Z [rank2]:E1204 09:58:09.453000 79858 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0063427Z [rank2]:E1204 09:58:09.453000 79858 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0063917Z [rank2]:E1204 09:58:09.453000 79858 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0064873Z [rank2]:E1204 09:58:09.453000 79858 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0065325Z [rank2]:E1204 09:58:09.453000 79858 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:14.0066282Z [rank2]:E1204 09:58:09.453000 79858 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0066773Z [rank2]:E1204 09:58:09.453000 79858 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:14.0068455Z [rank2]:E1204 09:58:09.453000 79858 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 28160 on device 2. CUDA driver allocated memory was 609157120 and is now 625934336. 
2025-12-04T09:59:14.0068848Z [rank2]:E1204 09:58:09.453000 79858 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0069566Z [rank2]:E1204 09:58:09.453000 79858 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0070571Z [rank2]:E1204 09:58:09.453000 79858 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda 2025-12-04T09:59:14.0070926Z [rank2]:E1204 09:58:09.453000 79858 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0071558Z [rank2]:E1204 09:58:09.453000 79858 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0072045Z [rank2]:E1204 09:58:09.453000 79858 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:14.0072443Z [rank3]:E1204 09:58:09.456000 79859 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:14.0072909Z [rank3]:E1204 09:58:09.456000 79859 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:14.0073792Z [rank3]:E1204 09:58:09.456000 79859 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0074242Z [rank3]:E1204 09:58:09.456000 79859 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:14.0075120Z [rank3]:E1204 09:58:09.456000 79859 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0075493Z [rank3]:E1204 09:58:09.456000 79859 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:14.0076345Z [rank3]:E1204 09:58:09.456000 79859 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0076775Z [rank3]:E1204 09:58:09.456000 79859 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0077620Z [rank3]:E1204 09:58:09.456000 79859 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0078060Z [rank3]:E1204 09:58:09.456000 79859 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0078912Z [rank3]:E1204 09:58:09.456000 79859 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0079310Z [rank3]:E1204 09:58:09.456000 79859 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:14.0080158Z [rank3]:E1204 09:58:09.456000 79859 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0080595Z [rank3]:E1204 09:58:09.456000 79859 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:14.0082374Z [rank3]:E1204 09:58:09.456000 79859 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 28160 on device 3. CUDA driver allocated memory was 604962816 and is now 625934336. 2025-12-04T09:59:14.0082702Z [rank3]:E1204 09:58:09.456000 79859 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0083282Z [rank3]:E1204 09:58:09.456000 79859 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0084281Z [rank3]:E1204 09:58:09.456000 79859 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda 2025-12-04T09:59:14.0084641Z [rank3]:E1204 09:58:09.456000 79859 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0085271Z [rank3]:E1204 09:58:09.456000 79859 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0085760Z [rank3]:E1204 09:58:09.456000 79859 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:14.0085849Z dist init r=1, world=4 2025-12-04T09:59:14.0085933Z dist init r=3, world=4 2025-12-04T09:59:14.0086019Z dist init r=2, world=4 2025-12-04T09:59:14.0086100Z dist init r=0, world=4 2025-12-04T09:59:14.0086183Z FAILED [8.7289s] [100%] 2025-12-04T09:59:14.0086191Z 2025-12-04T09:59:14.0086326Z =================================== FAILURES =================================== 2025-12-04T09:59:14.0086594Z ___ TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda ___ 2025-12-04T09:59:14.0086705Z Traceback (most recent call last): 2025-12-04T09:59:14.0087195Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:14.0087289Z self._join_processes(fn) 2025-12-04T09:59:14.0087833Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:14.0087955Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:14.0088486Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:14.0088587Z raise RuntimeError(error) 2025-12-04T09:59:14.0088789Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:14.0088899Z Traceback (most recent call last): 2025-12-04T09:59:14.0089375Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0089472Z getattr(self, test_name)() 2025-12-04T09:59:14.0089947Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0090024Z fn() 2025-12-04T09:59:14.0090476Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0090569Z method(*args, **kwargs) 2025-12-04T09:59:14.0091013Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0091105Z method(*args, **kwargs) 2025-12-04T09:59:14.0091548Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0091632Z with policy(): 2025-12-04T09:59:14.0092082Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0092225Z raise RuntimeError(msg) 2025-12-04T09:59:14.0093300Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 28160 on device 1. CUDA driver allocated memory was 611254272 and is now 625934336. 2025-12-04T09:59:14.0093305Z 2025-12-04T09:59:14.0093493Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0094089Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda 2025-12-04T09:59:14.0094098Z 2025-12-04T09:59:14.0094360Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0094365Z 2025-12-04T09:59:14.0094369Z 2025-12-04T09:59:14.0094563Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:14.0094800Z Process 1 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:59:14.0095501Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-56374ffd8bd068de.xml - 2025-12-04T09:59:14.0095656Z =========================== short test summary info ============================ 2025-12-04T09:59:14.0096481Z FAILED [8.7289s] distributed/fsdp/test_fsdp_core.py::TestParamInitCUDA::test_param_change_after_init_mixed_precision_True_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:14.0096592Z Traceback (most recent call last): 2025-12-04T09:59:14.0097310Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0097424Z getattr(self, test_name)() 2025-12-04T09:59:14.0097959Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0098051Z fn() 2025-12-04T09:59:14.0098557Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0098667Z method(*args, **kwargs) 2025-12-04T09:59:14.0099209Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0099309Z method(*args, **kwargs) 2025-12-04T09:59:14.0099813Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0099906Z with policy(): 2025-12-04T09:59:14.0100410Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0100524Z raise RuntimeError(msg) 2025-12-04T09:59:14.0101729Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 28160 on device 1. CUDA driver allocated memory was 611254272 and is now 625934336. 2025-12-04T09:59:14.0101737Z 2025-12-04T09:59:14.0101956Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0102631Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda 2025-12-04T09:59:14.0102636Z 2025-12-04T09:59:14.0102900Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0103074Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:14.0103246Z ======================= 1 failed, 26 deselected in 8.95s ======================= 2025-12-04T09:59:14.0103344Z Got exit code 1 2025-12-04T09:59:14.0103445Z Retrying single test... 
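The RuntimeError above comes from the CUDA memory-leak check that PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 enables: it snapshots per-device memory statistics before the test body and again after it, and the two figures quoted per device ("was 512 and is now reported as 28160", "was 604962816 and is now 625934336") are those snapshots for the caching allocator and the CUDA driver respectively. The sketch below is only a rough approximation of that before/after comparison using public torch.cuda APIs; the helper name check_for_cuda_leak and its driver_tolerance argument are illustrative, not the actual CudaMemoryLeakCheck logic in common_utils.py.

import gc
import torch

def check_for_cuda_leak(test_fn, device=0, driver_tolerance=0):
    # Hypothetical helper: compare memory snapshots taken before and after test_fn.
    torch.cuda.synchronize(device)
    gc.collect()
    torch.cuda.empty_cache()
    alloc_before = torch.cuda.memory_allocated(device)   # caching-allocator bytes in use
    free_before, total = torch.cuda.mem_get_info(device)
    driver_before = total - free_before                   # bytes the CUDA driver has handed out

    test_fn()

    torch.cuda.synchronize(device)
    gc.collect()
    torch.cuda.empty_cache()
    alloc_after = torch.cuda.memory_allocated(device)
    free_after, _ = torch.cuda.mem_get_info(device)
    driver_after = total - free_after

    # Only flag a leak when the driver-level growth confirms the caching-allocator growth,
    # mirroring the "CUDA driver API confirmed a leak" wording in the failure above.
    if alloc_after > alloc_before and driver_after > driver_before + driver_tolerance:
        raise RuntimeError(
            f"possible CUDA leak on device {device}: caching allocator "
            f"{alloc_before} -> {alloc_after} bytes, driver {driver_before} -> {driver_after} bytes"
        )
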
2025-12-04T09:59:14.0104094Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6288913bb010f746.xml 2025-12-04T09:59:14.0104304Z ============================= test session starts ============================== 2025-12-04T09:59:14.0104651Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:14.0104760Z cachedir: .pytest_cache 2025-12-04T09:59:14.0105274Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:14.0105392Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:14.0105499Z configfile: pytest.ini 2025-12-04T09:59:14.0106031Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:14.0106272Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:14.0107041Z stepcurrent: skipping 25 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestParamInitCUDA::test_param_change_after_init_mixed_precision_True_cuda 2025-12-04T09:59:14.0107154Z Running 1 items in this shard 2025-12-04T09:59:14.0107160Z 2025-12-04T09:59:14.0108213Z distributed/fsdp/test_fsdp_core.py::TestParamInitCUDA::test_param_change_after_init_mixed_precision_True_cuda I1204 09:58:16.043000 80117 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 80169 2025-12-04T09:59:14.0108818Z I1204 09:58:16.044000 80117 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 80170 2025-12-04T09:59:14.0109391Z I1204 09:58:16.045000 80117 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 80171 2025-12-04T09:59:14.0109829Z I1204 09:58:16.046000 80117 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 80172 2025-12-04T09:59:14.0110930Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:14.0111046Z self.encoder = TransformerEncoder( 2025-12-04T09:59:14.0111957Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:14.0112114Z {} 2025-12-04T09:59:14.0112392Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T09:59:14.0112580Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:14.0114102Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:59:14.0114248Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:14.0115351Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:14.0115460Z self.encoder = TransformerEncoder( 2025-12-04T09:59:14.0116353Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:14.0116508Z {} 2025-12-04T09:59:14.0116840Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T09:59:14.0117030Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:14.0118548Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:14.0118698Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:14.0119792Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:14.0119932Z self.encoder = TransformerEncoder( 2025-12-04T09:59:14.0121352Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/modules/transformer.py:144: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.self_attn.batch_first was not True(use batch_first for better inference performance) 2025-12-04T09:59:14.0121482Z self.encoder = TransformerEncoder( 2025-12-04T09:59:14.0122480Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:14.0122652Z {} 2025-12-04T09:59:14.0122974Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T09:59:14.0123190Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:14.0124980Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:59:14.0125154Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:14.0126140Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:64: UserWarning: Both mixed precision and an auto_wrap_policy were specified to FSDP, where the wrapped module has submodules of type: 2025-12-04T09:59:14.0126316Z {} 2025-12-04T09:59:14.0126629Z These modules will be wrapped as separate FSDP instacnes with mixed precision disabled. 2025-12-04T09:59:14.0126838Z _warn_on_overridden_mixed_precision(overridden_module_classes) 2025-12-04T09:59:14.0128570Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:14.0128731Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:14.0129194Z [rank0]:E1204 09:58:22.944000 80169 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:14.0129724Z [rank0]:E1204 09:58:22.944000 80169 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:14.0130724Z [rank0]:E1204 09:58:22.944000 80169 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0131309Z [rank0]:E1204 09:58:22.944000 80169 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:14.0132297Z [rank0]:E1204 09:58:22.944000 80169 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0132695Z [rank0]:E1204 09:58:22.944000 80169 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:14.0133754Z [rank0]:E1204 09:58:22.944000 80169 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0134230Z [rank0]:E1204 09:58:22.944000 80169 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0135081Z [rank0]:E1204 09:58:22.944000 80169 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0135509Z [rank0]:E1204 09:58:22.944000 80169 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0136420Z [rank0]:E1204 09:58:22.944000 80169 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0137006Z [rank0]:E1204 09:58:22.944000 80169 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:14.0137986Z [rank0]:E1204 09:58:22.944000 80169 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0138477Z [rank0]:E1204 09:58:22.944000 80169 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:14.0140165Z [rank0]:E1204 09:58:22.944000 80169 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 28160 on device 0. CUDA driver allocated memory was 711917568 and is now 734986240. 2025-12-04T09:59:14.0140526Z [rank0]:E1204 09:58:22.944000 80169 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0141187Z [rank0]:E1204 09:58:22.944000 80169 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0142318Z [rank0]:E1204 09:58:22.944000 80169 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda 2025-12-04T09:59:14.0142680Z [rank0]:E1204 09:58:22.944000 80169 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0143394Z [rank0]:E1204 09:58:22.944000 80169 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0143935Z [rank0]:E1204 09:58:22.944000 80169 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:14.0144389Z [rank3]:E1204 09:58:22.945000 80172 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:14.0144921Z [rank3]:E1204 09:58:22.945000 80172 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:14.0145971Z [rank3]:E1204 09:58:22.945000 80172 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0146484Z [rank3]:E1204 09:58:22.945000 80172 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:14.0147468Z [rank3]:E1204 09:58:22.945000 80172 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0147865Z [rank3]:E1204 09:58:22.945000 80172 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:14.0149066Z [rank3]:E1204 09:58:22.945000 80172 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0149538Z [rank3]:E1204 09:58:22.945000 80172 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0150436Z [rank3]:E1204 09:58:22.945000 80172 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0150890Z [rank3]:E1204 09:58:22.945000 80172 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0151794Z [rank3]:E1204 09:58:22.945000 80172 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0152214Z [rank3]:E1204 09:58:22.945000 80172 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:14.0153125Z [rank3]:E1204 09:58:22.945000 80172 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0153616Z [rank3]:E1204 09:58:22.945000 80172 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:14.0155181Z [rank3]:E1204 09:58:22.945000 80172 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 28160 on device 3. CUDA driver allocated memory was 407830528 and is now 625934336. 2025-12-04T09:59:14.0155524Z [rank3]:E1204 09:58:22.945000 80172 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0156144Z [rank3]:E1204 09:58:22.945000 80172 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0157210Z [rank3]:E1204 09:58:22.945000 80172 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda 2025-12-04T09:59:14.0157545Z [rank3]:E1204 09:58:22.945000 80172 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0158223Z [rank3]:E1204 09:58:22.945000 80172 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0158733Z [rank3]:E1204 09:58:22.945000 80172 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:14.0159157Z [rank2]:E1204 09:58:22.947000 80171 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:14.0159728Z [rank2]:E1204 09:58:22.947000 80171 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:14.0160674Z [rank2]:E1204 09:58:22.947000 80171 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0161154Z [rank2]:E1204 09:58:22.947000 80171 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:14.0162080Z [rank2]:E1204 09:58:22.947000 80171 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0162479Z [rank2]:E1204 09:58:22.947000 80171 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:14.0163387Z [rank2]:E1204 09:58:22.947000 80171 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0163848Z [rank2]:E1204 09:58:22.947000 80171 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0164848Z [rank2]:E1204 09:58:22.947000 80171 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0165277Z [rank2]:E1204 09:58:22.947000 80171 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0166135Z [rank2]:E1204 09:58:22.947000 80171 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0166526Z [rank2]:E1204 09:58:22.947000 80171 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:14.0167592Z [rank2]:E1204 09:58:22.947000 80171 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0168058Z [rank2]:E1204 09:58:22.947000 80171 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:14.0169617Z [rank2]:E1204 09:58:22.947000 80171 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 28160 on device 2. CUDA driver allocated memory was 604962816 and is now 625934336. 
2025-12-04T09:59:14.0169965Z [rank2]:E1204 09:58:22.947000 80171 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0170583Z [rank2]:E1204 09:58:22.947000 80171 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0171646Z [rank2]:E1204 09:58:22.947000 80171 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda 2025-12-04T09:59:14.0171983Z [rank2]:E1204 09:58:22.947000 80171 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0172660Z [rank2]:E1204 09:58:22.947000 80171 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0173221Z [rank2]:E1204 09:58:22.947000 80171 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:14.0173645Z [rank1]:E1204 09:58:22.948000 80170 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:14.0174147Z [rank1]:E1204 09:58:22.948000 80170 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:14.0175090Z [rank1]:E1204 09:58:22.948000 80170 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0175744Z [rank1]:E1204 09:58:22.948000 80170 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:14.0177050Z [rank1]:E1204 09:58:22.948000 80170 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0177461Z [rank1]:E1204 09:58:22.948000 80170 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:14.0178425Z [rank1]:E1204 09:58:22.948000 80170 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0178911Z [rank1]:E1204 09:58:22.948000 80170 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0179867Z [rank1]:E1204 09:58:22.948000 80170 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0180351Z [rank1]:E1204 09:58:22.948000 80170 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0181315Z [rank1]:E1204 09:58:22.948000 80170 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0181793Z [rank1]:E1204 09:58:22.948000 80170 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:14.0182764Z [rank1]:E1204 09:58:22.948000 80170 
site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0183248Z [rank1]:E1204 09:58:22.948000 80170 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:14.0184913Z [rank1]:E1204 09:58:22.948000 80170 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 28160 on device 1. CUDA driver allocated memory was 607059968 and is now 625934336. 2025-12-04T09:59:14.0185278Z [rank1]:E1204 09:58:22.948000 80170 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0185937Z [rank1]:E1204 09:58:22.948000 80170 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0187065Z [rank1]:E1204 09:58:22.948000 80170 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda 2025-12-04T09:59:14.0187428Z [rank1]:E1204 09:58:22.948000 80170 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0188203Z [rank1]:E1204 09:58:22.948000 80170 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0188747Z [rank1]:E1204 09:58:22.948000 80170 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:14.0188853Z dist init r=0, world=4 2025-12-04T09:59:14.0188947Z dist init r=1, world=4 2025-12-04T09:59:14.0189151Z dist init r=2, world=4 2025-12-04T09:59:14.0189252Z dist init r=3, world=4 2025-12-04T09:59:14.0189342Z FAILED [8.5884s] [100%] 2025-12-04T09:59:14.0189348Z 2025-12-04T09:59:14.0189490Z =================================== FAILURES =================================== 2025-12-04T09:59:14.0189819Z ___ TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda ___ 2025-12-04T09:59:14.0189933Z Traceback (most recent call last): 2025-12-04T09:59:14.0190666Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:14.0190769Z self._join_processes(fn) 2025-12-04T09:59:14.0191289Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:14.0191419Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:14.0191952Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:14.0192049Z raise RuntimeError(error) 2025-12-04T09:59:14.0192257Z RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:14.0192362Z Traceback (most recent call last): 2025-12-04T09:59:14.0192846Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0192940Z getattr(self, test_name)() 2025-12-04T09:59:14.0193413Z File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0193498Z fn() 2025-12-04T09:59:14.0194128Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0194255Z method(*args, **kwargs) 2025-12-04T09:59:14.0194730Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0194825Z method(*args, **kwargs) 2025-12-04T09:59:14.0195298Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0195385Z with policy(): 2025-12-04T09:59:14.0195863Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0195965Z raise RuntimeError(msg) 2025-12-04T09:59:14.0197095Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 28160 on device 3. CUDA driver allocated memory was 407830528 and is now 625934336. 2025-12-04T09:59:14.0197103Z 2025-12-04T09:59:14.0197311Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0197946Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda 2025-12-04T09:59:14.0197951Z 2025-12-04T09:59:14.0198197Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0198202Z 2025-12-04T09:59:14.0198214Z 2025-12-04T09:59:14.0198417Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:14.0198660Z Process 3 terminated with exit code 10, terminating remaining processes. 
2025-12-04T09:59:14.0199468Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6288913bb010f746.xml - 2025-12-04T09:59:14.0199626Z =========================== short test summary info ============================ 2025-12-04T09:59:14.0200409Z FAILED [8.5884s] distributed/fsdp/test_fsdp_core.py::TestParamInitCUDA::test_param_change_after_init_mixed_precision_True_cuda - RuntimeError: Process 3 exited with error code 10 and exception: 2025-12-04T09:59:14.0200527Z Traceback (most recent call last): 2025-12-04T09:59:14.0201045Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0201155Z getattr(self, test_name)() 2025-12-04T09:59:14.0201685Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0201766Z fn() 2025-12-04T09:59:14.0202246Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0202345Z method(*args, **kwargs) 2025-12-04T09:59:14.0202823Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0202918Z method(*args, **kwargs) 2025-12-04T09:59:14.0203390Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0203484Z with policy(): 2025-12-04T09:59:14.0203960Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0204060Z raise RuntimeError(msg) 2025-12-04T09:59:14.0205207Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda! Caching allocator allocated memory was 512 and is now reported as 28160 on device 3. CUDA driver allocated memory was 407830528 and is now 625934336. 2025-12-04T09:59:14.0205216Z 2025-12-04T09:59:14.0205415Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0206170Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestParamInitCUDA.test_param_change_after_init_mixed_precision_True_cuda 2025-12-04T09:59:14.0206177Z 2025-12-04T09:59:14.0206408Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0206570Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T09:59:14.0206724Z ======================= 1 failed, 26 deselected in 8.80s ======================= 2025-12-04T09:59:14.0206808Z Got exit code 1 2025-12-04T09:59:14.0207342Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestParamInitCUDA::test_param_change_after_init_mixed_precision_True_cuda 2025-12-04T09:59:14.0207699Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:14.0208250Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9d2350a2a3a63f23.xml 2025-12-04T09:59:14.0208400Z ============================= test session starts ============================== 2025-12-04T09:59:14.0208707Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:14.0208804Z cachedir: .pytest_cache 2025-12-04T09:59:14.0209254Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:14.0209361Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:14.0209458Z configfile: pytest.ini 2025-12-04T09:59:14.0209933Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:14.0210118Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:14.0210297Z stepcurrent: skipping 26 already run items. 2025-12-04T09:59:14.0210396Z Running 1 items in this shard 2025-12-04T09:59:14.0210401Z 2025-12-04T09:59:14.0211251Z distributed/fsdp/test_fsdp_core.py::TestAutogradCUDA::test_unshard_params_as_tensors_cuda I1204 09:58:29.433000 80430 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 80482 2025-12-04T09:59:14.0211692Z I1204 09:58:29.434000 80430 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 80483 2025-12-04T09:59:14.0212124Z I1204 09:58:29.435000 80430 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 80484 2025-12-04T09:59:14.0212586Z I1204 09:58:29.436000 80430 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 80485 2025-12-04T09:59:14.0214122Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:14.0214275Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:14.0215787Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:59:14.0215938Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:14.0217795Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:14.0218006Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:14.0219715Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:14.0219885Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:14.0221110Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:14.0221233Z return func(*args, **kwargs) 2025-12-04T09:59:14.0221702Z [rank0]:E1204 09:58:37.044000 80482 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:14.0222240Z [rank0]:E1204 09:58:37.044000 80482 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:14.0223248Z [rank0]:E1204 09:58:37.044000 80482 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0223763Z [rank0]:E1204 09:58:37.044000 80482 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:14.0225496Z [rank0]:E1204 09:58:37.044000 80482 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0227082Z [rank0]:E1204 09:58:37.044000 80482 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:14.0228573Z [rank0]:E1204 09:58:37.044000 80482 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0230157Z [rank0]:E1204 09:58:37.044000 80482 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0231750Z [rank0]:E1204 09:58:37.044000 80482 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0233507Z [rank0]:E1204 09:58:37.044000 80482 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0235002Z [rank0]:E1204 09:58:37.044000 80482 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0236446Z [rank0]:E1204 09:58:37.044000 80482 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:14.0237891Z [rank0]:E1204 09:58:37.044000 80482 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0239383Z [rank0]:E1204 09:58:37.044000 80482 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:14.0241457Z [rank0]:E1204 09:58:37.044000 80482 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 0. CUDA driver allocated memory was 714014720 and is now 760152064. 2025-12-04T09:59:14.0243425Z [rank0]:E1204 09:58:37.044000 80482 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0244521Z [rank0]:E1204 09:58:37.044000 80482 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0246220Z [rank0]:E1204 09:58:37.044000 80482 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T09:59:14.0247647Z [rank0]:E1204 09:58:37.044000 80482 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0248793Z [rank0]:E1204 09:58:37.044000 80482 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0250101Z [rank0]:E1204 09:58:37.044000 80482 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:14.0251161Z [rank1]:E1204 09:58:37.045000 80483 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:14.0252219Z [rank1]:E1204 09:58:37.045000 80483 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:14.0254000Z [rank1]:E1204 09:58:37.045000 80483 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0255618Z [rank1]:E1204 09:58:37.045000 80483 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:14.0257498Z [rank1]:E1204 09:58:37.045000 80483 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0259012Z [rank1]:E1204 09:58:37.045000 80483 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:14.0260508Z [rank1]:E1204 09:58:37.045000 80483 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T09:59:14.0262134Z [rank1]:E1204 09:58:37.045000 80483 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0263727Z [rank1]:E1204 09:58:37.045000 80483 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0265310Z [rank1]:E1204 09:58:37.045000 80483 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0266892Z [rank1]:E1204 09:58:37.045000 80483 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0268433Z [rank1]:E1204 09:58:37.045000 80483 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:14.0270052Z [rank1]:E1204 09:58:37.045000 80483 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0271591Z [rank1]:E1204 09:58:37.045000 80483 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:14.0273744Z [rank1]:E1204 09:58:37.045000 80483 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 1. CUDA driver allocated memory was 609157120 and is now 651100160. 2025-12-04T09:59:14.0275722Z [rank1]:E1204 09:58:37.045000 80483 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0276927Z [rank1]:E1204 09:58:37.045000 80483 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0278694Z [rank1]:E1204 09:58:37.045000 80483 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T09:59:14.0280044Z [rank1]:E1204 09:58:37.045000 80483 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0281121Z [rank1]:E1204 09:58:37.045000 80483 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0282346Z [rank1]:E1204 09:58:37.045000 80483 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:14.0283339Z [rank3]:E1204 09:58:37.046000 80485 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:14.0284327Z [rank3]:E1204 09:58:37.046000 80485 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:14.0285835Z [rank3]:E1204 09:58:37.046000 80485 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0287301Z [rank3]:E1204 09:58:37.046000 80485 
site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:14.0288749Z [rank3]:E1204 09:58:37.046000 80485 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0290094Z [rank3]:E1204 09:58:37.046000 80485 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:14.0291413Z [rank3]:E1204 09:58:37.046000 80485 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0292865Z [rank3]:E1204 09:58:37.046000 80485 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0294262Z [rank3]:E1204 09:58:37.046000 80485 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0295664Z [rank3]:E1204 09:58:37.046000 80485 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0297367Z [rank3]:E1204 09:58:37.046000 80485 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0298905Z [rank3]:E1204 09:58:37.046000 80485 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:14.0300461Z [rank3]:E1204 09:58:37.046000 80485 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0302047Z [rank3]:E1204 09:58:37.046000 80485 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:14.0304271Z [rank3]:E1204 09:58:37.046000 80485 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 3. CUDA driver allocated memory was 607059968 and is now 651100160. 
2025-12-04T09:59:14.0306335Z [rank3]:E1204 09:58:37.046000 80485 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0307496Z [rank3]:E1204 09:58:37.046000 80485 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0309372Z [rank3]:E1204 09:58:37.046000 80485 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T09:59:14.0310716Z [rank3]:E1204 09:58:37.046000 80485 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0311789Z [rank3]:E1204 09:58:37.046000 80485 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0313023Z [rank3]:E1204 09:58:37.046000 80485 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:14.0314032Z [rank2]:E1204 09:58:37.046000 80484 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:14.0315018Z [rank2]:E1204 09:58:37.046000 80484 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:14.0316550Z [rank2]:E1204 09:58:37.046000 80484 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0318005Z [rank2]:E1204 09:58:37.046000 80484 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:14.0319460Z [rank2]:E1204 09:58:37.046000 80484 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0320938Z [rank2]:E1204 09:58:37.046000 80484 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:14.0322632Z [rank2]:E1204 09:58:37.046000 80484 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0324222Z [rank2]:E1204 09:58:37.046000 80484 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0325804Z [rank2]:E1204 09:58:37.046000 80484 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0327382Z [rank2]:E1204 09:58:37.046000 80484 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0328964Z [rank2]:E1204 09:58:37.046000 80484 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0330495Z [rank2]:E1204 09:58:37.046000 80484 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:14.0332043Z [rank2]:E1204 09:58:37.046000 80484 site-packages/torch/testing/_internal/common_distributed.py:935] 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0333785Z [rank2]:E1204 09:58:37.046000 80484 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:14.0335739Z [rank2]:E1204 09:58:37.046000 80484 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 2. CUDA driver allocated memory was 609157120 and is now 651100160. 2025-12-04T09:59:14.0337910Z [rank2]:E1204 09:58:37.046000 80484 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0339057Z [rank2]:E1204 09:58:37.046000 80484 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0340872Z [rank2]:E1204 09:58:37.046000 80484 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T09:59:14.0342382Z [rank2]:E1204 09:58:37.046000 80484 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0343593Z [rank2]:E1204 09:58:37.046000 80484 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0344989Z [rank2]:E1204 09:58:37.046000 80484 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:14.0345767Z dist init r=3, world=4 2025-12-04T09:59:14.0346036Z dist init r=2, world=4 2025-12-04T09:59:14.0346352Z dist init r=1, world=4 2025-12-04T09:59:14.0346648Z dist init r=0, world=4 2025-12-04T09:59:14.0347978Z [rank0]:[W1204 09:58:37.063140663 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:14.0349418Z FAILED [9.6294s] [100%] 2025-12-04T09:59:14.0349571Z 2025-12-04T09:59:14.0349705Z =================================== FAILURES =================================== 2025-12-04T09:59:14.0350208Z _____________ TestAutogradCUDA.test_unshard_params_as_tensors_cuda _____________ 2025-12-04T09:59:14.0350670Z Traceback (most recent call last): 2025-12-04T09:59:14.0351397Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:14.0352092Z self._join_processes(fn) 2025-12-04T09:59:14.0352793Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:14.0353549Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:14.0354323Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:14.0355089Z raise RuntimeError(error) 2025-12-04T09:59:14.0355472Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:14.0355907Z Traceback (most recent call last): 2025-12-04T09:59:14.0356594Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0357297Z getattr(self, test_name)() 2025-12-04T09:59:14.0357947Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0358628Z fn() 2025-12-04T09:59:14.0359198Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0359857Z method(*args, **kwargs) 2025-12-04T09:59:14.0360512Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0361180Z method(*args, **kwargs) 2025-12-04T09:59:14.0361798Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0362444Z with policy(): 2025-12-04T09:59:14.0363046Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0363716Z raise RuntimeError(msg) 2025-12-04T09:59:14.0364868Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 1. CUDA driver allocated memory was 609157120 and is now 651100160. 2025-12-04T09:59:14.0365974Z 2025-12-04T09:59:14.0366164Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0366976Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T09:59:14.0367590Z 2025-12-04T09:59:14.0367830Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0368183Z 2025-12-04T09:59:14.0368188Z 2025-12-04T09:59:14.0368388Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:14.0368923Z Process 1 terminated with exit code 10, terminating remaining processes. 
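The ProcessGroupNCCL warning earlier in this run ("destroy_process_group() was not called before program exit, which can leak resources") describes the usual cleanup for multi-process CUDA tests: every rank should tear down its process group explicitly before the worker exits. Below is a minimal sketch of that pattern with the standard torch.distributed API; the run_on_rank worker is a hypothetical stand-in, not the MultiProcessTestCase machinery this suite actually uses, and it assumes MASTER_ADDR/MASTER_PORT are already set in the environment.

import torch
import torch.distributed as dist

def run_on_rank(rank: int, world_size: int):
    torch.cuda.set_device(rank)                 # one GPU per rank
    dist.init_process_group(
        backend="nccl",
        init_method="env://",                   # reads MASTER_ADDR / MASTER_PORT
        rank=rank,
        world_size=world_size,
    )
    try:
        ...                                     # test body runs here
        dist.barrier()                          # let every rank finish before shutdown
    finally:
        dist.destroy_process_group()            # explicit teardown avoids the NCCL warning
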
2025-12-04T09:59:14.0369980Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9d2350a2a3a63f23.xml - 2025-12-04T09:59:14.0370953Z =========================== short test summary info ============================ 2025-12-04T09:59:14.0371950Z FAILED [9.6294s] distributed/fsdp/test_fsdp_core.py::TestAutogradCUDA::test_unshard_params_as_tensors_cuda - RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:14.0372823Z Traceback (most recent call last): 2025-12-04T09:59:14.0373513Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0374211Z getattr(self, test_name)() 2025-12-04T09:59:14.0374872Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0375544Z fn() 2025-12-04T09:59:14.0376134Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0377083Z method(*args, **kwargs) 2025-12-04T09:59:14.0377785Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0378526Z method(*args, **kwargs) 2025-12-04T09:59:14.0379225Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0379964Z with policy(): 2025-12-04T09:59:14.0380629Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0381379Z raise RuntimeError(msg) 2025-12-04T09:59:14.0382684Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 1. CUDA driver allocated memory was 609157120 and is now 651100160. 2025-12-04T09:59:14.0383926Z 2025-12-04T09:59:14.0384144Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0385049Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T09:59:14.0385751Z 2025-12-04T09:59:14.0386016Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0386634Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:14.0387125Z ======================= 1 failed, 26 deselected in 9.85s ======================= 2025-12-04T09:59:14.0387530Z Got exit code 1 2025-12-04T09:59:14.0387787Z Retrying single test... 
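The failure above is the harness's CUDA memory leak check tripping rather than an assertion inside the test itself: the check snapshots per-device memory before the test body and compares it on exit, which is what the "Caching allocator allocated memory was 512 and is now reported as 61952" wording reports, and the printed repro line (PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda) re-enables that check for just this test. A minimal sketch of the same before/after comparison using only public torch.cuda APIs; this is an illustration of the idea, not the leak-check context manager in common_utils.py, and run_with_leak_check is a hypothetical helper:

    import torch

    def run_with_leak_check(fn, device=0):
        # Snapshot caching-allocator usage and driver-level usage before the test body.
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_before = torch.cuda.memory_allocated(device)    # caching allocator, bytes
        free_before, total = torch.cuda.mem_get_info(device)  # driver-level free/total, bytes

        fn()

        # Compare after the test; memory still held at this point is a leak candidate.
        torch.cuda.synchronize(device)
        torch.cuda.empty_cache()
        alloc_after = torch.cuda.memory_allocated(device)
        free_after, _ = torch.cuda.mem_get_info(device)
        if alloc_after > alloc_before or free_after < free_before:
            raise RuntimeError(
                f"possible CUDA leak on device {device}: caching allocator "
                f"{alloc_before} -> {alloc_after} bytes, driver allocated "
                f"{total - free_before} -> {total - free_after} bytes"
            )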
2025-12-04T09:59:14.0388586Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-ee9779088060e0f5.xml 2025-12-04T09:59:14.0389541Z ============================= test session starts ============================== 2025-12-04T09:59:14.0390116Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:14.0390633Z cachedir: .pytest_cache 2025-12-04T09:59:14.0391252Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:14.0391928Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:14.0392229Z configfile: pytest.ini 2025-12-04T09:59:14.0392860Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:14.0393635Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:14.0394522Z stepcurrent: skipping 26 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestAutogradCUDA::test_unshard_params_as_tensors_cuda 2025-12-04T09:59:14.0395309Z Running 1 items in this shard 2025-12-04T09:59:14.0395493Z 2025-12-04T09:59:14.0396339Z distributed/fsdp/test_fsdp_core.py::TestAutogradCUDA::test_unshard_params_as_tensors_cuda I1204 09:58:43.893000 80767 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 80819 2025-12-04T09:59:14.0397796Z I1204 09:58:43.894000 80767 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 80820 2025-12-04T09:59:14.0398798Z I1204 09:58:43.895000 80767 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 80821 2025-12-04T09:59:14.0399791Z I1204 09:58:43.896000 80767 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 80822 2025-12-04T09:59:14.0401881Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:14.0403689Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:14.0405461Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:14.0407230Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:14.0409001Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:59:14.0410763Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:14.0412559Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:14.0414332Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:14.0415461Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:14.0416649Z return func(*args, **kwargs) 2025-12-04T09:59:14.0417486Z [rank0]:E1204 09:58:51.528000 80819 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:14.0418616Z [rank0]:E1204 09:58:51.528000 80819 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:14.0420276Z [rank0]:E1204 09:58:51.528000 80819 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0422134Z [rank0]:E1204 09:58:51.528000 80819 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:14.0423775Z [rank0]:E1204 09:58:51.528000 80819 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0425292Z [rank0]:E1204 09:58:51.528000 80819 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:14.0426858Z [rank0]:E1204 09:58:51.528000 80819 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0428478Z [rank0]:E1204 09:58:51.528000 80819 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0430067Z [rank0]:E1204 09:58:51.528000 80819 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0431641Z [rank0]:E1204 09:58:51.528000 80819 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0433284Z [rank0]:E1204 09:58:51.528000 80819 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0434665Z [rank0]:E1204 09:58:51.528000 80819 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:14.0436035Z [rank0]:E1204 09:58:51.528000 80819 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0437438Z [rank0]:E1204 09:58:51.528000 
80819 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:14.0439388Z [rank0]:E1204 09:58:51.528000 80819 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 0. CUDA driver allocated memory was 716111872 and is now 760152064. 2025-12-04T09:59:14.0441216Z [rank0]:E1204 09:58:51.528000 80819 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0442244Z [rank0]:E1204 09:58:51.528000 80819 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0443894Z [rank0]:E1204 09:58:51.528000 80819 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T09:59:14.0445242Z [rank0]:E1204 09:58:51.528000 80819 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0446322Z [rank0]:E1204 09:58:51.528000 80819 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0447561Z [rank0]:E1204 09:58:51.528000 80819 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:14.0448561Z [rank2]:E1204 09:58:51.529000 80821 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:14.0449549Z [rank2]:E1204 09:58:51.529000 80821 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:14.0451030Z [rank2]:E1204 09:58:51.529000 80821 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0452489Z [rank2]:E1204 09:58:51.529000 80821 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:14.0453936Z [rank2]:E1204 09:58:51.529000 80821 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0455276Z [rank2]:E1204 09:58:51.529000 80821 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:14.0456899Z [rank2]:E1204 09:58:51.529000 80821 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0458491Z [rank2]:E1204 09:58:51.529000 80821 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0460080Z [rank2]:E1204 09:58:51.529000 80821 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0461698Z [rank2]:E1204 09:58:51.529000 80821 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0463279Z 
[rank2]:E1204 09:58:51.529000 80821 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0464824Z [rank2]:E1204 09:58:51.529000 80821 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:14.0466372Z [rank2]:E1204 09:58:51.529000 80821 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0467961Z [rank2]:E1204 09:58:51.529000 80821 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:14.0470232Z [rank2]:E1204 09:58:51.529000 80821 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 2. CUDA driver allocated memory was 609157120 and is now 651100160. 2025-12-04T09:59:14.0472062Z [rank2]:E1204 09:58:51.529000 80821 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0473132Z [rank2]:E1204 09:58:51.529000 80821 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0474747Z [rank2]:E1204 09:58:51.529000 80821 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T09:59:14.0476091Z [rank2]:E1204 09:58:51.529000 80821 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0477167Z [rank2]:E1204 09:58:51.529000 80821 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0478402Z [rank2]:E1204 09:58:51.529000 80821 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:14.0479404Z [rank1]:E1204 09:58:51.530000 80820 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:14.0480396Z [rank1]:E1204 09:58:51.530000 80820 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:14.0481873Z [rank1]:E1204 09:58:51.530000 80820 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0483315Z [rank1]:E1204 09:58:51.530000 80820 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:14.0484785Z [rank1]:E1204 09:58:51.530000 80820 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0486153Z [rank1]:E1204 09:58:51.530000 80820 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:14.0487485Z [rank1]:E1204 09:58:51.530000 80820 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0488881Z [rank1]:E1204 09:58:51.530000 80820 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0490279Z [rank1]:E1204 09:58:51.530000 80820 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0491705Z [rank1]:E1204 09:58:51.530000 80820 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0493359Z [rank1]:E1204 09:58:51.530000 80820 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0494811Z [rank1]:E1204 09:58:51.530000 80820 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:14.0496256Z [rank1]:E1204 09:58:51.530000 80820 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0498055Z [rank1]:E1204 09:58:51.530000 80820 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:14.0500247Z [rank1]:E1204 09:58:51.530000 80820 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 1. CUDA driver allocated memory was 604962816 and is now 651100160. 
2025-12-04T09:59:14.0502329Z [rank1]:E1204 09:58:51.530000 80820 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0503493Z [rank1]:E1204 09:58:51.530000 80820 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0505304Z [rank1]:E1204 09:58:51.530000 80820 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T09:59:14.0506806Z [rank1]:E1204 09:58:51.530000 80820 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0508019Z [rank1]:E1204 09:58:51.530000 80820 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0509570Z [rank1]:E1204 09:58:51.530000 80820 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:14.0510574Z [rank3]:E1204 09:58:51.531000 80822 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:14.0511566Z [rank3]:E1204 09:58:51.531000 80822 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:14.0513047Z [rank3]:E1204 09:58:51.531000 80822 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0514503Z [rank3]:E1204 09:58:51.531000 80822 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:14.0516022Z [rank3]:E1204 09:58:51.531000 80822 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0517379Z [rank3]:E1204 09:58:51.531000 80822 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:14.0518692Z [rank3]:E1204 09:58:51.531000 80822 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0520095Z [rank3]:E1204 09:58:51.531000 80822 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0521916Z [rank3]:E1204 09:58:51.531000 80822 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0523514Z [rank3]:E1204 09:58:51.531000 80822 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0525107Z [rank3]:E1204 09:58:51.531000 80822 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0526639Z [rank3]:E1204 09:58:51.531000 80822 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:14.0528191Z [rank3]:E1204 09:58:51.531000 80822 site-packages/torch/testing/_internal/common_distributed.py:935] 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0529791Z [rank3]:E1204 09:58:51.531000 80822 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:14.0532051Z [rank3]:E1204 09:58:51.531000 80822 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 3. CUDA driver allocated memory was 611254272 and is now 651100160. 2025-12-04T09:59:14.0534175Z [rank3]:E1204 09:58:51.531000 80822 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0535195Z [rank3]:E1204 09:58:51.531000 80822 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0537064Z [rank3]:E1204 09:58:51.531000 80822 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T09:59:14.0538591Z [rank3]:E1204 09:58:51.531000 80822 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0539816Z [rank3]:E1204 09:58:51.531000 80822 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0541207Z [rank3]:E1204 09:58:51.531000 80822 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:14.0541987Z dist init r=1, world=4 2025-12-04T09:59:14.0542257Z dist init r=2, world=4 2025-12-04T09:59:14.0542529Z dist init r=0, world=4 2025-12-04T09:59:14.0542783Z dist init r=3, world=4 2025-12-04T09:59:14.0544107Z [rank0]:[W1204 09:58:51.540419531 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:14.0545545Z FAILED [9.7900s] [100%] 2025-12-04T09:59:14.0545715Z 2025-12-04T09:59:14.0545906Z =================================== FAILURES =================================== 2025-12-04T09:59:14.0546464Z _____________ TestAutogradCUDA.test_unshard_params_as_tensors_cuda _____________ 2025-12-04T09:59:14.0546988Z Traceback (most recent call last): 2025-12-04T09:59:14.0547768Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:14.0548544Z self._join_processes(fn) 2025-12-04T09:59:14.0549396Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:14.0550196Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:14.0550977Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:14.0551729Z raise RuntimeError(error) 2025-12-04T09:59:14.0552127Z RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:14.0552551Z Traceback (most recent call last): 2025-12-04T09:59:14.0553228Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0553925Z getattr(self, test_name)() 2025-12-04T09:59:14.0554581Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0555259Z fn() 2025-12-04T09:59:14.0555816Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0556486Z method(*args, **kwargs) 2025-12-04T09:59:14.0557109Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0557761Z method(*args, **kwargs) 2025-12-04T09:59:14.0558386Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0559038Z with policy(): 2025-12-04T09:59:14.0559661Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0560324Z raise RuntimeError(msg) 2025-12-04T09:59:14.0561482Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 0. CUDA driver allocated memory was 716111872 and is now 760152064. 
2025-12-04T09:59:14.0562581Z 2025-12-04T09:59:14.0562770Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0563577Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T09:59:14.0564201Z 2025-12-04T09:59:14.0564435Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0564793Z 2025-12-04T09:59:14.0564936Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:14.0565296Z Traceback (most recent call last): 2025-12-04T09:59:14.0565992Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0566684Z getattr(self, test_name)() 2025-12-04T09:59:14.0567344Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0568011Z fn() 2025-12-04T09:59:14.0568568Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0569230Z method(*args, **kwargs) 2025-12-04T09:59:14.0569851Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0570638Z method(*args, **kwargs) 2025-12-04T09:59:14.0571248Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0571903Z with policy(): 2025-12-04T09:59:14.0572499Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0573166Z raise RuntimeError(msg) 2025-12-04T09:59:14.0574312Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 2. CUDA driver allocated memory was 609157120 and is now 651100160. 2025-12-04T09:59:14.0575441Z 2025-12-04T09:59:14.0575627Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0576532Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T09:59:14.0577396Z 2025-12-04T09:59:14.0577663Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0578059Z 2025-12-04T09:59:14.0578063Z 2025-12-04T09:59:14.0578286Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:14.0578895Z Process 0 terminated with exit code 10, terminating remaining processes. 
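The _init_utils.py UserWarning repeated in the session output above ("FSDP got the argument `device_id` cuda ... which does not have an explicit index") is advisory: the test passes a bare "cuda" device, so FSDP falls back to whatever the current device happens to be on each rank. A short sketch of the explicit form the warning asks for, assuming a one-process-per-GPU setup where the process group is already initialized; the module and rank here are placeholders, not taken from test_fsdp_core.py:

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_for_rank(rank: int) -> FSDP:
        # Bind this process to its GPU first, so "current device" and device_id agree.
        torch.cuda.set_device(rank)
        model = nn.Linear(8, 8)  # placeholder module; assumes init_process_group() already ran
        # An indexed device (or the bare integer rank) avoids the
        # "`device_id` cuda ... does not have an explicit index" warning.
        return FSDP(model, device_id=torch.device("cuda", rank))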
2025-12-04T09:59:14.0580078Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-ee9779088060e0f5.xml - 2025-12-04T09:59:14.0581185Z =========================== short test summary info ============================ 2025-12-04T09:59:14.0582231Z FAILED [9.7900s] distributed/fsdp/test_fsdp_core.py::TestAutogradCUDA::test_unshard_params_as_tensors_cuda - RuntimeError: Process 0 exited with error code 10 and exception: 2025-12-04T09:59:14.0583211Z Traceback (most recent call last): 2025-12-04T09:59:14.0583993Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0584782Z getattr(self, test_name)() 2025-12-04T09:59:14.0585557Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0586316Z fn() 2025-12-04T09:59:14.0586954Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0587693Z method(*args, **kwargs) 2025-12-04T09:59:14.0588393Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0589225Z method(*args, **kwargs) 2025-12-04T09:59:14.0589839Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0590485Z with policy(): 2025-12-04T09:59:14.0591079Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0591750Z raise RuntimeError(msg) 2025-12-04T09:59:14.0592907Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 0. CUDA driver allocated memory was 716111872 and is now 760152064. 
2025-12-04T09:59:14.0594001Z 2025-12-04T09:59:14.0594189Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0594997Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T09:59:14.0595626Z 2025-12-04T09:59:14.0595859Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0596209Z 2025-12-04T09:59:14.0596388Z Process 2 exited with error code 10 and exception: 2025-12-04T09:59:14.0596767Z Traceback (most recent call last): 2025-12-04T09:59:14.0597453Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0598155Z getattr(self, test_name)() 2025-12-04T09:59:14.0598804Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0599479Z fn() 2025-12-04T09:59:14.0600041Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0600701Z method(*args, **kwargs) 2025-12-04T09:59:14.0601340Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0602000Z method(*args, **kwargs) 2025-12-04T09:59:14.0602624Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0603277Z with policy(): 2025-12-04T09:59:14.0603859Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0604529Z raise RuntimeError(msg) 2025-12-04T09:59:14.0605681Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 2. CUDA driver allocated memory was 609157120 and is now 651100160. 2025-12-04T09:59:14.0606770Z 2025-12-04T09:59:14.0606962Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0607761Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T09:59:14.0608387Z 2025-12-04T09:59:14.0608619Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0609134Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T09:59:14.0609565Z ====================== 1 failed, 26 deselected in 10.01s ======================= 2025-12-04T09:59:14.0609930Z Got exit code 1 2025-12-04T09:59:14.0610203Z Retrying single test... 
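The other UserWarning in that output, from c10d_logger.py ("barrier(): using the device under current context"), can be silenced the way the message suggests: tell the process group which accelerator each rank owns when the group is created. A sketch under the same one-GPU-per-rank assumption; the rendezvous address and world size are placeholders, and the device_id argument to init_process_group is the one the warning itself names (available in recent PyTorch releases):

    import torch
    import torch.distributed as dist

    def init_for_rank(rank: int, world_size: int) -> None:
        torch.cuda.set_device(rank)
        # Binding the group to an explicit device lets collectives such as barrier()
        # use that GPU instead of guessing from the current context.
        dist.init_process_group(
            backend="nccl",
            init_method="tcp://127.0.0.1:29500",  # placeholder rendezvous address
            rank=rank,
            world_size=world_size,
            device_id=torch.device("cuda", rank),
        )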
2025-12-04T09:59:14.0610918Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-7a7aa8c4ec058e09.xml 2025-12-04T09:59:14.0611721Z ============================= test session starts ============================== 2025-12-04T09:59:14.0612291Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:14.0612810Z cachedir: .pytest_cache 2025-12-04T09:59:14.0613419Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:14.0614093Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:14.0614389Z configfile: pytest.ini 2025-12-04T09:59:14.0615019Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:14.0615792Z collecting ... collected 60 items / 26 deselected / 34 selected 2025-12-04T09:59:14.0616935Z stepcurrent: skipping 26 already run items. Running only test/distributed/fsdp/test_fsdp_core.py::TestAutogradCUDA::test_unshard_params_as_tensors_cuda 2025-12-04T09:59:14.0617828Z Running 1 items in this shard 2025-12-04T09:59:14.0618030Z 2025-12-04T09:59:14.0618979Z distributed/fsdp/test_fsdp_core.py::TestAutogradCUDA::test_unshard_params_as_tensors_cuda I1204 09:58:58.174000 81104 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 0 with pid 81156 2025-12-04T09:59:14.0620538Z I1204 09:58:58.175000 81104 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 1 with pid 81157 2025-12-04T09:59:14.0621969Z I1204 09:58:58.176000 81104 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 2 with pid 81158 2025-12-04T09:59:14.0623095Z I1204 09:58:58.176000 81104 site-packages/torch/testing/_internal/common_distributed.py:849] Started process 3 with pid 81159 2025-12-04T09:59:14.0625451Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 2, which does not have an explicit index. FSDP will use the current device 2. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:14.0627458Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:14.0629503Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 1, which does not have an explicit index. FSDP will use the current device 1. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:14.0631507Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:14.0633540Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 3, which does not have an explicit index. FSDP will use the current device 3. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 
2025-12-04T09:59:14.0635303Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:14.0637080Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py:571: UserWarning: FSDP got the argument `device_id` cuda on rank 0, which does not have an explicit index. FSDP will use the current device 0. If this is incorrect, please explicitly call `torch.cuda.set_device()` before FSDP initialization or pass in the explicit device index as the `device_id` argument. 2025-12-04T09:59:14.0638850Z device_from_device_id = _get_device_from_device_id( 2025-12-04T09:59:14.0640028Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning. 2025-12-04T09:59:14.0641126Z return func(*args, **kwargs) 2025-12-04T09:59:14.0641714Z [rank0]:E1204 09:59:05.749000 81156 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:14.0642712Z [rank0]:E1204 09:59:05.749000 81156 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:14.0644187Z [rank0]:E1204 09:59:05.749000 81156 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0645649Z [rank0]:E1204 09:59:05.749000 81156 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:14.0647093Z [rank0]:E1204 09:59:05.749000 81156 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0648437Z [rank0]:E1204 09:59:05.749000 81156 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:14.0649767Z [rank0]:E1204 09:59:05.749000 81156 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0651291Z [rank0]:E1204 09:59:05.749000 81156 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0659403Z [rank0]:E1204 09:59:05.749000 81156 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0661025Z [rank0]:E1204 09:59:05.749000 81156 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0662616Z [rank0]:E1204 09:59:05.749000 81156 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0664263Z [rank0]:E1204 09:59:05.749000 81156 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:14.0665815Z [rank0]:E1204 09:59:05.749000 81156 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0667405Z [rank0]:E1204 09:59:05.749000 
81156 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:14.0669734Z [rank0]:E1204 09:59:05.749000 81156 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 0. CUDA driver allocated memory was 714014720 and is now 760152064. 2025-12-04T09:59:14.0671561Z [rank0]:E1204 09:59:05.749000 81156 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0672585Z [rank0]:E1204 09:59:05.749000 81156 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0674377Z [rank0]:E1204 09:59:05.749000 81156 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T09:59:14.0675842Z [rank0]:E1204 09:59:05.749000 81156 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0676979Z [rank0]:E1204 09:59:05.749000 81156 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0678289Z [rank0]:E1204 09:59:05.749000 81156 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 0 with exit code: 10 2025-12-04T09:59:14.0679352Z [rank2]:E1204 09:59:05.750000 81158 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:14.0680407Z [rank2]:E1204 09:59:05.750000 81158 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:14.0681968Z [rank2]:E1204 09:59:05.750000 81158 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0683722Z [rank2]:E1204 09:59:05.750000 81158 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:14.0685306Z [rank2]:E1204 09:59:05.750000 81158 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0686779Z [rank2]:E1204 09:59:05.750000 81158 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:14.0688260Z [rank2]:E1204 09:59:05.750000 81158 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0689819Z [rank2]:E1204 09:59:05.750000 81158 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0691356Z [rank2]:E1204 09:59:05.750000 81158 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0692884Z [rank2]:E1204 09:59:05.750000 81158 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0694407Z 
[rank2]:E1204 09:59:05.750000 81158 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0695946Z [rank2]:E1204 09:59:05.750000 81158 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:14.0697767Z [rank2]:E1204 09:59:05.750000 81158 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0699351Z [rank2]:E1204 09:59:05.750000 81158 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:14.0701540Z [rank2]:E1204 09:59:05.750000 81158 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 2. CUDA driver allocated memory was 611254272 and is now 651100160. 2025-12-04T09:59:14.0703585Z [rank2]:E1204 09:59:05.750000 81158 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0704741Z [rank2]:E1204 09:59:05.750000 81158 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0706584Z [rank2]:E1204 09:59:05.750000 81158 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T09:59:14.0708212Z [rank2]:E1204 09:59:05.750000 81158 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0709395Z [rank2]:E1204 09:59:05.750000 81158 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0710737Z [rank2]:E1204 09:59:05.750000 81158 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 2 with exit code: 10 2025-12-04T09:59:14.0711910Z [rank1]:E1204 09:59:05.750000 81157 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:14.0712967Z [rank1]:E1204 09:59:05.750000 81157 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:14.0714532Z [rank1]:E1204 09:59:05.750000 81157 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0716074Z [rank1]:E1204 09:59:05.750000 81157 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:14.0717683Z [rank1]:E1204 09:59:05.750000 81157 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0719032Z [rank1]:E1204 09:59:05.750000 81157 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:14.0720415Z [rank1]:E1204 09:59:05.750000 81157 site-packages/torch/testing/_internal/common_distributed.py:935] File 
"/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0722269Z [rank1]:E1204 09:59:05.750000 81157 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0723860Z [rank1]:E1204 09:59:05.750000 81157 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0725432Z [rank1]:E1204 09:59:05.750000 81157 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0727097Z [rank1]:E1204 09:59:05.750000 81157 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0728629Z [rank1]:E1204 09:59:05.750000 81157 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:14.0730178Z [rank1]:E1204 09:59:05.750000 81157 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0731764Z [rank1]:E1204 09:59:05.750000 81157 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:14.0734019Z [rank1]:E1204 09:59:05.750000 81157 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 1. CUDA driver allocated memory was 609157120 and is now 651100160. 
2025-12-04T09:59:14.0735840Z [rank1]:E1204 09:59:05.750000 81157 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0737188Z [rank1]:E1204 09:59:05.750000 81157 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0739009Z [rank1]:E1204 09:59:05.750000 81157 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T09:59:14.0740521Z [rank1]:E1204 09:59:05.750000 81157 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0741734Z [rank1]:E1204 09:59:05.750000 81157 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0743125Z [rank1]:E1204 09:59:05.750000 81157 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 1 with exit code: 10 2025-12-04T09:59:14.0744257Z [rank3]:E1204 09:59:05.751000 81159 site-packages/torch/testing/_internal/common_distributed.py:935] Caught exception: 2025-12-04T09:59:14.0745371Z [rank3]:E1204 09:59:05.751000 81159 site-packages/torch/testing/_internal/common_distributed.py:935] Traceback (most recent call last): 2025-12-04T09:59:14.0747042Z [rank3]:E1204 09:59:05.751000 81159 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0748669Z [rank3]:E1204 09:59:05.751000 81159 site-packages/torch/testing/_internal/common_distributed.py:935] getattr(self, test_name)() 2025-12-04T09:59:14.0750256Z [rank3]:E1204 09:59:05.751000 81159 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0751677Z [rank3]:E1204 09:59:05.751000 81159 site-packages/torch/testing/_internal/common_distributed.py:935] fn() 2025-12-04T09:59:14.0753008Z [rank3]:E1204 09:59:05.751000 81159 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0754408Z [rank3]:E1204 09:59:05.751000 81159 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0755808Z [rank3]:E1204 09:59:05.751000 81159 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0757245Z [rank3]:E1204 09:59:05.751000 81159 site-packages/torch/testing/_internal/common_distributed.py:935] method(*args, **kwargs) 2025-12-04T09:59:14.0758649Z [rank3]:E1204 09:59:05.751000 81159 site-packages/torch/testing/_internal/common_distributed.py:935] File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0760021Z [rank3]:E1204 09:59:05.751000 81159 site-packages/torch/testing/_internal/common_distributed.py:935] with policy(): 2025-12-04T09:59:14.0761387Z [rank3]:E1204 09:59:05.751000 81159 site-packages/torch/testing/_internal/common_distributed.py:935] 
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0762790Z [rank3]:E1204 09:59:05.751000 81159 site-packages/torch/testing/_internal/common_distributed.py:935] raise RuntimeError(msg) 2025-12-04T09:59:14.0764734Z [rank3]:E1204 09:59:05.751000 81159 site-packages/torch/testing/_internal/common_distributed.py:935] RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 3. CUDA driver allocated memory was 604962816 and is now 651100160. 2025-12-04T09:59:14.0766583Z [rank3]:E1204 09:59:05.751000 81159 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0767619Z [rank3]:E1204 09:59:05.751000 81159 site-packages/torch/testing/_internal/common_distributed.py:935] To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0769234Z [rank3]:E1204 09:59:05.751000 81159 site-packages/torch/testing/_internal/common_distributed.py:935] PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T09:59:14.0770572Z [rank3]:E1204 09:59:05.751000 81159 site-packages/torch/testing/_internal/common_distributed.py:935] 2025-12-04T09:59:14.0771653Z [rank3]:E1204 09:59:05.751000 81159 site-packages/torch/testing/_internal/common_distributed.py:935] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0772891Z [rank3]:E1204 09:59:05.751000 81159 site-packages/torch/testing/_internal/common_distributed.py:935] exiting process 3 with exit code: 10 2025-12-04T09:59:14.0773585Z dist init r=0, world=4 2025-12-04T09:59:14.0773822Z dist init r=3, world=4 2025-12-04T09:59:14.0774060Z dist init r=1, world=4 2025-12-04T09:59:14.0774291Z dist init r=2, world=4 2025-12-04T09:59:14.0775463Z [rank0]:[W1204 09:59:06.761351963 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. 
For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 2025-12-04T09:59:14.0776936Z FAILED [9.8205s] [100%] 2025-12-04T09:59:14.0777117Z 2025-12-04T09:59:14.0777264Z =================================== FAILURES =================================== 2025-12-04T09:59:14.0777865Z _____________ TestAutogradCUDA.test_unshard_params_as_tensors_cuda _____________ 2025-12-04T09:59:14.0778409Z Traceback (most recent call last): 2025-12-04T09:59:14.0779186Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 770, in wrapper 2025-12-04T09:59:14.0779971Z self._join_processes(fn) 2025-12-04T09:59:14.0780758Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1039, in _join_processes 2025-12-04T09:59:14.0781606Z self._check_return_codes(fn, elapsed_time) 2025-12-04T09:59:14.0782476Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 1079, in _check_return_codes 2025-12-04T09:59:14.0783353Z raise RuntimeError(error) 2025-12-04T09:59:14.0783787Z RuntimeError: Process 1 exited with error code 10 and exception: 2025-12-04T09:59:14.0784260Z Traceback (most recent call last): 2025-12-04T09:59:14.0785039Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test 2025-12-04T09:59:14.0785824Z getattr(self, test_name)() 2025-12-04T09:59:14.0786560Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper 2025-12-04T09:59:14.0787316Z fn() 2025-12-04T09:59:14.0787947Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0788805Z method(*args, **kwargs) 2025-12-04T09:59:14.0789552Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T09:59:14.0790218Z method(*args, **kwargs) 2025-12-04T09:59:14.0790840Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T09:59:14.0791487Z with policy(): 2025-12-04T09:59:14.0792086Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T09:59:14.0792759Z raise RuntimeError(msg) 2025-12-04T09:59:14.0793949Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 1. CUDA driver allocated memory was 609157120 and is now 651100160. 2025-12-04T09:59:14.0795048Z 2025-12-04T09:59:14.0795241Z To execute this test, run the following from the base repo dir: 2025-12-04T09:59:14.0796050Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda 2025-12-04T09:59:14.0796673Z 2025-12-04T09:59:14.0796906Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T09:59:14.0797260Z 2025-12-04T09:59:14.0797264Z 2025-12-04T09:59:14.0797464Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T09:59:14.0798015Z Process 1 terminated with exit code 10, terminating remaining processes. 
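Each attempt also ends with the ProcessGroupNCCL warning that destroy_process_group() was not called before program exit; here that is most likely a consequence of the ranks aborting early with exit code 10 rather than a separate bug. For reference, the shutdown pattern the warning and the linked distributed.html#shutdown docs point to looks roughly like this in user code (the body is a placeholder):

    import torch.distributed as dist

    def worker(rank: int, world_size: int) -> None:
        # ... init_process_group(...) and the actual training/test body go here ...
        try:
            pass  # placeholder body
        finally:
            # Tear down the communicators explicitly so the process exits cleanly
            # instead of emitting the "destroy_process_group() was not called" warning.
            if dist.is_initialized():
                dist.destroy_process_group()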
2025-12-04T09:59:14.0799074Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-7a7aa8c4ec058e09.xml -
2025-12-04T09:59:14.0800058Z =========================== short test summary info ============================
2025-12-04T09:59:14.0800999Z FAILED [9.8205s] distributed/fsdp/test_fsdp_core.py::TestAutogradCUDA::test_unshard_params_as_tensors_cuda - RuntimeError: Process 1 exited with error code 10 and exception:
2025-12-04T09:59:14.0801869Z Traceback (most recent call last):
2025-12-04T09:59:14.0802773Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 925, in run_test
2025-12-04T09:59:14.0803513Z getattr(self, test_name)()
2025-12-04T09:59:14.0804248Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 772, in wrapper
2025-12-04T09:59:14.0805007Z fn()
2025-12-04T09:59:14.0805606Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:14.0806308Z method(*args, **kwargs)
2025-12-04T09:59:14.0806966Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T09:59:14.0807659Z method(*args, **kwargs)
2025-12-04T09:59:14.0808312Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T09:59:14.0809001Z with policy():
2025-12-04T09:59:14.0809508Z File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T09:59:14.0809609Z raise RuntimeError(msg)
2025-12-04T09:59:14.0810658Z RuntimeError: CUDA driver API confirmed a leak in __mp_main__.TestAutogradCUDA.test_unshard_params_as_tensors_cuda! Caching allocator allocated memory was 512 and is now reported as 61952 on device 1. CUDA driver allocated memory was 609157120 and is now 651100160.
2025-12-04T09:59:14.0810666Z
2025-12-04T09:59:14.0810866Z To execute this test, run the following from the base repo dir:
2025-12-04T09:59:14.0811409Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/distributed/fsdp/test_fsdp_core.py TestAutogradCUDA.test_unshard_params_as_tensors_cuda
2025-12-04T09:59:14.0811415Z
2025-12-04T09:59:14.0811662Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:59:14.0811830Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
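Separately, the ProcessGroupNCCL warning earlier in this log ("destroy_process_group() was not called before program exit, which can leak resources") points at missing teardown in the spawned ranks. A minimal sketch of the init/teardown pattern it asks for is below; the run_rank worker, the gloo backend, and the address/port values are illustrative assumptions, not taken from the failing test's actual NCCL setup.

```python
# Minimal init/teardown sketch for the "destroy_process_group() was not called
# before program exit" warning above. The run_rank worker, gloo backend, and
# address/port values are illustrative, not taken from the failing test.
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp


def run_rank(rank: int, world_size: int) -> None:
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=rank, world_size=world_size)
    try:
        t = torch.ones(1) * rank
        dist.all_reduce(t)  # every rank ends up with 0 + 1 + 2 + 3 = 6
    finally:
        # The call the ProcessGroupNCCL warning says was skipped before exit.
        dist.destroy_process_group()


if __name__ == "__main__":
    mp.spawn(run_rank, args=(4,), nprocs=4)  # world=4, matching the "dist init" lines above
```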
2025-12-04T09:59:14.0811999Z ====================== 1 failed, 26 deselected in 10.04s ======================= 2025-12-04T09:59:14.0812086Z Got exit code 1 2025-12-04T09:59:14.0812554Z FAILED CONSISTENTLY: test/distributed/fsdp/test_fsdp_core.py::TestAutogradCUDA::test_unshard_params_as_tensors_cuda 2025-12-04T09:59:14.0812936Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T09:59:14.0813548Z Test results will be stored in test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4f45a35aeec028b0.xml 2025-12-04T09:59:14.0813708Z ============================= test session starts ============================== 2025-12-04T09:59:14.0814033Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python 2025-12-04T09:59:14.0814137Z cachedir: .pytest_cache 2025-12-04T09:59:14.0814613Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T09:59:14.0814726Z rootdir: /var/lib/jenkins/workspace 2025-12-04T09:59:14.0814834Z configfile: pytest.ini 2025-12-04T09:59:14.0815332Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T09:59:14.0815536Z collecting ... collected 60 items / 27 deselected / 33 selected 2025-12-04T09:59:14.0815672Z stepcurrent: skipping 27 already run items. 2025-12-04T09:59:14.0815774Z Running 0 items in this shard 2025-12-04T09:59:14.0815779Z 2025-12-04T09:59:14.0816627Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4f45a35aeec028b0.xml - 2025-12-04T09:59:14.0816965Z ============================ 27 deselected in 0.02s ============================ 2025-12-04T09:59:14.0833570Z The following tests failed consistently: ['test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_after_state_dict_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_pre_backward_hook_registration_cuda_first_False_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestHooksCUDA::test_register_functions_called_cuda_first_True_mixed_precision_True_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_none_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_false_shard_grad_op_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_delayed_optim_step_offload_true_none_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_none_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_offload_false_shard_grad_op_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_no_shard_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_none_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_mixture_of_experts_with_delay_before_free_offload_true_shard_grad_op_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_false_none_cuda', 
'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_no_shard_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_always_wrap_model_offload_true_none_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_no_shard_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_offload_false_none_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_no_shard_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_false_none_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_nested_wrapped_model_single_iteration_mixed_precision_offload_true_no_shard_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_no_shard_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_false_shard_grad_op_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_no_shard_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParityWithDDPCUDA::test_transformer_offload_true_none_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestNoGradCUDA::test_transformer_no_grad_mixed_precision_False_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParamInitCUDA::test_param_change_after_init_mixed_precision_False_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestParamInitCUDA::test_param_change_after_init_mixed_precision_True_cuda', 'test/distributed/fsdp/test_fsdp_core.py::TestAutogradCUDA::test_unshard_params_as_tensors_cuda'] 2025-12-04T09:59:14.0833668Z 2025-12-04T09:59:14.0834247Z FINISHED PRINTING LOG FILE of distributed/fsdp/test_fsdp_core 2/2 (test/test-reports/distributed.fsdp.test_fsdp_core_2.2_6137898c6891d430_.log) 2025-12-04T09:59:14.0834260Z 2025-12-04T09:59:14.0834612Z Finished distributed/fsdp/test_fsdp_core 2/2 ... 
[2025-12-04 09:59:13.015565][3984.623477463], took 30.45min 2025-12-04T09:59:14.0835414Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-90a070d9a0caeaa7.xml 2025-12-04T09:59:14.0836217Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3b56b818e7dab969.xml 2025-12-04T09:59:14.0837006Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2da5f79ab7711605.xml 2025-12-04T09:59:14.0838025Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a202ac92fafcf85d.xml 2025-12-04T09:59:14.0838870Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bacdfd4e137b31c0.xml 2025-12-04T09:59:14.0839686Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2f84fddbafa0e0f3.xml 2025-12-04T09:59:14.0840501Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8511307d41418b77.xml 2025-12-04T09:59:14.0841454Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3768a5b2a44119fc.xml 2025-12-04T09:59:14.0842260Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-31ee953fde08a139.xml 2025-12-04T09:59:14.0843053Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cf0a0887fe85c292.xml 2025-12-04T09:59:14.0843843Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-07c27c95d6f3d3d6.xml 2025-12-04T09:59:14.0844631Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6ec3b2535e8e2ad7.xml 2025-12-04T09:59:14.0845418Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2c7bc1bec56d6360.xml 2025-12-04T09:59:14.0846215Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1003ee713f2c1e3e.xml 2025-12-04T09:59:14.0847035Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-86ef8482fc5a0e9d.xml 2025-12-04T09:59:14.0847829Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6e9238188d8477a2.xml 2025-12-04T09:59:14.0848612Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9476e56094f0b738.xml 2025-12-04T09:59:14.0849410Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-207ff9590d724b3a.xml 2025-12-04T09:59:14.0850287Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-f664e87214ff2805.xml 2025-12-04T09:59:14.0851035Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-def950b7d24ceea9.xml 2025-12-04T09:59:14.0851789Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-89dfbd7b5cd71317.xml 2025-12-04T09:59:14.0852535Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bdae057bafb686b9.xml 2025-12-04T09:59:14.0853281Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-eb4953947b5f3ef2.xml 2025-12-04T09:59:14.0854083Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-532f83d54e2054ff.xml 2025-12-04T09:59:14.0854836Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3483d762b5b4fca1.xml 2025-12-04T09:59:14.0855603Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-c6b2032ef8ff1e94.xml 2025-12-04T09:59:14.0856417Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5647de3303d26f02.xml 2025-12-04T09:59:14.0857439Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cff7e7504b276d84.xml 2025-12-04T09:59:14.0858290Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0d2fb83ab3ccdeb6.xml 2025-12-04T09:59:14.1017508Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bd911142cc34300e.xml 2025-12-04T09:59:14.1323304Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-d8e84025a0dc7a16.xml 2025-12-04T09:59:14.1659743Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-392d2e7951c1c5f3.xml 2025-12-04T09:59:14.2195397Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-477ee10c9167da98.xml 2025-12-04T09:59:14.2500022Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-96eeb012f5f596ba.xml 2025-12-04T09:59:14.2869879Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cc37fd9d84da442a.xml 2025-12-04T09:59:14.3137792Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cbd7e5f481e859be.xml 2025-12-04T09:59:14.3421883Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6ede249f1a681285.xml 2025-12-04T09:59:14.3741107Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-11be05c94e086d26.xml 2025-12-04T09:59:14.4017764Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-16966e8ed8e62900.xml 2025-12-04T09:59:14.4315424Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-90420efea6f00dc5.xml 2025-12-04T09:59:14.4588168Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6c9f36ab2b8b15ae.xml 2025-12-04T09:59:14.4915668Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0d4c1fd96adc2be7.xml 2025-12-04T09:59:14.5484276Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-500277f28031837e.xml 2025-12-04T09:59:14.5813072Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-942d56c07e16c88d.xml 2025-12-04T09:59:14.6140843Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-55fdf9ad8e0a27f0.xml 2025-12-04T09:59:14.6436680Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6e1cdaa245647d1a.xml 2025-12-04T09:59:14.6819748Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a996648fbbff19f5.xml 2025-12-04T09:59:14.7156304Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cc1573489c80017b.xml 2025-12-04T09:59:14.7466922Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4d2b72d464b1c339.xml 2025-12-04T09:59:14.7782957Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-65dbafa4918c0ef1.xml 2025-12-04T09:59:14.8092315Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5b8e1f7dea233320.xml 2025-12-04T09:59:14.8552606Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9d13641fc6f0b57c.xml 2025-12-04T09:59:14.8912824Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-29e66d82c97dbaa5.xml 2025-12-04T09:59:14.9316356Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a798bbedf3e7b999.xml 2025-12-04T09:59:14.9674320Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e0d5d8a174cb3c98.xml 2025-12-04T09:59:15.0020386Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-931d013fb4c2579a.xml 2025-12-04T09:59:15.0398611Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-92646f491493cae0.xml 2025-12-04T09:59:15.0734937Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8232c23afc6466e0.xml 2025-12-04T09:59:15.1099730Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-983af60bcd722f1d.xml 2025-12-04T09:59:15.1375031Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-84ede3fbd174dfda.xml 2025-12-04T09:59:15.1677054Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9538bfd24f807d16.xml 2025-12-04T09:59:15.1973921Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6e7d2c56cd2be4bb.xml 2025-12-04T09:59:15.2301640Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1378f62336ac1630.xml 2025-12-04T09:59:15.2641432Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8e092965a6aa7362.xml 2025-12-04T09:59:15.2994812Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-19aef0a0802c58a7.xml 2025-12-04T09:59:15.3301589Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5e8c70689f4db333.xml 2025-12-04T09:59:15.3634400Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-389219a70e101b44.xml 2025-12-04T09:59:15.3923946Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-22aad73f608511a0.xml 2025-12-04T09:59:15.4216581Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-22bb81621d944803.xml 2025-12-04T09:59:15.4479077Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e70588b2995dc7c5.xml 2025-12-04T09:59:15.4807552Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-b456a18c8ca9135a.xml 2025-12-04T09:59:15.5098742Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-aedba904eee3ba73.xml 2025-12-04T09:59:15.5393137Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2d3d36f137cb39b5.xml 2025-12-04T09:59:15.5685612Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-973a0dc84b27de93.xml 2025-12-04T09:59:15.6056029Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1e9342b39aaf3792.xml 2025-12-04T09:59:15.6374737Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-15b775a41cf5a439.xml 2025-12-04T09:59:15.6644452Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-56374ffd8bd068de.xml 2025-12-04T09:59:15.6968210Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6288913bb010f746.xml 2025-12-04T09:59:15.7295041Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9d2350a2a3a63f23.xml 2025-12-04T09:59:15.7659863Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-ee9779088060e0f5.xml 2025-12-04T09:59:15.7911959Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-7a7aa8c4ec058e09.xml 2025-12-04T09:59:15.8220099Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4f45a35aeec028b0.xml 2025-12-04T09:59:16.2357690Z Uploading logs for 57116084904 to S3 2025-12-04T09:59:16.3394903Z Uploading artifacts took 0.49 seconds 2025-12-04T09:59:16.3395333Z distributed/fsdp/test_fsdp_core 2/2 failed! 2025-12-04T09:59:16.3396294Z Running distributed/algorithms/test_join 1/1 ... 
[2025-12-04 09:59:16.339489][3987.947405768] 2025-12-04T09:59:16.3396936Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T09:59:16.3399995Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/algorithms/test_join.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 09:59:16.339819] 2025-12-04T10:00:12.6004999Z 2025-12-04T10:00:12.6006110Z distributed/algorithms/test_join 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.algorithms.test_join_1.1_8f0ad2e1263a10f0_.log 2025-12-04T10:00:12.6010923Z Running 9 items in this shard: test/distributed/algorithms/test_join.py::TestJoin::test_join_kwargs, test/distributed/algorithms/test_join.py::TestJoin::test_multiple_joinable_disable, test/distributed/algorithms/test_join.py::TestJoin::test_multiple_joinables, test/distributed/algorithms/test_join.py::TestJoin::test_multiple_joinables_throw, test/distributed/algorithms/test_join.py::TestJoin::test_single_joinable, test/distributed/algorithms/test_join.py::TestJoin::test_single_joinable_disable, test/distributed/algorithms/test_join.py::TestJoin::test_single_joinable_main_hooks, test/distributed/algorithms/test_join.py::TestJoin::test_single_joinable_post_hooks, test/distributed/algorithms/test_join.py::TestJoin::test_single_joinable_throw 2025-12-04T10:00:12.6014659Z 2025-12-04T10:00:12.6015066Z Finished distributed/algorithms/test_join 1/1 ... [2025-12-04 10:00:12.600102][4044.208018492], took 0.94min 2025-12-04T10:00:12.6186394Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.algorithms.test_join/distributed.algorithms.test_join-346fdf8ca2d8d04c.xml 2025-12-04T10:00:12.7037445Z Running distributed/pipelining/test_schedule_multiproc 1/1 ... [2025-12-04 10:00:12.703118][4044.311035497] 2025-12-04T10:00:12.7038172Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:00:12.7039683Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/pipelining/test_schedule_multiproc.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:00:12.703467] 2025-12-04T10:00:33.3692728Z 2025-12-04T10:00:33.3694062Z distributed/pipelining/test_schedule_multiproc 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.pipelining.test_schedule_multiproc_1.1_3173a38c7a75b752_.log 2025-12-04T10:00:33.3715676Z Running 34 items in this shard: test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_custom_function_callback, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_eval_inference_mode_ScheduleClass0, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_eval_inference_mode_ScheduleClass1, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_eval_inference_mode_ScheduleClass2, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_eval_inference_mode_ScheduleClass3, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_eval_inference_mode_ScheduleClass4, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_forward_only_ScheduleClass0, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_grad_with_manual_ScheduleClass0_shape_inference_False, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_grad_with_manual_ScheduleClass0_shape_inference_True, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_grad_with_manual_ScheduleClass1_shape_inference_False, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_grad_with_manual_ScheduleClass1_shape_inference_True, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_grad_with_manual_interleaved_ScheduleClass0, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_grad_with_manual_interleaved_ScheduleClass1, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_grad_with_manual_interleaved_ScheduleClass2, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_grad_with_tracer_ScheduleClass0, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_grad_with_tracer_ScheduleClass1, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_kwargs_with_tracer_ScheduleClass0, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_kwargs_with_tracer_ScheduleClass1, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_multi_iter_ScheduleClass0, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_multi_iter_ScheduleClass1, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_return_output_ScheduleClass0, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_return_output_ScheduleClass1, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_return_output_ScheduleClass2, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_return_output_ScheduleClass3, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_return_output_ScheduleClass4, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_schedule_with_weight_update_mlp_e2e_ScheduleClass0, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_v_shape_schedules_schedule_class0, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_v_shape_schedules_schedule_class1, 
test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_zero_bubble_with_model_kwargs_ScheduleClass0, test/distributed/pipelining/test_schedule_multiproc.py::ScheduleTest::test_zero_bubble_with_model_kwargs_ScheduleClass1, test/distributed/pipelining/test_schedule_multiproc.py::CustomSchedulesTest::test_non_symmetric_stage_ids_schedule_class0, test/distributed/pipelining/test_schedule_multiproc.py::CustomSchedulesTest::test_non_symmetric_stage_ids_schedule_class1, test/distributed/pipelining/test_schedule_multiproc.py::CustomSchedulesTest::test_pipeline_schedule_runtime_custom_sched_ScheduleClass0, test/distributed/pipelining/test_schedule_multiproc.py::CustomSchedulesTest::test_schedule_with_native_zero_bubble_ScheduleClass0 2025-12-04T10:00:33.3737412Z 2025-12-04T10:00:33.3737905Z Finished distributed/pipelining/test_schedule_multiproc 1/1 ... [2025-12-04 10:00:33.368878][4064.976794701], took 0.34min 2025-12-04T10:00:33.3876869Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.pipelining.test_schedule_multiproc/distributed.pipelining.test_schedule_multiproc-4c892aab54fe07b4.xml 2025-12-04T10:00:33.4722955Z Running distributed/test_compute_comm_reordering 1/1 ... [2025-12-04 10:00:33.471636][4065.079554714] 2025-12-04T10:00:33.4723621Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:00:33.4724962Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_compute_comm_reordering.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:00:33.471972] 2025-12-04T10:02:53.3819012Z 2025-12-04T10:02:53.3820254Z distributed/test_compute_comm_reordering 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_compute_comm_reordering_1.1_7c582fe21d8b6d0b_.log 2025-12-04T10:02:53.3828370Z Running 9 items in this shard: test/distributed/test_compute_comm_reordering.py::TestComputeCommReorderingMultiProc::test_grouped_scheduler_node_combo_kernels_False, test/distributed/test_compute_comm_reordering.py::TestComputeCommReorderingMultiProc::test_grouped_scheduler_node_combo_kernels_True, test/distributed/test_compute_comm_reordering.py::TestComputeCommReorderingMultiProc::test_inductor_default_comms_ordering, test/distributed/test_compute_comm_reordering.py::TestComputeCommReorderingMultiProc::test_nccl_heuristics, test/distributed/test_compute_comm_reordering.py::TestComputeCommReorderingMultiProc::test_raise_comms, test/distributed/test_compute_comm_reordering.py::TestComputeCommReorderingMultiProc::test_reorder_compute_for_overlap, test/distributed/test_compute_comm_reordering.py::TestComputeCommReorderingMultiProc::test_reorder_compute_for_overlap_custom_runtime_estimation, test/distributed/test_compute_comm_reordering.py::TestComputeCommReorderingMultiProc::test_sink_waits, test/distributed/test_compute_comm_reordering.py::TestComputeCommReorderingMultiProc::test_sink_waits_raise_comms 2025-12-04T10:02:53.3834730Z 2025-12-04T10:02:53.3835147Z Finished distributed/test_compute_comm_reordering 1/1 ... 
[2025-12-04 10:02:53.381486][4204.98940222], took 2.33min 2025-12-04T10:02:53.4000473Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_compute_comm_reordering/distributed.test_compute_comm_reordering-5eeb11f30d43fbd8.xml 2025-12-04T10:02:53.4869410Z Running distributed/test_cupy_as_tensor 1/1 ... [2025-12-04 10:02:53.486196][4205.094114024] 2025-12-04T10:02:53.4870038Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:02:53.4871273Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_cupy_as_tensor.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:02:53.486557] 2025-12-04T10:02:57.2605297Z 2025-12-04T10:02:57.2606438Z distributed/test_cupy_as_tensor 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_cupy_as_tensor_1.1_01ccc395c80cccfc_.log 2025-12-04T10:02:57.2607827Z Running 1 items in this shard: test/distributed/test_cupy_as_tensor.py::CupyAsTensorTest::test_cupy_as_tensor 2025-12-04T10:02:57.2608396Z 2025-12-04T10:02:57.2609085Z Finished distributed/test_cupy_as_tensor 1/1 ... [2025-12-04 10:02:57.259795][4208.867711215], took 0.06min 2025-12-04T10:02:57.2779914Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_cupy_as_tensor/distributed.test_cupy_as_tensor-9bf0be6a7af397ad.xml 2025-12-04T10:02:57.3104581Z Running distributed/fsdp/test_fsdp_fx 1/1 ... [2025-12-04 10:02:57.310235][4208.918152255] 2025-12-04T10:02:57.3105193Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:02:57.3107721Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_fsdp_fx.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:02:57.310565] 2025-12-04T10:03:02.3880415Z 2025-12-04T10:03:02.3881567Z distributed/fsdp/test_fsdp_fx 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.fsdp.test_fsdp_fx_1.1_5233411b5b9ade93_.log 2025-12-04T10:03:02.3883012Z Running 1 items in this shard: test/distributed/fsdp/test_fsdp_fx.py::TestSymbolicTracingCUDA::test_symbolic_tracing_outputs_cuda 2025-12-04T10:03:02.3883670Z 2025-12-04T10:03:02.3884045Z Finished distributed/fsdp/test_fsdp_fx 1/1 ... [2025-12-04 10:03:02.387460][4213.995376468], took 0.08min 2025-12-04T10:03:02.4059627Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fx/distributed.fsdp.test_fsdp_fx-d8b89ec57f22953e.xml 2025-12-04T10:03:02.4397680Z Running distributed/_tools/test_sac_ilp 1/1 ... [2025-12-04 10:03:02.439140][4214.047057176] 2025-12-04T10:03:02.4398292Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:03:02.4399830Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_tools/test_sac_ilp.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:03:02.439504] 2025-12-04T10:03:14.4828329Z 2025-12-04T10:03:14.4829466Z distributed/_tools/test_sac_ilp 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._tools.test_sac_ilp_1.1_aac1d3e83d5577ad_.log 2025-12-04T10:03:14.4831989Z Running 4 items in this shard: test/distributed/_tools/test_sac_ilp.py::TestSACILP::test_sac_ilp_case1, test/distributed/_tools/test_sac_ilp.py::TestSACILP::test_sac_ilp_case2, test/distributed/_tools/test_sac_ilp.py::TestSACILP::test_sac_ilp_case3, test/distributed/_tools/test_sac_ilp.py::TestOptimalCheckpointingPolicy::test_get_optimial_checkpointing_policy_per_module 2025-12-04T10:03:14.4834016Z 2025-12-04T10:03:14.4834401Z Finished distributed/_tools/test_sac_ilp 1/1 ... [2025-12-04 10:03:14.482447][4226.090356853], took 0.20min 2025-12-04T10:03:14.5012010Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._tools.test_sac_ilp/distributed._tools.test_sac_ilp-80280b96b0e30cba.xml 2025-12-04T10:03:14.5973849Z Running distributed/checkpoint/test_hf_storage 1/1 ... [2025-12-04 10:03:14.596734][4226.204651502] 2025-12-04T10:03:14.5974489Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:03:14.5975782Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/test_hf_storage.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:03:14.597097] 2025-12-04T10:03:18.8731315Z 2025-12-04T10:03:18.8732510Z distributed/checkpoint/test_hf_storage 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.test_hf_storage_1.1_ec1da04f72df0c46_.log 2025-12-04T10:03:18.8735928Z Running 5 items in this shard: test/distributed/checkpoint/test_hf_storage.py::TestHfStorage::test_read_data_hf, test/distributed/checkpoint/test_hf_storage.py::TestHfStorage::test_read_metadata_hf, test/distributed/checkpoint/test_hf_storage.py::TestHfStorage::test_write_data_hf, test/distributed/checkpoint/test_hf_storage.py::TestHfStorage::test_write_data_with_sharding, test/distributed/checkpoint/test_hf_storage.py::TestHfStorage::test_write_metadata_hf 2025-12-04T10:03:18.8738593Z 2025-12-04T10:03:18.8739028Z Finished distributed/checkpoint/test_hf_storage 1/1 ... [2025-12-04 10:03:18.872644][4230.480560583], took 0.07min 2025-12-04T10:03:18.8915727Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.checkpoint.test_hf_storage/distributed.checkpoint.test_hf_storage-5c05eca826b12737.xml 2025-12-04T10:03:18.9271432Z Running distributed/pipelining/test_microbatch 1/1 ... [2025-12-04 10:03:18.926521][4230.534438755] 2025-12-04T10:03:18.9272102Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:03:18.9273406Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/pipelining/test_microbatch.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:03:18.926881] 2025-12-04T10:03:37.4859206Z 2025-12-04T10:03:37.4860440Z distributed/pipelining/test_microbatch 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.pipelining.test_microbatch_1.1_e0b58af1802f4b06_.log 2025-12-04T10:03:37.4864445Z Running 5 items in this shard: test/distributed/pipelining/test_microbatch.py::MicrobatchTestsCUDA::test_chunk_spec_cuda, test/distributed/pipelining/test_microbatch.py::MicrobatchTestsCUDA::test_split_and_merge_cuda, test/distributed/pipelining/test_microbatch.py::MicrobatchTestsCUDA::test_split_block_mask_batch_size_one_cuda, test/distributed/pipelining/test_microbatch.py::MicrobatchTestsCUDA::test_split_block_mask_cuda, test/distributed/pipelining/test_microbatch.py::MicrobatchTestsCUDA::test_split_block_mask_none_cuda 2025-12-04T10:03:37.4867262Z 2025-12-04T10:03:37.4867700Z Finished distributed/pipelining/test_microbatch 1/1 ... [2025-12-04 10:03:37.485542][4249.093458024], took 0.31min 2025-12-04T10:03:37.5046440Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.pipelining.test_microbatch/distributed.pipelining.test_microbatch-db2f7f262044cd4d.xml 2025-12-04T10:03:37.5881575Z Running distributed/tensor/test_placement_types 1/1 ... [2025-12-04 10:03:37.587571][4249.195488742] 2025-12-04T10:03:37.5882211Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:03:37.5883678Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/test_placement_types.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:03:37.587914] 2025-12-04T10:03:41.3619885Z 2025-12-04T10:03:41.3621298Z distributed/tensor/test_placement_types 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.test_placement_types_1.1_c7b4602e70c3b07a_.log 2025-12-04T10:03:41.3625091Z Running 5 items in this shard: test/distributed/tensor/test_placement_types.py::PlacementTypesTestCase::test_dynamo_can_identify_placement_classes, test/distributed/tensor/test_placement_types.py::PlacementTypesTestCase::test_equality, test/distributed/tensor/test_placement_types.py::PlacementTypesTestCase::test_strided_shard_isinstance_shard, test/distributed/tensor/test_placement_types.py::PlacementTypesTestCase::test_strided_shard_kwonly_argument, test/distributed/tensor/test_placement_types.py::PlacementTypesTestCase::test_type_identification 2025-12-04T10:03:41.3628149Z 2025-12-04T10:03:41.3628559Z Finished distributed/tensor/test_placement_types 1/1 ... [2025-12-04 10:03:41.361730][4252.96964606], took 0.06min 2025-12-04T10:03:41.3808241Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.tensor.test_placement_types/distributed.tensor.test_placement_types-aa6a82bf337fac31.xml 2025-12-04T10:03:41.4139464Z Running distributed/tensor/test_dtensor_dispatch_overhead 1/1 ... [2025-12-04 10:03:41.413418][4253.021336835] 2025-12-04T10:03:41.4140204Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:03:41.4141595Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/test_dtensor_dispatch_overhead.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:03:41.413792] 2025-12-04T10:03:51.1034727Z 2025-12-04T10:03:51.1036012Z distributed/tensor/test_dtensor_dispatch_overhead 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.test_dtensor_dispatch_overhead_1.1_85c49e7d8275b78b_.log 2025-12-04T10:03:51.1037844Z Running 1 items in this shard: test/distributed/tensor/test_dtensor_dispatch_overhead.py::DistOpDispatchOverHead::test_dtensor_add_op_dispatch_overhead 2025-12-04T10:03:51.1038640Z 2025-12-04T10:03:51.1039128Z Finished distributed/tensor/test_dtensor_dispatch_overhead 1/1 ... [2025-12-04 10:03:51.102835][4262.710751323], took 0.16min 2025-12-04T10:03:51.1220066Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.tensor.test_dtensor_dispatch_overhead/distributed.tensor.test_dtensor_dispatch_overhead-1be227e0f3a4b8ca.xml 2025-12-04T10:03:51.1934669Z Running distributed/checkpoint/_experimental/test_checkpoint_reader 1/1 ... [2025-12-04 10:03:51.192867][4262.800784213] 2025-12-04T10:03:51.1935428Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:03:51.1937352Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/_experimental/test_checkpoint_reader.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:03:51.193211] 2025-12-04T10:03:55.4687268Z 2025-12-04T10:03:55.4688735Z distributed/checkpoint/_experimental/test_checkpoint_reader 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint._experimental.test_checkpoint_reader_1.1_68c37a9fa1601552_.log 2025-12-04T10:03:55.4694729Z Running 7 items in this shard: test/distributed/checkpoint/_experimental/test_checkpoint_reader.py::TestCheckpointReader::test_partial_read, test/distributed/checkpoint/_experimental/test_checkpoint_reader.py::TestCheckpointReader::test_partial_read_different_dtypes, test/distributed/checkpoint/_experimental/test_checkpoint_reader.py::TestCheckpointReader::test_partial_read_missing_keys, test/distributed/checkpoint/_experimental/test_checkpoint_reader.py::TestCheckpointReader::test_read_checkpoint, test/distributed/checkpoint/_experimental/test_checkpoint_reader.py::TestCheckpointReader::test_read_nonexistent_checkpoint, test/distributed/checkpoint/_experimental/test_checkpoint_reader.py::TestCheckpointReader::test_read_with_kwargs, test/distributed/checkpoint/_experimental/test_checkpoint_reader.py::TestCheckpointReader::test_read_with_map_location 2025-12-04T10:03:55.4699645Z 2025-12-04T10:03:55.4700217Z Finished distributed/checkpoint/_experimental/test_checkpoint_reader 1/1 ... [2025-12-04 10:03:55.468045][4267.075962106], took 0.07min 2025-12-04T10:03:55.4879110Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.checkpoint._experimental.test_checkpoint_reader/distributed.checkpoint._experimental.test_checkpoint_reader-e75c494c472cf9a1.xml 2025-12-04T10:03:55.5174872Z Running distributed/checkpoint/test_format_utils 1/1 ... 
[2025-12-04 10:03:55.516825][4267.124743111] 2025-12-04T10:03:55.5175548Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:03:55.5177197Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/test_format_utils.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:03:55.517165] 2025-12-04T10:04:15.5861057Z 2025-12-04T10:04:15.5862641Z distributed/checkpoint/test_format_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.test_format_utils_1.1_04ae55b8cdf477fd_.log 2025-12-04T10:04:15.5868334Z Running 3 items in this shard: test/distributed/checkpoint/test_format_utils.py::TestFormatUtils::test_dcp_to_torch_save, test/distributed/checkpoint/test_format_utils.py::TestFormatUtils::test_online_torch_save_to_dcp, test/distributed/checkpoint/test_format_utils.py::TestFormatUtils::test_torch_save_to_dcp 2025-12-04T10:04:15.5869910Z 2025-12-04T10:04:15.5870354Z Finished distributed/checkpoint/test_format_utils 1/1 ... [2025-12-04 10:04:15.585956][4287.193872573], took 0.33min 2025-12-04T10:04:15.6059440Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.checkpoint.test_format_utils/distributed.checkpoint.test_format_utils-ff4efe8ffc0a39b9.xml 2025-12-04T10:04:15.6842526Z Running distributed/test_aten_comm_compute_reordering 1/2 ... [2025-12-04 10:04:15.683661][4287.291578499] 2025-12-04T10:04:15.6843200Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:04:15.6844525Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_aten_comm_compute_reordering.py', '--shard-id=1', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:04:15.684014] 2025-12-04T10:10:07.8099086Z 2025-12-04T10:10:07.8100556Z distributed/test_aten_comm_compute_reordering 1/2 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_aten_comm_compute_reordering_1.2_69f8c7d62333ccaf_.log 2025-12-04T10:10:07.8117337Z Running 25 items in this shard: test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingMultiProc::test_grouped_scheduler_node, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingMultiProc::test_sink_waits, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingMultiProc::test_sink_waits_raise_comms, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_basic_all_reduce_bucketing, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_bucket_exposed_with_hidden_single_overlap, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_bucketing_split_for_overlap, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_bucketing_split_for_overlap_blocking_deps_inductor, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_bucketing_split_for_overlap_blocking_no_deps, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_bucketing_wait_sink, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_inductor_default_comms_ordering, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_multidtype_bucketing, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_no_bucketing_when_collective_depends_on_hiding_node, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_overlap_scheduling_via_config, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_reorder_compute_for_overlap_mul, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_schedulable_wait, test/distributed/test_aten_comm_compute_reordering.py::TestComputeCommReorderingBucketing::test_sink_waits, test/distributed/test_aten_comm_compute_reordering.py::TestManualOverlapBucketing::test_bucketing_reordering_pass_no_bucket, test/distributed/test_aten_comm_compute_reordering.py::TestManualOverlapBucketing::test_bucketing_reordering_pass_single_bucket, test/distributed/test_aten_comm_compute_reordering.py::TestManualOverlapBucketing::test_grouped_scheduler_node, test/distributed/test_aten_comm_compute_reordering.py::TestManualOverlapBucketing::test_make_graph_view_and_get_subgraph_by_path_custom_module_stack_fn, test/distributed/test_aten_comm_compute_reordering.py::TestManualOverlapBucketing::test_manual_reordering_bucketing_pass_separate_buckets, test/distributed/test_aten_comm_compute_reordering.py::TestManualOverlapBucketing::test_reorder_compute_for_overlap_mul, test/distributed/test_aten_comm_compute_reordering.py::TestManualOverlapBucketing::test_schedulable_wait, test/distributed/test_aten_comm_compute_reordering.py::TestManualOverlapBucketing::test_sink_waits, test/distributed/test_aten_comm_compute_reordering.py::TestManualOverlapBucketing::test_sink_waits_raise_comms 2025-12-04T10:10:07.8133896Z 2025-12-04T10:10:07.8134340Z Finished 
distributed/test_aten_comm_compute_reordering 1/2 ... [2025-12-04 10:10:07.809777][4639.417690402], took 5.87min 2025-12-04T10:10:07.8299462Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_aten_comm_compute_reordering/distributed.test_aten_comm_compute_reordering-8ab49fa352932ba1.xml 2025-12-04T10:10:07.9403071Z Running distributed/tensor/test_redistribute 2/2 ... [2025-12-04 10:10:07.939771][4639.547688652] 2025-12-04T10:10:07.9403735Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:10:07.9405041Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/test_redistribute.py', '--shard-id=2', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:10:07.940119] 2025-12-04T10:11:46.8467836Z 2025-12-04T10:11:46.8469265Z distributed/tensor/test_redistribute 2/2 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.test_redistribute_2.2_51e2d05d075503bf_.log 2025-12-04T10:11:46.8489696Z Running 33 items in this shard: test/distributed/tensor/test_redistribute.py::RedistributeTest::test_one_chunk_mesh, test/distributed/tensor/test_redistribute.py::RedistributeTest::test_partial_to_replicate_forward_backward_float32, test/distributed/tensor/test_redistribute.py::RedistributeTest::test_partial_to_shard_complex64, test/distributed/tensor/test_redistribute.py::RedistributeTest::test_replicate_to_local_partial_grad_complex64, test/distributed/tensor/test_redistribute.py::RedistributeTest::test_replicate_to_local_partial_grad_float32, test/distributed/tensor/test_redistribute.py::RedistributeTest::test_replicate_to_replicate_forward_backward_datatype_conversion, test/distributed/tensor/test_redistribute.py::RedistributeTest::test_replicate_to_shard_forward_backward, test/distributed/tensor/test_redistribute.py::RedistributeTest::test_shard_dim_alltoall_complex64, test/distributed/tensor/test_redistribute.py::RedistributeTest::test_shard_dim_alltoall_float32, test/distributed/tensor/test_redistribute.py::RedistributeTest::test_shard_to_replicate_forward_backward_complex64, test/distributed/tensor/test_redistribute.py::MultiDimRedistributeTest::test_redistribute_shard_dim_multi_dim_mesh, test/distributed/tensor/test_redistribute.py::DistributeWithDeviceOrderTest::test_generate_shard_orders, test/distributed/tensor/test_redistribute.py::DistributeWithDeviceOrderTest::test_ordered_distribute_all_combination, test/distributed/tensor/test_redistribute.py::DistributeWithDeviceOrderTest::test_ordered_redistribute_with_partial, test/distributed/tensor/test_redistribute.py::DistributeWithDeviceOrderTest::test_shard_order_same_data_as_strided_shard, test/distributed/tensor/test_redistribute.py::RedistributeTestWithLocalTensor::test_one_chunk_mesh, test/distributed/tensor/test_redistribute.py::RedistributeTestWithLocalTensor::test_partial_to_replicate_forward_backward_complex64, test/distributed/tensor/test_redistribute.py::RedistributeTestWithLocalTensor::test_partial_to_replicate_forward_backward_float32, test/distributed/tensor/test_redistribute.py::RedistributeTestWithLocalTensor::test_partial_to_shard_complex64, test/distributed/tensor/test_redistribute.py::RedistributeTestWithLocalTensor::test_redistribute_negative_shard_dim, test/distributed/tensor/test_redistribute.py::RedistributeTestWithLocalTensor::test_redistribute_to_partial, 
test/distributed/tensor/test_redistribute.py::RedistributeTestWithLocalTensor::test_redistribute_uneven_sharding, test/distributed/tensor/test_redistribute.py::RedistributeTestWithLocalTensor::test_replicate_to_partial, test/distributed/tensor/test_redistribute.py::RedistributeTestWithLocalTensor::test_replicate_to_replicate_forward_backward, test/distributed/tensor/test_redistribute.py::RedistributeTestWithLocalTensor::test_replicate_to_replicate_forward_backward_datatype_conversion, test/distributed/tensor/test_redistribute.py::RedistributeTestWithLocalTensor::test_shard_dim_alltoall_float32, test/distributed/tensor/test_redistribute.py::RedistributeTestWithLocalTensor::test_shard_to_replicate_forward_backward_datatype_conversion, test/distributed/tensor/test_redistribute.py::RedistributeTestWithLocalTensor::test_shard_to_replicate_forward_backward_float32, test/distributed/tensor/test_redistribute.py::MultiDimRedistributeTestWithLocalTensor::test_multi_dim_mesh, test/distributed/tensor/test_redistribute.py::DistributeWithDeviceOrderTestWithLocalTensor::test_generate_shard_orders, test/distributed/tensor/test_redistribute.py::DistributeWithDeviceOrderTestWithLocalTensor::test_ordered_redistribute, test/distributed/tensor/test_redistribute.py::DistributeWithDeviceOrderTestWithLocalTensor::test_ordered_redistribute_for_special_placement, test/distributed/tensor/test_redistribute.py::DistributeWithDeviceOrderTestWithLocalTensor::test_ordered_redistribute_with_partial 2025-12-04T10:11:46.8509863Z 2025-12-04T10:11:46.8510276Z Finished distributed/tensor/test_redistribute 2/2 ... [2025-12-04 10:11:46.846499][4738.454415028], took 1.65min 2025-12-04T10:11:46.8665850Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.tensor.test_redistribute/distributed.tensor.test_redistribute-02b614c0805e2900.xml 2025-12-04T10:11:46.9723185Z Running distributed/tensor/parallel/test_tp_style 1/1 ... [2025-12-04 10:11:46.971809][4738.579726845] 2025-12-04T10:11:46.9723873Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:11:46.9725212Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/parallel/test_tp_style.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:11:46.972182] 2025-12-04T10:12:54.1565587Z 2025-12-04T10:12:54.1566801Z distributed/tensor/parallel/test_tp_style 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.parallel.test_tp_style_1.1_54e71dcd4ed048eb_.log 2025-12-04T10:12:54.1579302Z Running 18 items in this shard: test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTest::test_colwise_parallel_embedding, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTest::test_colwise_parallel_style, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTest::test_prepare_module_input, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTest::test_prepare_module_input_multiple_inputs, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTest::test_prepare_module_kwargs_input, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTest::test_prepare_module_output, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTest::test_rowwise_parallel_embedding, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTest::test_rowwise_parallel_style, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTest::test_sequence_parallel_style, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTestWithLocalTensor::test_colwise_parallel_embedding, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTestWithLocalTensor::test_colwise_parallel_style, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTestWithLocalTensor::test_prepare_module_input, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTestWithLocalTensor::test_prepare_module_input_multiple_inputs, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTestWithLocalTensor::test_prepare_module_kwargs_input, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTestWithLocalTensor::test_prepare_module_output, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTestWithLocalTensor::test_rowwise_parallel_embedding, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTestWithLocalTensor::test_rowwise_parallel_style, test/distributed/tensor/parallel/test_tp_style.py::TensorParallelStyleTestWithLocalTensor::test_sequence_parallel_style 2025-12-04T10:12:54.1590930Z 2025-12-04T10:12:54.1591375Z Finished distributed/tensor/parallel/test_tp_style 1/1 ... [2025-12-04 10:12:54.156163][4805.764079227], took 1.12min 2025-12-04T10:12:54.1766153Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.tensor.parallel.test_tp_style/distributed.tensor.parallel.test_tp_style-3daa17d4beb2059f.xml 2025-12-04T10:12:54.2650594Z Running distributed/tensor/test_api 1/1 ... [2025-12-04 10:12:54.264502][4805.87241949] 2025-12-04T10:12:54.2651205Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:12:54.2652636Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/test_api.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
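Annotation: test_tp_style above covers the ColwiseParallel/RowwiseParallel styles. A minimal sketch of parallelize_module on a toy two-layer MLP, under the same torchrun/gloo assumptions; the model and dimensions are made up for illustration.

    # Tensor-parallel style sketch (run with: torchrun --nproc_per_node=2 this_file.py).
    import torch
    import torch.nn as nn
    import torch.distributed as dist
    from torch.distributed.device_mesh import init_device_mesh
    from torch.distributed.tensor.parallel import parallelize_module, ColwiseParallel, RowwiseParallel

    dist.init_process_group("gloo")
    mesh = init_device_mesh("cpu", (dist.get_world_size(),))

    torch.manual_seed(0)                                  # keep the replicated input identical on all ranks
    mlp = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 8))
    # Shard the first linear column-wise and the second row-wise, the classic pairing.
    mlp = parallelize_module(mlp, mesh, {"0": ColwiseParallel(), "2": RowwiseParallel()})

    out = mlp(torch.randn(4, 8))   # input treated as replicated; output comes back as a plain tensor
    print(out.shape)               # torch.Size([4, 8])
    dist.destroy_process_group()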
[2025-12-04 10:12:54.264849] 2025-12-04T10:13:57.3381745Z 2025-12-04T10:13:57.3384490Z distributed/tensor/test_api 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.test_api_1.1_f4574b86db79cb55_.log 2025-12-04T10:13:57.3394031Z Running 18 items in this shard: test/distributed/tensor/test_api.py::DTensorAPITest::test_checkpoint_apis_check_partial_placement, test/distributed/tensor/test_api.py::DTensorAPITest::test_distribute_module, test/distributed/tensor/test_api.py::DTensorAPITest::test_distribute_module_casting, test/distributed/tensor/test_api.py::DTensorAPITest::test_distribute_module_input_fn_output_fn, test/distributed/tensor/test_api.py::DTensorAPITest::test_distribute_module_input_fn_output_fn_warning, test/distributed/tensor/test_api.py::DTensorAPITest::test_distribute_module_meta, test/distributed/tensor/test_api.py::DTensorAPITest::test_distribute_tensor_errors, test/distributed/tensor/test_api.py::DTensorAPITest::test_distribute_tensor_rank, test/distributed/tensor/test_api.py::DTensorAPITest::test_distribute_tensor_uneven_sharding, test/distributed/tensor/test_api.py::DTensorAPITestWithLocalTensor::test_checkpoint_apis_check_partial_placement, test/distributed/tensor/test_api.py::DTensorAPITestWithLocalTensor::test_distribute_module, test/distributed/tensor/test_api.py::DTensorAPITestWithLocalTensor::test_distribute_module_casting, test/distributed/tensor/test_api.py::DTensorAPITestWithLocalTensor::test_distribute_module_input_fn_output_fn, test/distributed/tensor/test_api.py::DTensorAPITestWithLocalTensor::test_distribute_module_input_fn_output_fn_warning, test/distributed/tensor/test_api.py::DTensorAPITestWithLocalTensor::test_distribute_module_meta, test/distributed/tensor/test_api.py::DTensorAPITestWithLocalTensor::test_distribute_tensor_errors, test/distributed/tensor/test_api.py::DTensorAPITestWithLocalTensor::test_distribute_tensor_rank, test/distributed/tensor/test_api.py::DTensorAPITestWithLocalTensor::test_distribute_tensor_uneven_sharding 2025-12-04T10:13:57.3403052Z 2025-12-04T10:13:57.3403399Z Finished distributed/tensor/test_api 1/1 ... [2025-12-04 10:13:57.337881][4868.945797553], took 1.05min 2025-12-04T10:13:57.3581993Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.tensor.test_api/distributed.tensor.test_api-143a55cc9757e18a.xml 2025-12-04T10:13:57.4334180Z Running distributed/checkpoint/test_fsspec 1/1 ... [2025-12-04 10:13:57.432934][4869.040851408] 2025-12-04T10:13:57.4334852Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:13:57.4336131Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/test_fsspec.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:13:57.433293] 2025-12-04T10:14:11.8347663Z 2025-12-04T10:14:11.8348849Z distributed/checkpoint/test_fsspec 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.test_fsspec_1.1_8eaa241efddb416a_.log 2025-12-04T10:14:11.8350956Z Running 3 items in this shard: test/distributed/checkpoint/test_fsspec.py::TestFSSpec::test_fsspec, test/distributed/checkpoint/test_fsspec.py::TestFSSpec::test_overwrite, test/distributed/checkpoint/test_fsspec.py::TestFileSystem::test_remove_on_fail 2025-12-04T10:14:11.8352249Z 2025-12-04T10:14:11.8352670Z Finished distributed/checkpoint/test_fsspec 1/1 ... 
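Annotation: the DTensorAPITest run above exercises distribute_tensor/distribute_module. Below is a sketch of distribute_module with a partition_fn, under the same torchrun/gloo assumptions; the decision to shard only Linear parameters on dim 0 is an illustrative choice, not the test's.

    # distribute_module sketch (run with: torchrun --nproc_per_node=2 this_file.py).
    import torch
    import torch.nn as nn
    import torch.distributed as dist
    from torch.distributed.device_mesh import init_device_mesh
    from torch.distributed.tensor import distribute_module, distribute_tensor, Shard

    dist.init_process_group("gloo")
    mesh = init_device_mesh("cpu", (dist.get_world_size(),))

    def partition_fn(name, module, device_mesh):
        # Shard every Linear parameter on dim 0; everything else stays as-is (replicated on use).
        if isinstance(module, nn.Linear):
            for pname, param in module.named_parameters(recurse=False):
                dist_param = nn.Parameter(distribute_tensor(param, device_mesh, [Shard(0)]))
                module.register_parameter(pname, dist_param)

    model = distribute_module(nn.Linear(8, 8), mesh, partition_fn)
    print(model.weight.placements, model.weight.to_local().shape)   # (Shard(dim=0),) and the per-rank shard
    dist.destroy_process_group()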
[2025-12-04 10:14:11.834429][4883.442339588], took 0.24min 2025-12-04T10:14:11.8545660Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.checkpoint.test_fsspec/distributed.checkpoint.test_fsspec-2295d11b632387c0.xml 2025-12-04T10:14:11.9313006Z Running distributed/tensor/experimental/test_tp_transform 1/1 ... [2025-12-04 10:14:11.930744][4883.5386612] 2025-12-04T10:14:11.9314023Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:14:11.9315400Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/experimental/test_tp_transform.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:14:11.931093] 2025-12-04T10:14:37.3115415Z 2025-12-04T10:14:37.3116784Z distributed/tensor/experimental/test_tp_transform 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.experimental.test_tp_transform_1.1_d11081dcea691eaf_.log 2025-12-04T10:14:37.3119632Z Running 3 items in this shard: test/distributed/tensor/experimental/test_tp_transform.py::TensorParallelTest::test_tp_transform_e2e, test/distributed/tensor/experimental/test_tp_transform.py::TensorParallelTest::test_tp_transform_no_bias, test/distributed/tensor/experimental/test_tp_transform.py::TensorParallelTest::test_tp_transform_with_uncovered_op 2025-12-04T10:14:37.3122533Z 2025-12-04T10:14:37.3123066Z Finished distributed/tensor/experimental/test_tp_transform 1/1 ... [2025-12-04 10:14:37.311071][4908.918987722], took 0.42min 2025-12-04T10:14:37.3311460Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.tensor.experimental.test_tp_transform/distributed.tensor.experimental.test_tp_transform-af912528cabb656d.xml 2025-12-04T10:14:37.4151398Z Running distributed/checkpoint/test_traverse 1/1 ... [2025-12-04 10:14:37.414586][4909.022503412] 2025-12-04T10:14:37.4152047Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:14:37.4153347Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/test_traverse.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:14:37.414927] 2025-12-04T10:14:41.2890809Z 2025-12-04T10:14:41.2892079Z distributed/checkpoint/test_traverse 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.test_traverse_1.1_eea2c84c34471245_.log 2025-12-04T10:14:41.2896505Z Running 7 items in this shard: test/distributed/checkpoint/test_traverse.py::TestTraverse::test_get_element, test/distributed/checkpoint/test_traverse.py::TestTraverse::test_set_element, test/distributed/checkpoint/test_traverse.py::TestTraverse::test_traverse_doesnt_ignore_intermediate_collections, test/distributed/checkpoint/test_traverse.py::TestTraverse::test_traverse_nested_dict, test/distributed/checkpoint/test_traverse.py::TestTraverse::test_traverse_nested_list, test/distributed/checkpoint/test_traverse.py::TestTraverse::test_traverse_shallow, test/distributed/checkpoint/test_traverse.py::TestTraverse::test_traverse_with_ordered_dict 2025-12-04T10:14:41.2900072Z 2025-12-04T10:14:41.2900515Z Finished distributed/checkpoint/test_traverse 1/1 ... 
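Annotation: the checkpoint suites above (test_fsspec, test_traverse) go through torch.distributed.checkpoint (DCP). A minimal save/load sketch using checkpoint_id with a local directory; fsspec-style URLs resolve through the same entry points. Runs single-process (DCP falls back to its no-dist path) or under torchrun; keys and the toy model are illustrative.

    # torch.distributed.checkpoint (DCP) save/load sketch.
    import tempfile
    import torch
    import torch.nn as nn
    import torch.distributed.checkpoint as dcp

    model = nn.Linear(4, 4)
    ckpt_dir = tempfile.mkdtemp()

    dcp.save({"model": model.state_dict()}, checkpoint_id=ckpt_dir)   # one file per rank plus metadata

    restored = {"model": model.state_dict()}                          # load fills pre-allocated tensors in place
    dcp.load(restored, checkpoint_id=ckpt_dir)
    model.load_state_dict(restored["model"])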
[2025-12-04 10:14:41.288628][4912.896544739], took 0.06min 2025-12-04T10:14:41.3089148Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.checkpoint.test_traverse/distributed.checkpoint.test_traverse-f038bc92a00bd1c7.xml 2025-12-04T10:14:41.3516279Z Running distributed/tensor/test_random_ops 1/1 ... [2025-12-04 10:14:41.351172][4912.959089762] 2025-12-04T10:14:41.3516919Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:14:41.3518201Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/test_random_ops.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:14:41.351536] 2025-12-04T10:16:09.3895045Z 2025-12-04T10:16:09.3896239Z distributed/tensor/test_random_ops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.test_random_ops_1.1_b2ded413b82ba64f_.log 2025-12-04T10:16:09.3913149Z Running 28 items in this shard: test/distributed/tensor/test_random_ops.py::DistTensorRandomInitTest::test_fsdp_tp_model_meta_init, test/distributed/tensor/test_random_ops.py::DistTensorRandomInitTest::test_init_ops, test/distributed/tensor/test_random_ops.py::DistTensorRandomInitTest::test_init_with_user_generator, test/distributed/tensor/test_random_ops.py::DistTensorRandomInitTest::test_meta_tensor_init, test/distributed/tensor/test_random_ops.py::DistTensorRandomInitTest::test_tp_model_meta_init, test/distributed/tensor/test_random_ops.py::DistTensorRandomOpTest::test_deterministic_dropout_1d, test/distributed/tensor/test_random_ops.py::DistTensorRandomOpTest::test_deterministic_rand_1d, test/distributed/tensor/test_random_ops.py::DistTensorRandomOpTest::test_deterministic_uniform_2d, test/distributed/tensor/test_random_ops.py::DistTensorRandomOpTest::test_manual_seed, test/distributed/tensor/test_random_ops.py::DistTensorRandomOpTest::test_manual_seed_submesh, test/distributed/tensor/test_random_ops.py::DistTensorRandomOpTest::test_philox_state_seed_roundtrip, test/distributed/tensor/test_random_ops.py::DistTensorRandomOpTest::test_pipeline_parallel_manual_seed, test/distributed/tensor/test_random_ops.py::DistTensorRandomOpTest::test_rng_tracker_init, test/distributed/tensor/test_random_ops.py::DistTensorRandomOpsTest3D::test_hsdp_tp_model_meta_init, test/distributed/tensor/test_random_ops.py::DistTensorRandomInitTestWithLocalTensor::test_fsdp_tp_model_meta_init, test/distributed/tensor/test_random_ops.py::DistTensorRandomInitTestWithLocalTensor::test_init_ops, test/distributed/tensor/test_random_ops.py::DistTensorRandomInitTestWithLocalTensor::test_init_with_user_generator, test/distributed/tensor/test_random_ops.py::DistTensorRandomInitTestWithLocalTensor::test_meta_tensor_init, test/distributed/tensor/test_random_ops.py::DistTensorRandomInitTestWithLocalTensor::test_tp_model_meta_init, test/distributed/tensor/test_random_ops.py::DistTensorRandomOpTestWithLocalTensor::test_deterministic_dropout_1d, test/distributed/tensor/test_random_ops.py::DistTensorRandomOpTestWithLocalTensor::test_deterministic_rand_1d, test/distributed/tensor/test_random_ops.py::DistTensorRandomOpTestWithLocalTensor::test_deterministic_uniform_2d, test/distributed/tensor/test_random_ops.py::DistTensorRandomOpTestWithLocalTensor::test_manual_seed, test/distributed/tensor/test_random_ops.py::DistTensorRandomOpTestWithLocalTensor::test_manual_seed_submesh, 
test/distributed/tensor/test_random_ops.py::DistTensorRandomOpTestWithLocalTensor::test_philox_state_seed_roundtrip, test/distributed/tensor/test_random_ops.py::DistTensorRandomOpTestWithLocalTensor::test_pipeline_parallel_manual_seed, test/distributed/tensor/test_random_ops.py::DistTensorRandomOpTestWithLocalTensor::test_rng_tracker_init, test/distributed/tensor/test_random_ops.py::DistTensorRandomOpsTest3DWithLocalTensor::test_hsdp_tp_model_meta_init 2025-12-04T10:16:09.3928613Z 2025-12-04T10:16:09.3929037Z Finished distributed/tensor/test_random_ops 1/1 ... [2025-12-04 10:16:09.388980][5000.99687848], took 1.47min 2025-12-04T10:16:09.4099868Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.tensor.test_random_ops/distributed.tensor.test_random_ops-a8f6b522aa6434af.xml 2025-12-04T10:16:09.5123883Z Running distributed/_composable/fsdp/test_fully_shard_logging 1/1 ... [2025-12-04 10:16:09.512113][5001.120030309] 2025-12-04T10:16:09.5124628Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:16:09.5126750Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_composable/fsdp/test_fully_shard_logging.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:16:09.512458] 2025-12-04T10:16:12.8782922Z 2025-12-04T10:16:12.8784259Z distributed/_composable/fsdp/test_fully_shard_logging 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._composable.fsdp.test_fully_shard_logging_1.1_334cd8181d21220c_.log 2025-12-04T10:16:12.8785877Z Running 0 items in this shard: 2025-12-04T10:16:12.8786101Z 2025-12-04T10:16:12.8786605Z Finished distributed/_composable/fsdp/test_fully_shard_logging 1/1 ... [2025-12-04 10:16:12.878082][5004.486000544], took 0.06min 2025-12-04T10:16:12.8985796Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_logging/distributed._composable.fsdp.test_fully_shard_logging-7e09cae3d59aa65e.xml 2025-12-04T10:16:12.9346530Z Running distributed/launcher/test_api 1/1 ... [2025-12-04 10:16:12.934176][5004.542092924] 2025-12-04T10:16:12.9347146Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:16:16.7586310Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/launcher/test_api.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:16:12.934530] 2025-12-04T10:16:16.7587673Z 2025-12-04T10:16:16.7588636Z distributed/launcher/test_api 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.launcher.test_api_1.1_4a83e51b1f3b8245_.log 2025-12-04T10:16:16.7590348Z Running 2 items in this shard: test/distributed/launcher/test_api.py::LauncherApiTest::test_launch_agent_default_signals, test/distributed/launcher/test_api.py::LauncherApiTest::test_launch_agent_sets_signals_env_var 2025-12-04T10:16:16.7591354Z 2025-12-04T10:16:16.7591714Z Finished distributed/launcher/test_api 1/1 ... 
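Annotation: distributed/launcher/test_api above drives the torchelastic launcher programmatically. A minimal sketch with LaunchConfig and elastic_launch; the worker function, run_id, and the rendezvous endpoint/port are illustrative assumptions (any free local port works for a single-node run).

    # torchelastic programmatic launch sketch: spins up 2 local workers running `worker`,
    # roughly equivalent to `torchrun --standalone --nproc_per_node=2 script.py`.
    import os
    from torch.distributed.launcher.api import LaunchConfig, elastic_launch

    def worker():
        # torchelastic sets RANK / WORLD_SIZE / MASTER_ADDR / MASTER_PORT for each process
        return int(os.environ["RANK"])

    if __name__ == "__main__":
        config = LaunchConfig(
            min_nodes=1,
            max_nodes=1,
            nproc_per_node=2,
            rdzv_backend="c10d",
            rdzv_endpoint="localhost:29401",   # arbitrary free port for the single-node rendezvous
            run_id="example",
        )
        results = elastic_launch(config, worker)()   # dict mapping each worker's rank to its return value
        print(results)                               # e.g. {0: 0, 1: 1}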
[2025-12-04 10:16:16.757986][5008.365902291], took 0.06min 2025-12-04T10:16:16.7785342Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.launcher.test_api/distributed.launcher.test_api-15b87ceaa10651c5.xml 2025-12-04T10:16:16.8123540Z Running distributed/elastic/multiprocessing/test_api 1/1 ... [2025-12-04 10:16:16.811899][5008.419816702] 2025-12-04T10:16:16.8124276Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:16:16.8125898Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/elastic/multiprocessing/test_api.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:16:16.812347] 2025-12-04T10:16:20.6365337Z 2025-12-04T10:16:20.6366783Z distributed/elastic/multiprocessing/test_api 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.elastic.multiprocessing.test_api_1.1_4bf04d2a67164589_.log 2025-12-04T10:16:20.6372065Z Running 7 items in this shard: test/distributed/elastic/multiprocessing/test_api.py::SignalHandlingTest::test_start_handles_invalid_signals, test/distributed/elastic/multiprocessing/test_api.py::SignalHandlingTest::test_start_handles_windows_signals, test/distributed/elastic/multiprocessing/test_api.py::SignalHandlingTest::test_start_not_main_thread, test/distributed/elastic/multiprocessing/test_api.py::SignalHandlingTest::test_start_registers_custom_signals, test/distributed/elastic/multiprocessing/test_api.py::SignalHandlingTest::test_start_registers_default_signals, test/distributed/elastic/multiprocessing/test_api.py::SignalHandlingTest::test_start_supports_sigusr1_and_sigusr2, test/distributed/elastic/multiprocessing/test_api.py::SignalHandlingTest::test_terminate_process_handler 2025-12-04T10:16:20.6375856Z 2025-12-04T10:16:20.6376415Z Finished distributed/elastic/multiprocessing/test_api 1/1 ... [2025-12-04 10:16:20.636019][5012.243935583], took 0.06min 2025-12-04T10:16:20.6569761Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.elastic.multiprocessing.test_api/distributed.elastic.multiprocessing.test_api-12b95803d8942f3a.xml 2025-12-04T10:16:20.6915771Z Running distributed/fsdp/test_shard_utils 1/1 ... [2025-12-04 10:16:20.691111][5012.299029119] 2025-12-04T10:16:20.6916377Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:16:20.6917964Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_shard_utils.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:16:20.691461] 2025-12-04T10:16:34.4918633Z 2025-12-04T10:16:34.4920022Z distributed/fsdp/test_shard_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.fsdp.test_shard_utils_1.1_4e12f3568c69a797_.log 2025-12-04T10:16:34.4922983Z Running 2 items in this shard: test/distributed/fsdp/test_shard_utils.py::TestShardUtilsDistributed::test_create_chunk_sharded_tensor, test/distributed/fsdp/test_shard_utils.py::TestShardUtilsDistributedDTensor::test_create_chunk_dtensor 2025-12-04T10:16:34.4924519Z 2025-12-04T10:16:34.4924923Z Finished distributed/fsdp/test_shard_utils 1/1 ... 
[2025-12-04 10:16:34.491244][5026.099159773], took 0.23min 2025-12-04T10:16:34.5123468Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_shard_utils/distributed.fsdp.test_shard_utils-76ee73cffd398e77.xml 2025-12-04T10:16:34.6077525Z Running distributed/checkpoint/test_fsdp_optim_state 1/1 ... [2025-12-04 10:16:34.607132][5026.21504919] 2025-12-04T10:16:34.6078222Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:16:34.6079560Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/test_fsdp_optim_state.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:16:34.607481] 2025-12-04T10:16:50.7635053Z 2025-12-04T10:16:50.7636723Z distributed/checkpoint/test_fsdp_optim_state 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.test_fsdp_optim_state_1.1_d25d2159eaa83e63_.log 2025-12-04T10:16:50.7639298Z Running 2 items in this shard: test/distributed/checkpoint/test_fsdp_optim_state.py::FsdpOptimStateCheckpoint::test_load_sharded_optimizer_state_dict_pass_planner_False, test/distributed/checkpoint/test_fsdp_optim_state.py::FsdpOptimStateCheckpoint::test_load_sharded_optimizer_state_dict_pass_planner_True 2025-12-04T10:16:50.7640889Z 2025-12-04T10:16:50.7641621Z Finished distributed/checkpoint/test_fsdp_optim_state 1/1 ... [2025-12-04 10:16:50.763481][5042.371396647], took 0.27min 2025-12-04T10:16:50.7854478Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.checkpoint.test_fsdp_optim_state/distributed.checkpoint.test_fsdp_optim_state-f29e492ac7e0fdff.xml 2025-12-04T10:16:50.8708529Z Running distributed/checkpoint/e2e/test_e2e_save_and_load 1/1 ... [2025-12-04 10:16:50.870301][5042.478219153] 2025-12-04T10:16:50.8709235Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:16:50.8710592Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/e2e/test_e2e_save_and_load.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
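Annotation: test_fsdp_optim_state and the e2e save/load suite checkpoint model plus optimizer state through DCP. A sketch with the torch.distributed.checkpoint.state_dict helpers, which produce FQN-keyed state dicts that DCP can reshard on load; shown single-process with a plain Linear for brevity (the same calls apply to FSDP/DTensor models under torchrun), and the model/optimizer choices are illustrative.

    # Model + optimizer checkpoint sketch with the DCP state_dict helpers.
    import tempfile
    import torch
    import torch.nn as nn
    import torch.distributed.checkpoint as dcp
    from torch.distributed.checkpoint.state_dict import get_state_dict, set_state_dict

    model = nn.Linear(4, 4)
    optim = torch.optim.AdamW(model.parameters(), lr=1e-3)
    model(torch.randn(2, 4)).sum().backward()
    optim.step()                                           # populate optimizer state before saving

    model_sd, optim_sd = get_state_dict(model, optim)      # FQN-keyed model and flattened optimizer state
    ckpt_dir = tempfile.mkdtemp()
    dcp.save({"model": model_sd, "optim": optim_sd}, checkpoint_id=ckpt_dir)

    # ... later / on restart:
    model_sd, optim_sd = get_state_dict(model, optim)
    dcp.load({"model": model_sd, "optim": optim_sd}, checkpoint_id=ckpt_dir)
    set_state_dict(model, optim, model_state_dict=model_sd, optim_state_dict=optim_sd)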
[2025-12-04 10:16:50.870651] 2025-12-04T10:19:33.5116383Z 2025-12-04T10:19:33.5118293Z distributed/checkpoint/e2e/test_e2e_save_and_load 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.e2e.test_e2e_save_and_load_1.1_4cbd59f9e8ee7ec0_.log 2025-12-04T10:19:33.5145086Z Running 19 items in this shard: test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_different_ordered_state_dict_keys, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_e2e_async_cached_cache_staged_state_dict_False_async_checkpointer_type0_zoc_False, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_e2e_async_cached_cache_staged_state_dict_False_async_checkpointer_type2_zoc_False, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_e2e_async_cached_cache_staged_state_dict_False_async_checkpointer_type4_zoc_True, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_e2e_async_cached_cache_staged_state_dict_False_async_checkpointer_type5_zoc_True, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_e2e_async_cached_cache_staged_state_dict_True_async_checkpointer_type1_zoc_False, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_e2e_async_cached_cache_staged_state_dict_True_async_checkpointer_type3_zoc_False, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_e2e_compile_False_model_type0, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_e2e_compile_False_model_type1, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_e2e_compile_False_model_type2, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_e2e_compile_True_model_type0, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_e2e_compile_True_model_type1, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_e2e_compile_True_model_type2, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_no_dist, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_overwrite, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_partial_load, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestE2ESaveAndLoad::test_stateful_and_non_stateful_loads, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestNoCPU::test_no_cpu, test/distributed/checkpoint/e2e/test_e2e_save_and_load.py::TestInitStateDict::test_init_state_dict 2025-12-04T10:19:33.5167339Z 2025-12-04T10:19:33.5168228Z Finished distributed/checkpoint/e2e/test_e2e_save_and_load 1/1 ... [2025-12-04 10:19:33.511642][5205.119552255], took 2.71min 2025-12-04T10:19:33.5359776Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.checkpoint.e2e.test_e2e_save_and_load/distributed.checkpoint.e2e.test_e2e_save_and_load-ea436a2b3918b4b7.xml 2025-12-04T10:19:34.0579674Z Uploading artifacts took 0.42 seconds 2025-12-04T10:19:34.0588313Z Running distributed/checkpoint/test_dtensor_resharding 1/1 ... 
[2025-12-04 10:19:34.058225][5205.666141033] 2025-12-04T10:19:34.0589227Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:19:34.0590586Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/test_dtensor_resharding.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:19:34.058571] 2025-12-04T10:20:56.1418258Z 2025-12-04T10:20:56.1419574Z distributed/checkpoint/test_dtensor_resharding 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.test_dtensor_resharding_1.1_a0990bee4dfbe749_.log 2025-12-04T10:20:56.1428424Z Running 10 items in this shard: test/distributed/checkpoint/test_dtensor_resharding.py::TestDTensorReshardPlacementChange::test_1d_to_1d_reshard_placement_change_extensions0, test/distributed/checkpoint/test_dtensor_resharding.py::TestDTensorReshardPlacementChange::test_1d_to_1d_reshard_placement_change_extensions1, test/distributed/checkpoint/test_dtensor_resharding.py::TestDTensorReshardPlacementChange::test_1d_to_1d_reshard_placement_change_extensions2, test/distributed/checkpoint/test_dtensor_resharding.py::TestDTensorReshardPlacementChange::test_2d_to_2d_reshard_placement_change, test/distributed/checkpoint/test_dtensor_resharding.py::TestDTensorReshardMeshChange::test_1d_to_2d_reshard_mesh_change, test/distributed/checkpoint/test_dtensor_resharding.py::TestDTensorReshardMeshChange::test_2d_to_1d_reshard_mesh_change, test/distributed/checkpoint/test_dtensor_resharding.py::TestDTensorReshardMeshChange::test_dtensor_checkpoint_resharding_with_empty_shard, test/distributed/checkpoint/test_dtensor_resharding.py::TestDTensorReshardMeshChange::test_dtensor_checkpoint_with_uneven_shards, test/distributed/checkpoint/test_dtensor_resharding.py::TestCheckpointableReshard::test_uneven_reshard_with_checkpointable_api, test/distributed/checkpoint/test_dtensor_resharding.py::TestCheckpointableReshard::test_uneven_reshard_with_dtensor_shards_wrapper_api 2025-12-04T10:20:56.1435760Z 2025-12-04T10:20:56.1436238Z Finished distributed/checkpoint/test_dtensor_resharding 1/1 ... [2025-12-04 10:20:56.141337][5287.74925251], took 1.37min 2025-12-04T10:20:56.1635716Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.checkpoint.test_dtensor_resharding/distributed.checkpoint.test_dtensor_resharding-850e82d898db0167.xml 2025-12-04T10:20:56.2484963Z Running distributed/fsdp/test_fsdp_memory 1/1 ... [2025-12-04 10:20:56.247964][5287.855881783] 2025-12-04T10:20:56.2485617Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:20:56.2486891Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_fsdp_memory.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
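Annotation: test_dtensor_resharding above saves DTensors under one placement and loads them under another; DCP reshards at load time because the destination state_dict declares the new layout. A sketch under the same torchrun/gloo assumptions; shapes, key names, and placements are illustrative.

    # DCP resharding sketch (run with: torchrun --nproc_per_node=2 this_file.py).
    import tempfile
    import torch
    import torch.distributed as dist
    import torch.distributed.checkpoint as dcp
    from torch.distributed.device_mesh import init_device_mesh
    from torch.distributed.tensor import distribute_tensor, Shard, Replicate

    dist.init_process_group("gloo")
    mesh = init_device_mesh("cpu", (dist.get_world_size(),))

    # All ranks must write to the same directory, so rank 0 picks it and broadcasts the path.
    path = [tempfile.mkdtemp() if dist.get_rank() == 0 else None]
    dist.broadcast_object_list(path, src=0)
    ckpt_dir = path[0]

    global_w = torch.arange(16, dtype=torch.float32).reshape(4, 4)
    dcp.save({"w": distribute_tensor(global_w, mesh, [Shard(0)])}, checkpoint_id=ckpt_dir)   # row-sharded

    loaded = {"w": distribute_tensor(torch.zeros(4, 4), mesh, [Replicate()])}   # ask for a replicated copy
    dcp.load(loaded, checkpoint_id=ckpt_dir)                                    # DCP reshards while reading
    assert torch.equal(loaded["w"].to_local(), global_w)
    dist.destroy_process_group()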
[2025-12-04 10:20:56.248315] 2025-12-04T10:21:12.4986325Z 2025-12-04T10:21:12.4987512Z distributed/fsdp/test_fsdp_memory 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.fsdp.test_fsdp_memory_1.1_ac8e61e17ebeaaa5_.log 2025-12-04T10:21:12.4989347Z Running 2 items in this shard: test/distributed/fsdp/test_fsdp_memory.py::TestFSDPMemory::test_fsdp_memory_ckpt_ckpt, test/distributed/fsdp/test_fsdp_memory.py::TestFSDPMemory::test_fsdp_memory_ckpt_no_ckpt 2025-12-04T10:21:12.4990403Z 2025-12-04T10:21:12.4990798Z Finished distributed/fsdp/test_fsdp_memory 1/1 ... [2025-12-04 10:21:12.498137][5304.10604602], took 0.27min 2025-12-04T10:21:12.5204001Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_memory/distributed.fsdp.test_fsdp_memory-bd1d93d0f6b45624.xml 2025-12-04T10:21:12.6070593Z Running distributed/tensor/test_pointwise_ops 1/1 ... [2025-12-04 10:21:12.606486][5304.214402354] 2025-12-04T10:21:12.6071265Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:21:12.6072564Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/test_pointwise_ops.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:21:12.606849] 2025-12-04T10:21:19.6371365Z 2025-12-04T10:21:19.6372421Z distributed/tensor/test_pointwise_ops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.test_pointwise_ops_1.1_fc7ea695ae4d24dd_.log 2025-12-04T10:21:19.6383460Z Running 18 items in this shard: test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_activations, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_dropout, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_dropout_backward, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_dropout_errors, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_inplace_op_partial_to_replicate, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_mul_out, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_mul_partial, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_partial_add, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTest::test_partial_replicate_add, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_activations, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_dropout, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_dropout_backward, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_dropout_errors, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_inplace_op_partial_to_replicate, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_mul_out, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_mul_partial, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_partial_add, test/distributed/tensor/test_pointwise_ops.py::DistElementwiseOpsTestWithLocalTensor::test_partial_replicate_add 2025-12-04T10:21:19.6392423Z 2025-12-04T10:21:19.6392807Z 
Finished distributed/tensor/test_pointwise_ops 1/1 ... [2025-12-04 10:21:19.636595][5311.244511251], took 0.12min 2025-12-04T10:21:19.6580950Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.tensor.test_pointwise_ops/distributed.tensor.test_pointwise_ops-8ffd5e5eb5f5ad7d.xml 2025-12-04T10:21:19.7639243Z Running distributed/checkpoint/test_compatibility 1/1 ... [2025-12-04 10:21:19.763318][5311.37123544] 2025-12-04T10:21:19.7639955Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:21:19.7641283Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/test_compatibility.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:21:19.763667] 2025-12-04T10:21:24.0379517Z 2025-12-04T10:21:24.0380746Z distributed/checkpoint/test_compatibility 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.test_compatibility_1.1_995845a47bb8bc7e_.log 2025-12-04T10:21:24.0384233Z Running 4 items in this shard: test/distributed/checkpoint/test_compatibility.py::TestDCPCompatbility::test_metadata, test/distributed/checkpoint/test_compatibility.py::TestDCPCompatbility::test_sharded_tensor_dependency, test/distributed/checkpoint/test_compatibility.py::TestDCPCompatbility::test_storage_meta, test/distributed/checkpoint/test_compatibility.py::TestDCPCompatbility::test_with_v_2_3 2025-12-04T10:21:24.0386485Z 2025-12-04T10:21:24.0386917Z Finished distributed/checkpoint/test_compatibility 1/1 ... [2025-12-04 10:21:24.037620][5315.645535315], took 0.07min 2025-12-04T10:21:24.0588393Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.checkpoint.test_compatibility/distributed.checkpoint.test_compatibility-759684b03ee5bd2d.xml 2025-12-04T10:21:24.1059016Z Running distributed/_tools/test_mem_tracker 1/1 ... [2025-12-04 10:21:24.105423][5315.713339979] 2025-12-04T10:21:24.1059653Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:21:24.1060981Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_tools/test_mem_tracker.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:21:24.105804] 2025-12-04T10:21:28.6318988Z 2025-12-04T10:21:28.6320302Z distributed/_tools/test_mem_tracker 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._tools.test_mem_tracker_1.1_c5962f3ebcf85955_.log 2025-12-04T10:21:28.6323208Z Running 3 items in this shard: test/distributed/_tools/test_mem_tracker.py::TestMemTracker::test_accelerator_tracker_equivalence, test/distributed/_tools/test_mem_tracker.py::TestMemTracker::test_tracker_attribution, test/distributed/_tools/test_mem_tracker.py::TestMemTracker::test_tracker_with_activation_checkpointing 2025-12-04T10:21:28.6324931Z 2025-12-04T10:21:28.6325331Z Finished distributed/_tools/test_mem_tracker 1/1 ... [2025-12-04 10:21:28.631344][5320.239259083], took 0.08min 2025-12-04T10:21:28.6528648Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._tools.test_mem_tracker/distributed._tools.test_mem_tracker-e6bb23aea30c734a.xml 2025-12-04T10:21:28.6878105Z Running distributed/elastic/test_control_plane 1/1 ... 
[2025-12-04 10:21:28.687172][5320.295089322] 2025-12-04T10:21:28.6879274Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:21:28.6880643Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/elastic/test_control_plane.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:21:28.687531] 2025-12-04T10:21:32.5619824Z 2025-12-04T10:21:32.5621512Z distributed/elastic/test_control_plane 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.elastic.test_control_plane_1.1_74d942263f51456c_.log 2025-12-04T10:21:32.5627206Z Running 10 items in this shard: test/distributed/elastic/test_control_plane.py::WorkerServerTest::test_dump_nccl_trace_pickle, test/distributed/elastic/test_control_plane.py::WorkerServerTest::test_dump_nccl_trace_pickle_with_json, test/distributed/elastic/test_control_plane.py::WorkerServerTest::test_dump_nccl_trace_pickle_with_params, test/distributed/elastic/test_control_plane.py::WorkerServerTest::test_dump_traceback, test/distributed/elastic/test_control_plane.py::WorkerServerTest::test_get_handler_names, test/distributed/elastic/test_control_plane.py::WorkerServerTest::test_get_handler_nonexistant, test/distributed/elastic/test_control_plane.py::WorkerServerTest::test_run_handler, test/distributed/elastic/test_control_plane.py::WorkerServerTest::test_tcp, test/distributed/elastic/test_control_plane.py::WorkerServerTest::test_wait_counter_values, test/distributed/elastic/test_control_plane.py::WorkerServerTest::test_worker_server 2025-12-04T10:21:32.5632021Z 2025-12-04T10:21:32.5632557Z Finished distributed/elastic/test_control_plane 1/1 ... [2025-12-04 10:21:32.561704][5324.169620373], took 0.06min 2025-12-04T10:21:32.5839412Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.elastic.test_control_plane/distributed.elastic.test_control_plane-8adada293373a225.xml 2025-12-04T10:21:32.6175776Z Running distributed/test_fake_pg 1/1 ... [2025-12-04 10:21:32.616975][5324.224893186] 2025-12-04T10:21:32.6176498Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:21:32.6177894Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_fake_pg.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
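Annotation: test_fake_pg above uses the "fake" backend so collective-using code runs in a single process with no real communication (handy for tracing or debugging multi-rank code paths). The FakeStore helper lives under torch.testing._internal, i.e. it is private test tooling, so treat this sketch accordingly; the world size is arbitrary.

    # Single-process "fake" process group sketch: collectives dispatch but move no data.
    import torch
    import torch.distributed as dist
    from torch.testing._internal.distributed.fake_pg import FakeStore

    dist.init_process_group(backend="fake", rank=0, world_size=4, store=FakeStore())

    t = torch.ones(8)
    dist.all_reduce(t)            # no-op data-wise, but the collective call path is exercised
    print(dist.get_world_size())  # 4, even though only one real process exists
    dist.destroy_process_group()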
[2025-12-04 10:21:32.617332] 2025-12-04T10:21:37.1938111Z 2025-12-04T10:21:37.1939351Z distributed/test_fake_pg 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_fake_pg_1.1_ecf9a296b2457f78_.log 2025-12-04T10:21:37.1945550Z Running 16 items in this shard: test/distributed/test_fake_pg.py::TestFakePG::test_all_reduce, test/distributed/test_fake_pg.py::TestFakePG::test_allgather, test/distributed/test_fake_pg.py::TestFakePG::test_alltoall, test/distributed/test_fake_pg.py::TestFakePG::test_alltoall_base, test/distributed/test_fake_pg.py::TestFakePG::test_broadcast, test/distributed/test_fake_pg.py::TestFakePG::test_construct_fsdp, test/distributed/test_fake_pg.py::TestFakePG::test_error_on_collective, test/distributed/test_fake_pg.py::TestFakePG::test_fake_pg_tracing, test/distributed/test_fake_pg.py::TestFakePG::test_fake_process_group_direct_usage_error, test/distributed/test_fake_pg.py::TestFakePG::test_fake_process_group_proper_usage_dispatch, test/distributed/test_fake_pg.py::TestFakePG::test_fsdp_fake_e2e, test/distributed/test_fake_pg.py::TestFakePG::test_fsdp_tp_fake_e2e, test/distributed/test_fake_pg.py::TestFakePG::test_recv, test/distributed/test_fake_pg.py::TestFakePG::test_reduce_scatter, test/distributed/test_fake_pg.py::TestFakePG::test_scatter, test/distributed/test_fake_pg.py::TestFakePG::test_send 2025-12-04T10:21:37.1951069Z 2025-12-04T10:21:37.1951413Z Finished distributed/test_fake_pg 1/1 ... [2025-12-04 10:21:37.193337][5328.801247965], took 0.08min 2025-12-04T10:21:37.2157401Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_fake_pg/distributed.test_fake_pg-79e3fe3f86c7485d.xml 2025-12-04T10:21:37.2508853Z Running distributed/checkpoint/test_fsdp_model_state 1/1 ... [2025-12-04 10:21:37.250381][5328.858298217] 2025-12-04T10:21:37.2509560Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:21:37.2510887Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/test_fsdp_model_state.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:21:37.250740] 2025-12-04T10:21:53.9090390Z 2025-12-04T10:21:53.9092016Z distributed/checkpoint/test_fsdp_model_state 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.test_fsdp_model_state_1.1_0d5362771b48c12a_.log 2025-12-04T10:21:53.9094492Z Running 2 items in this shard: test/distributed/checkpoint/test_fsdp_model_state.py::FsdpModelStateCheckpoint::test_fsdp_model_state_no_resharding, test/distributed/checkpoint/test_fsdp_model_state.py::FsdpModelStateCheckpoint::test_fsdp_model_state_with_resharding 2025-12-04T10:21:53.9095853Z 2025-12-04T10:21:53.9096392Z Finished distributed/checkpoint/test_fsdp_model_state 1/1 ... [2025-12-04 10:21:53.908651][5345.516567096], took 0.28min 2025-12-04T10:21:53.9315346Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.checkpoint.test_fsdp_model_state/distributed.checkpoint.test_fsdp_model_state-d2d7dab49696755b.xml 2025-12-04T10:21:54.0360583Z Running distributed/test_functional_api 1/1 ... 
[2025-12-04 10:21:54.035427][5345.643345113] 2025-12-04T10:21:54.0361272Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:21:54.0362704Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_functional_api.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:21:54.035774] 2025-12-04T10:24:43.8680110Z 2025-12-04T10:24:43.8681251Z distributed/test_functional_api 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_functional_api_1.1_d60bb00edf6e8a81_.log 2025-12-04T10:24:43.8689743Z Running 11 items in this shard: test/distributed/test_functional_api.py::TestMetaCollectives::test_all_reduce, test/distributed/test_functional_api.py::TestMakeFx::test_all_reduce_tracing, test/distributed/test_functional_api.py::TestCollectivesWithDistributedBackendCUDA::test_all_gather_into_tensor_coalesced_cuda, test/distributed/test_functional_api.py::TestCollectivesWithDistributedBackendCUDA::test_all_to_all_single_1d_input_cuda, test/distributed/test_functional_api.py::TestCollectivesWithDistributedBackendCUDA::test_all_to_all_single_cuda, test/distributed/test_functional_api.py::TestCollectivesWithDistributedBackendCUDA::test_all_to_all_single_split_sizes_none_cuda, test/distributed/test_functional_api.py::TestCollectivesWithDistributedBackendCUDA::test_tracing_cuda, test/distributed/test_functional_api.py::TestCollectivesWithDistributedBackendCUDA::test_tracing_with_dce_code_cuda, test/distributed/test_functional_api.py::TestCollectivesWithDistributedBackendCUDA::test_tracing_with_fakepg_cuda, test/distributed/test_functional_api.py::TestDistributedBackendCollectivesWithWorldSize4CUDA::test_permute_tensor_with_sub_group_cuda, test/distributed/test_functional_api.py::TestFunctionalAutogradWithDistributedBackendCUDA::test_all_to_all_single_cuda 2025-12-04T10:24:43.8696245Z 2025-12-04T10:24:43.8696927Z Finished distributed/test_functional_api 1/1 ... [2025-12-04 10:24:43.867521][5515.475436861], took 2.83min 2025-12-04T10:24:43.8903330Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_functional_api/distributed.test_functional_api-d3092064f68d2f41.xml 2025-12-04T10:24:43.9723917Z Running distributed/_composable/fsdp/test_fully_shard_clip_grad_norm_ 1/1 ... [2025-12-04 10:24:43.972132][5515.580033594] 2025-12-04T10:24:43.9724676Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:24:43.9726741Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_composable/fsdp/test_fully_shard_clip_grad_norm_.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
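Annotation: test_functional_api above covers torch.distributed._functional_collectives, the traceable collectives that return new tensors instead of mutating in place. A sketch under the torchrun/gloo assumptions; the module path is private (it is the one the test itself imports), so it may shift between releases.

    # Functional (traceable) collectives sketch (run with: torchrun --nproc_per_node=2 this_file.py).
    import torch
    import torch.distributed as dist
    import torch.distributed._functional_collectives as funcol

    dist.init_process_group("gloo")
    rank, world = dist.get_rank(), dist.get_world_size()

    x = torch.full((4,), float(rank))
    summed = funcol.all_reduce(x, "sum", group=dist.group.WORLD)            # x is left untouched
    gathered = funcol.all_gather_tensor(x, gather_dim=0, group=dist.group.WORLD)

    print(summed.sum().item())    # 4 * sum(range(world)); using the result forces the async wait
    print(gathered.shape)         # (4 * world,)
    dist.destroy_process_group()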
[2025-12-04 10:24:43.972475] 2025-12-04T10:25:04.3881012Z 2025-12-04T10:25:04.3884461Z distributed/_composable/fsdp/test_fully_shard_clip_grad_norm_ 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._composable.fsdp.test_fully_shard_clip_grad_norm__1.1_76ba1390d272d622_.log 2025-12-04T10:25:04.3886929Z Running 2 items in this shard: test/distributed/_composable/fsdp/test_fully_shard_clip_grad_norm_.py::TestClipGradNormWorldSize2::test_clip_grad_norm_1d, test/distributed/_composable/fsdp/test_fully_shard_clip_grad_norm_.py::TestClipGradNormWorldSize4::test_clip_grad_norm_2d 2025-12-04T10:25:04.3888280Z 2025-12-04T10:25:04.3888793Z Finished distributed/_composable/fsdp/test_fully_shard_clip_grad_norm_ 1/1 ... [2025-12-04 10:25:04.387528][5535.995444593], took 0.34min 2025-12-04T10:25:04.4099845Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_clip_grad_norm_/distributed._composable.fsdp.test_fully_shard_clip_grad_norm_-2322cac9c0cc490f.xml 2025-12-04T10:25:04.4915990Z Running distributed/tensor/debug/test_comm_mode 1/1 ... [2025-12-04 10:25:04.491058][5536.098976103] 2025-12-04T10:25:04.4916635Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:25:04.4918177Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/debug/test_comm_mode.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:25:04.491400] 2025-12-04T10:25:08.7168201Z 2025-12-04T10:25:08.7169435Z distributed/tensor/debug/test_comm_mode 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.debug.test_comm_mode_1.1_40ca723c6c817b86_.log 2025-12-04T10:25:08.7172173Z Running 4 items in this shard: test/distributed/tensor/debug/test_comm_mode.py::TestCommMode::test_comm_mode, test/distributed/tensor/debug/test_comm_mode.py::TestCommMode::test_comm_mode_coalesced, test/distributed/tensor/debug/test_comm_mode.py::TestCommMode::test_comm_mode_with_c10d, test/distributed/tensor/debug/test_comm_mode.py::TestCommMode::test_comm_mode_with_dtensor 2025-12-04T10:25:08.7174266Z 2025-12-04T10:25:08.7174685Z Finished distributed/tensor/debug/test_comm_mode 1/1 ... [2025-12-04 10:25:08.716329][5540.324245649], took 0.07min 2025-12-04T10:25:08.7389827Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.tensor.debug.test_comm_mode/distributed.tensor.debug.test_comm_mode-8cc829f047ed6143.xml 2025-12-04T10:25:08.7721724Z Running distributed/test_dist2 1/1 ... [2025-12-04 10:25:08.771545][5540.379462499] 2025-12-04T10:25:08.7722326Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:25:08.7723574Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_dist2.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
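Annotation: test_comm_mode above checks CommDebugMode, which counts the collectives issued by DTensor (and functional-collective) code run under it. A sketch under the torchrun/gloo assumptions; the import path follows the public torch.distributed.tensor.debug module in recent releases, and the expected count shown in the comment assumes the Shard-to-Replicate redistribute maps to a single all-gather on this backend.

    # CommDebugMode sketch (run with: torchrun --nproc_per_node=2 this_file.py).
    import torch
    import torch.distributed as dist
    from torch.distributed.device_mesh import init_device_mesh
    from torch.distributed.tensor import distribute_tensor, Shard, Replicate
    from torch.distributed.tensor.debug import CommDebugMode

    dist.init_process_group("gloo")
    mesh = init_device_mesh("cpu", (dist.get_world_size(),))
    torch.manual_seed(0)
    dt = distribute_tensor(torch.randn(8, 8), mesh, [Shard(0)])

    comm_mode = CommDebugMode()
    with comm_mode:
        dt.redistribute(mesh, [Replicate()])        # Shard -> Replicate needs a gather across the mesh

    print(comm_mode.get_total_counts())             # total number of collectives recorded
    print(comm_mode.get_comm_counts())              # per-collective breakdown
    dist.destroy_process_group()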
[2025-12-04 10:25:08.771889] 2025-12-04T10:27:33.9537647Z 2025-12-04T10:27:33.9538623Z distributed/test_dist2 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_dist2_1.1_cc2e2f70acaf1086_.log 2025-12-04T10:27:33.9552364Z Running 34 items in this shard: test/distributed/test_dist2.py::ProcessGroupTest::test_context_manager, test/distributed/test_dist2.py::Dist2MultiProcessTestCase::test_allgather, test/distributed/test_dist2.py::Dist2MultiProcessTestCase::test_allreduce, test/distributed/test_dist2.py::Dist2MultiProcessTestCase::test_alltoall_base, test/distributed/test_dist2.py::Dist2MultiProcessTestCase::test_barrier, test/distributed/test_dist2.py::Dist2MultiProcessTestCase::test_broadcast, test/distributed/test_dist2.py::Dist2MultiProcessTestCase::test_gather, test/distributed/test_dist2.py::Dist2MultiProcessTestCase::test_group_split, test/distributed/test_dist2.py::Dist2MultiProcessTestCase::test_reduce, test/distributed/test_dist2.py::Dist2MultiProcessTestCase::test_reduce_scatter, test/distributed/test_dist2.py::Dist2MultiProcessTestCase::test_remote_group_merge, test/distributed/test_dist2.py::Dist2MultiProcessTestCase::test_scatter, test/distributed/test_dist2.py::ProcessGroupGlooTest::test_allgather, test/distributed/test_dist2.py::ProcessGroupGlooTest::test_allreduce, test/distributed/test_dist2.py::ProcessGroupGlooTest::test_alltoall_base, test/distributed/test_dist2.py::ProcessGroupGlooTest::test_barrier, test/distributed/test_dist2.py::ProcessGroupGlooTest::test_broadcast, test/distributed/test_dist2.py::ProcessGroupGlooTest::test_gather, test/distributed/test_dist2.py::ProcessGroupGlooTest::test_group_split, test/distributed/test_dist2.py::ProcessGroupGlooTest::test_reduce, test/distributed/test_dist2.py::ProcessGroupGlooTest::test_reduce_scatter, test/distributed/test_dist2.py::ProcessGroupGlooTest::test_remote_group_merge, test/distributed/test_dist2.py::ProcessGroupGlooTest::test_scatter, test/distributed/test_dist2.py::ProcessGroupNCCLTest::test_allgather, test/distributed/test_dist2.py::ProcessGroupNCCLTest::test_allreduce, test/distributed/test_dist2.py::ProcessGroupNCCLTest::test_alltoall_base, test/distributed/test_dist2.py::ProcessGroupNCCLTest::test_barrier, test/distributed/test_dist2.py::ProcessGroupNCCLTest::test_broadcast, test/distributed/test_dist2.py::ProcessGroupNCCLTest::test_gather, test/distributed/test_dist2.py::ProcessGroupNCCLTest::test_group_split, test/distributed/test_dist2.py::ProcessGroupNCCLTest::test_reduce, test/distributed/test_dist2.py::ProcessGroupNCCLTest::test_reduce_scatter, test/distributed/test_dist2.py::ProcessGroupNCCLTest::test_remote_group_merge, test/distributed/test_dist2.py::ProcessGroupNCCLTest::test_scatter 2025-12-04T10:27:33.9564686Z 2025-12-04T10:27:33.9565020Z Finished distributed/test_dist2 1/1 ... [2025-12-04 10:27:33.953371][5685.561285542], took 2.42min 2025-12-04T10:27:33.9769126Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_dist2/distributed.test_dist2-7a48db8512284abb.xml 2025-12-04T10:27:34.0618582Z Running distributed/_composable/fsdp/test_fully_shard_grad_scaler 1/1 ... 
[2025-12-04 10:27:34.061277][5685.669194528] 2025-12-04T10:27:34.0619351Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:27:34.0620998Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_composable/fsdp/test_fully_shard_grad_scaler.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:27:34.061615] 2025-12-04T10:27:45.3552115Z 2025-12-04T10:27:45.3553460Z distributed/_composable/fsdp/test_fully_shard_grad_scaler 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._composable.fsdp.test_fully_shard_grad_scaler_1.1_5aa2313403ba4568_.log 2025-12-04T10:27:45.3555315Z Running 1 items in this shard: test/distributed/_composable/fsdp/test_fully_shard_grad_scaler.py::TestFullyShardGradientScaler::test_gradient_scaler 2025-12-04T10:27:45.3556087Z 2025-12-04T10:27:45.3556600Z Finished distributed/_composable/fsdp/test_fully_shard_grad_scaler 1/1 ... [2025-12-04 10:27:45.354755][5696.962670424], took 0.19min 2025-12-04T10:27:45.3781934Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_grad_scaler/distributed._composable.fsdp.test_fully_shard_grad_scaler-5e3c33eaf29838b0.xml 2025-12-04T10:27:45.4680847Z Running distributed/launcher/test_run 1/1 ... [2025-12-04 10:27:45.467519][5697.075435831] 2025-12-04T10:27:45.4681451Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:27:45.4682693Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/launcher/test_run.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
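Annotation: test_fully_shard_grad_scaler above checks that torch.amp.GradScaler works when gradients are the DTensors produced by the FSDP2 fully_shard API. A sketch assuming a PyTorch build where fully_shard is exported from torch.distributed.fsdp (older builds expose it under torch.distributed._composable.fsdp), CUDA available, and torchrun providing one GPU per rank; model and optimizer choices are illustrative.

    # FSDP2 + GradScaler sketch (run with: torchrun --nproc_per_node=2 this_file.py, one GPU per rank).
    import torch
    import torch.nn as nn
    import torch.distributed as dist
    from torch.distributed.fsdp import fully_shard

    dist.init_process_group("nccl")
    torch.cuda.set_device(dist.get_rank())

    model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 16)).cuda()
    for layer in model:
        if isinstance(layer, nn.Linear):
            fully_shard(layer)
    fully_shard(model)                                 # parameters become DTensors sharded across ranks

    optim = torch.optim.SGD(model.parameters(), lr=1e-2)
    scaler = torch.amp.GradScaler("cuda")

    with torch.autocast("cuda", dtype=torch.float16):
        loss = model(torch.randn(8, 16, device="cuda")).sum()
    scaler.scale(loss).backward()                      # scaled grads are DTensors; unscale/step handle them
    scaler.step(optim)
    scaler.update()
    dist.destroy_process_group()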
[2025-12-04 10:27:45.467875] 2025-12-04T10:28:52.9190610Z 2025-12-04T10:28:52.9194152Z distributed/launcher/test_run 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.launcher.test_run_1.1_b22d13de769d84ff_.log 2025-12-04T10:28:52.9219050Z Running 26 items in this shard: test/distributed/launcher/test_run.py::ElasticLaunchTest::test_capture_logs_using_default_logs_specs, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_init_method_env_with_torchelastic, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_init_method_tcp_with_torchelastic, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_is_not_torchelastic_launched, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_is_torchelastic_launched, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_is_torchelastic_launched_with_logs_spec_defined, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_launch_elastic, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_launch_elastic_agent_raise_exception, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_launch_elastic_multiple_agents, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_launch_elastic_worker_raise_exception, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_launch_run_path, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_launch_shutdown, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_launch_standalone, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_launch_user_script_bash, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_launch_user_script_default_nproc, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_launch_user_script_python, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_launch_user_script_python_caffe2_bc, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_launch_with_env_vars, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_logs_logs_spec_entrypoint_must_be_defined, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_min_max_nodes_parse, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_nproc_gpu_launch_configurations, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_nproc_launch_auto_configurations, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_nproc_launch_number_configurations, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_nproc_launch_unknown_configurations, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_nproc_xpu_launch_configurations, test/distributed/launcher/test_run.py::ElasticLaunchTest::test_virtual_local_rank 2025-12-04T10:28:52.9241876Z 2025-12-04T10:28:52.9242539Z Finished distributed/launcher/test_run 1/1 ... [2025-12-04 10:28:52.918996][5764.526912136], took 1.12min 2025-12-04T10:28:52.9444041Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.launcher.test_run/distributed.launcher.test_run-eeaaeb50473e3b00.xml 2025-12-04T10:28:53.0300410Z Running distributed/fsdp/test_fsdp_backward_prefetch 1/1 ... 
[2025-12-04 10:28:53.029786][5764.637703657] 2025-12-04T10:28:53.0301534Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:28:53.0305209Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_fsdp_backward_prefetch.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:28:53.030191] 2025-12-04T10:29:03.2209710Z 2025-12-04T10:29:03.2211028Z distributed/fsdp/test_fsdp_backward_prefetch 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.fsdp.test_fsdp_backward_prefetch_1.1_29df4062c54c1e1a_.log 2025-12-04T10:29:03.2212629Z Running 1 items in this shard: test/distributed/fsdp/test_fsdp_backward_prefetch.py::TestBackwardPrefetch::test_backward_prefetch 2025-12-04T10:29:03.2213654Z 2025-12-04T10:29:03.2214105Z Finished distributed/fsdp/test_fsdp_backward_prefetch 1/1 ... [2025-12-04 10:29:03.220456][5774.828371753], took 0.17min 2025-12-04T10:29:03.2440392Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_backward_prefetch/distributed.fsdp.test_fsdp_backward_prefetch-9d6c65a3bd838e6b.xml 2025-12-04T10:29:03.3339759Z Running distributed/checkpoint/test_checkpoint 1/1 ... [2025-12-04 10:29:03.333736][5774.941652027] 2025-12-04T10:29:03.3340451Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:29:03.3343208Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/test_checkpoint.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:29:03.334118] 2025-12-04T10:29:49.7681378Z 2025-12-04T10:29:49.7682539Z distributed/checkpoint/test_checkpoint 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.test_checkpoint_1.1_d7eb3fb6652ade87_.log 2025-12-04T10:29:49.7688042Z Running 8 items in this shard: test/distributed/checkpoint/test_checkpoint.py::TestDistributedCheckpointing::test_default_metadata, test/distributed/checkpoint/test_checkpoint.py::TestDistributedCheckpointing::test_tensor_metadata_with_missing_rank_spec, test/distributed/checkpoint/test_checkpoint.py::TestDistributedFailure::test_dummy_reader_works, test/distributed/checkpoint/test_checkpoint.py::TestDistributedFailure::test_dummy_writer_works, test/distributed/checkpoint/test_checkpoint.py::TestDistributedFailure::test_load_error_handling, test/distributed/checkpoint/test_checkpoint.py::TestDistributedFailure::test_load_error_handling_no_dist, test/distributed/checkpoint/test_checkpoint.py::TestDistributedFailure::test_save_error_handling, test/distributed/checkpoint/test_checkpoint.py::TestDistributedFailure::test_save_error_handling_no_dist 2025-12-04T10:29:49.7692471Z 2025-12-04T10:29:49.7692900Z Finished distributed/checkpoint/test_checkpoint 1/1 ... [2025-12-04 10:29:49.767816][5821.375732101], took 0.77min 2025-12-04T10:29:49.7919503Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.checkpoint.test_checkpoint/distributed.checkpoint.test_checkpoint-698955a0be6378e2.xml 2025-12-04T10:29:49.8779184Z Running distributed/_pycute/test_coalesce 1/1 ... 
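Annotation: test_fsdp_backward_prefetch above exercises the classic FSDP wrapper's prefetch policy, which controls when the next parameter all-gather is issued during backward. A minimal sketch of the constructor option, assuming torchrun with one CUDA device per rank; the toy model and sizes are illustrative.

    # FSDP BackwardPrefetch sketch (run with: torchrun --nproc_per_node=2 this_file.py, one GPU per rank).
    import torch
    import torch.nn as nn
    import torch.distributed as dist
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, BackwardPrefetch

    dist.init_process_group("nccl")
    torch.cuda.set_device(dist.get_rank())

    model = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 32)).cuda()
    # BACKWARD_PRE overlaps the next all-gather with the current gradient computation;
    # BACKWARD_POST gives up some overlap for lower peak memory.
    fsdp_model = FSDP(model, backward_prefetch=BackwardPrefetch.BACKWARD_PRE)

    fsdp_model(torch.randn(8, 32, device="cuda")).sum().backward()
    dist.destroy_process_group()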
[2025-12-04 10:29:49.877645][5821.485550262] 2025-12-04T10:29:49.8779822Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:29:49.8781872Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_pycute/test_coalesce.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:29:49.877999] 2025-12-04T10:29:53.6522975Z 2025-12-04T10:29:53.6524133Z distributed/_pycute/test_coalesce 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._pycute.test_coalesce_1.1_b9854b582e22535e_.log 2025-12-04T10:29:53.6525539Z Running 1 items in this shard: test/distributed/_pycute/test_coalesce.py::TestCoalesce::test_coalesce 2025-12-04T10:29:53.6528000Z 2025-12-04T10:29:53.6528690Z Finished distributed/_pycute/test_coalesce 1/1 ... [2025-12-04 10:29:53.651773][5825.259685699], took 0.06min 2025-12-04T10:29:53.6757234Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._pycute.test_coalesce/distributed._pycute.test_coalesce-d2727b6d77166552.xml 2025-12-04T10:29:53.7116189Z Running distributed/_pycute/test_complement 1/1 ... [2025-12-04 10:29:53.711080][5825.318996735] 2025-12-04T10:29:53.7116831Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:29:53.7118114Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_pycute/test_complement.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:29:53.711428] 2025-12-04T10:29:57.4854645Z 2025-12-04T10:29:57.4855779Z distributed/_pycute/test_complement 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._pycute.test_complement_1.1_ccd05958479ced51_.log 2025-12-04T10:29:57.4857596Z Running 1 items in this shard: test/distributed/_pycute/test_complement.py::TestComplement::test_complement 2025-12-04T10:29:57.4858181Z 2025-12-04T10:29:57.4858599Z Finished distributed/_pycute/test_complement 1/1 ... [2025-12-04 10:29:57.484890][5829.092805789], took 0.06min 2025-12-04T10:29:57.5086790Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._pycute.test_complement/distributed._pycute.test_complement-323506218bd25d4f.xml 2025-12-04T10:29:57.5413356Z Running distributed/_pycute/test_composition 1/1 ... [2025-12-04 10:29:57.540735][5829.148652684] 2025-12-04T10:29:57.5414026Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:29:57.5415307Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_pycute/test_composition.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:29:57.541083] 2025-12-04T10:30:01.3151532Z 2025-12-04T10:30:01.3152898Z distributed/_pycute/test_composition 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._pycute.test_composition_1.1_6a9f660c56ddbb95_.log 2025-12-04T10:30:01.3154406Z Running 1 items in this shard: test/distributed/_pycute/test_composition.py::TestComposition::test_composition 2025-12-04T10:30:01.3154982Z 2025-12-04T10:30:01.3155404Z Finished distributed/_pycute/test_composition 1/1 ... 
[2025-12-04 10:30:01.314824][5832.922740995], took 0.06min 2025-12-04T10:30:01.3385068Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._pycute.test_composition/distributed._pycute.test_composition-91e42d2ac7610498.xml 2025-12-04T10:30:01.3734218Z Running distributed/_pycute/test_int_tuple 1/1 ... [2025-12-04 10:30:01.372838][5832.980754715] 2025-12-04T10:30:01.3734864Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:30:01.3736155Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_pycute/test_int_tuple.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:30:01.373189] 2025-12-04T10:30:05.1977813Z 2025-12-04T10:30:05.1978914Z distributed/_pycute/test_int_tuple 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._pycute.test_int_tuple_1.1_1b6829b59a3a12af_.log 2025-12-04T10:30:05.1984790Z Running 12 items in this shard: test/distributed/_pycute/test_int_tuple.py::TestIntTuple::test_crd2idx_basic, test/distributed/_pycute/test_int_tuple.py::TestIntTuple::test_crd2idx_idx2crd_roundtrip, test/distributed/_pycute/test_int_tuple.py::TestIntTuple::test_crd2idx_int_with_tuple_shape, test/distributed/_pycute/test_int_tuple.py::TestIntTuple::test_crd2idx_none, test/distributed/_pycute/test_int_tuple.py::TestIntTuple::test_crd2idx_tuple, test/distributed/_pycute/test_int_tuple.py::TestIntTuple::test_idx2crd_basic, test/distributed/_pycute/test_int_tuple.py::TestIntTuple::test_idx2crd_crd2idx_roundtrip, test/distributed/_pycute/test_int_tuple.py::TestIntTuple::test_idx2crd_tuple, test/distributed/_pycute/test_int_tuple.py::TestIntTuple::test_inner_product, test/distributed/_pycute/test_int_tuple.py::TestIntTuple::test_product, test/distributed/_pycute/test_int_tuple.py::TestIntTuple::test_shape_div, test/distributed/_pycute/test_int_tuple.py::TestIntTuple::test_suffix_product 2025-12-04T10:30:05.1989662Z 2025-12-04T10:30:05.1990148Z Finished distributed/_pycute/test_int_tuple 1/1 ... [2025-12-04 10:30:05.197211][5836.805125867], took 0.06min 2025-12-04T10:30:05.2213714Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._pycute.test_int_tuple/distributed._pycute.test_int_tuple-1604350619512e65.xml 2025-12-04T10:30:05.2517261Z Running distributed/_pycute/test_left_inverse 1/1 ... [2025-12-04 10:30:05.251112][5836.859028412] 2025-12-04T10:30:05.2517960Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:30:05.2519245Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_pycute/test_left_inverse.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:30:05.251510] 2025-12-04T10:30:09.0253826Z 2025-12-04T10:30:09.0254902Z distributed/_pycute/test_left_inverse 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._pycute.test_left_inverse_1.1_e810fe2e4745b377_.log 2025-12-04T10:30:09.0256408Z Running 1 items in this shard: test/distributed/_pycute/test_left_inverse.py::TestLeftInverse::test_left_inverse 2025-12-04T10:30:09.0257201Z 2025-12-04T10:30:09.0257632Z Finished distributed/_pycute/test_left_inverse 1/1 ... 
[2025-12-04 10:30:09.024928][5840.632844521], took 0.06min 2025-12-04T10:30:09.0491724Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._pycute.test_left_inverse/distributed._pycute.test_left_inverse-7b550f03a54828f5.xml 2025-12-04T10:30:09.0847088Z Running distributed/_pycute/test_right_inverse 1/1 ... [2025-12-04 10:30:09.084059][5840.691976263] 2025-12-04T10:30:09.0847762Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:30:09.0849044Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_pycute/test_right_inverse.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:30:09.084411] 2025-12-04T10:30:12.8581541Z 2025-12-04T10:30:12.8582758Z distributed/_pycute/test_right_inverse 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._pycute.test_right_inverse_1.1_c9aa035dc9548e77_.log 2025-12-04T10:30:12.8584245Z Running 1 items in this shard: test/distributed/_pycute/test_right_inverse.py::TestRightInverse::test_right_inverse 2025-12-04T10:30:12.8584847Z 2025-12-04T10:30:12.8585272Z Finished distributed/_pycute/test_right_inverse 1/1 ... [2025-12-04 10:30:12.857837][5844.465753575], took 0.06min 2025-12-04T10:30:12.8822360Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._pycute.test_right_inverse/distributed._pycute.test_right_inverse-5437f0847845b913.xml 2025-12-04T10:30:12.9176028Z Running distributed/_composable/test_replicate 1/1 ... [2025-12-04 10:30:12.917065][5844.524982886] 2025-12-04T10:30:12.9176827Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:30:12.9178339Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_composable/test_replicate.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:30:12.917423] 2025-12-04T10:31:27.6172887Z 2025-12-04T10:31:27.6174465Z distributed/_composable/test_replicate 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._composable.test_replicate_1.1_ede2d02b7e8a4250_.log 2025-12-04T10:31:27.6184701Z Running 17 items in this shard: test/distributed/_composable/test_replicate.py::ReplicateStateDictTest::test_replicate_non_root_multiple_save_load, test/distributed/_composable/test_replicate.py::ReplicateStateDictTest::test_replicate_single_module_save_load, test/distributed/_composable/test_replicate.py::ReplicateTest::test_replicate_device_id, test/distributed/_composable/test_replicate.py::ReplicateTest::test_replicate_ignore_module, test/distributed/_composable/test_replicate.py::ReplicateTest::test_replicate_move_args_kwargs_to_device, test/distributed/_composable/test_replicate.py::ReplicateTest::test_replicate_multi_module, test/distributed/_composable/test_replicate.py::ReplicateTest::test_replicate_single_module, test/distributed/_composable/test_replicate.py::ReplicateTest::test_replicate_with_kwargs, test/distributed/_composable/test_replicate.py::ReplicateTest::test_replicate_wrong_device_id_type, test/distributed/_composable/test_replicate.py::ReplicateFullyShardInit::test_replicate_device_id, test/distributed/_composable/test_replicate.py::ReplicateFullyShardInit::test_replicate_fully_shard_init, test/distributed/_composable/test_replicate.py::ReplicateFullyShardInit::test_replicate_ignore_module, test/distributed/_composable/test_replicate.py::ReplicateFullyShardInit::test_replicate_move_args_kwargs_to_device, test/distributed/_composable/test_replicate.py::ReplicateFullyShardInit::test_replicate_multi_module, test/distributed/_composable/test_replicate.py::ReplicateFullyShardInit::test_replicate_single_module, test/distributed/_composable/test_replicate.py::ReplicateFullyShardInit::test_replicate_with_kwargs, test/distributed/_composable/test_replicate.py::ReplicateFullyShardInit::test_replicate_wrong_device_id_type 2025-12-04T10:31:27.6193383Z 2025-12-04T10:31:27.6193789Z Finished distributed/_composable/test_replicate 1/1 ... [2025-12-04 10:31:27.616680][5919.224595513], took 1.24min 2025-12-04T10:31:27.6416392Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._composable.test_replicate/distributed._composable.test_replicate-5594e5fd77ce79b5.xml 2025-12-04T10:31:27.7258143Z Running distributed/checkpoint/test_hsdp_checkpoint 1/1 ... [2025-12-04 10:31:27.725229][5919.333146113] 2025-12-04T10:31:27.7258849Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:31:27.7260222Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/test_hsdp_checkpoint.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:31:27.725574] 2025-12-04T10:31:59.1199137Z 2025-12-04T10:31:59.1200463Z distributed/checkpoint/test_hsdp_checkpoint 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.test_hsdp_checkpoint_1.1_38b6379e9fe79671_.log 2025-12-04T10:31:59.1204122Z Running 4 items in this shard: test/distributed/checkpoint/test_hsdp_checkpoint.py::TestHSDPCheckpoint::test_hsdp_checkpoint_is_even_sharded_model_False, test/distributed/checkpoint/test_hsdp_checkpoint.py::TestHSDPCheckpoint::test_hsdp_checkpoint_is_even_sharded_model_True, test/distributed/checkpoint/test_hsdp_checkpoint.py::TestHSDPCheckpoint::test_hsdp_fsdp_checkpoint_conversion_is_even_sharded_model_False, test/distributed/checkpoint/test_hsdp_checkpoint.py::TestHSDPCheckpoint::test_hsdp_fsdp_checkpoint_conversion_is_even_sharded_model_True 2025-12-04T10:31:59.1206868Z 2025-12-04T10:31:59.1207346Z Finished distributed/checkpoint/test_hsdp_checkpoint 1/1 ... [2025-12-04 10:31:59.119653][5950.727568389], took 0.52min 2025-12-04T10:31:59.1446345Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.checkpoint.test_hsdp_checkpoint/distributed.checkpoint.test_hsdp_checkpoint-293bcc74b378a9a0.xml 2025-12-04T10:31:59.2272590Z Running distributed/tensor/parallel/test_parallelize_api 1/1 ... [2025-12-04 10:31:59.226655][5950.834573142] 2025-12-04T10:31:59.2273295Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:31:59.2274655Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/parallel/test_parallelize_api.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:31:59.227000] 2025-12-04T10:33:46.7644824Z 2025-12-04T10:33:46.7646268Z distributed/tensor/parallel/test_parallelize_api 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.parallel.test_parallelize_api_1.1_a79c3b02a80366e9_.log 2025-12-04T10:33:46.7671597Z Running 32 items in this shard: test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_empty_plan, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_linear_col_wise_parallel, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_linear_row_wise_parallel, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_parallelize_mlp_with_module_api, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_parallelize_mlp_with_module_api_nested, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_parallelize_module_multi_wildcard, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_parallelize_module_src_data_rank, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_parallelize_module_with_digit, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_parallelize_module_with_no_match, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_parallelize_module_with_question, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_parallelize_module_with_root_module, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_parallelize_module_with_star, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_prepare_module_input, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_prepare_module_input_output, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_prepare_module_output, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITests::test_under_devicemesh_context, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_empty_plan, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_linear_col_wise_parallel, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_linear_row_wise_parallel, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_parallelize_mlp_with_module_api, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_parallelize_mlp_with_module_api_nested, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_parallelize_module_multi_wildcard, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_parallelize_module_src_data_rank, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_parallelize_module_with_digit, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_parallelize_module_with_no_match, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_parallelize_module_with_question, 
test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_parallelize_module_with_root_module, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_parallelize_module_with_star, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_prepare_module_input, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_prepare_module_input_output, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_prepare_module_output, test/distributed/tensor/parallel/test_parallelize_api.py::TensorParallelAPITestsWithLocalTensor::test_under_devicemesh_context 2025-12-04T10:33:46.7692985Z 2025-12-04T10:33:46.7693495Z Finished distributed/tensor/parallel/test_parallelize_api 1/1 ... [2025-12-04 10:33:46.763960][6058.371876312], took 1.79min 2025-12-04T10:33:46.7897408Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.tensor.parallel.test_parallelize_api/distributed.tensor.parallel.test_parallelize_api-e24bc2790e3eed77.xml 2025-12-04T10:33:46.8978387Z Running distributed/fsdp/test_fsdp_state_dict 1/2 ... [2025-12-04 10:33:46.897170][6058.505087901] 2025-12-04T10:33:46.8979034Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:33:46.8980351Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_fsdp_state_dict.py', '--shard-id=1', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:33:46.897509] 2025-12-04T10:40:21.2626989Z 2025-12-04T10:40:21.2628176Z distributed/fsdp/test_fsdp_state_dict 1/2 was successful, full logs can be found in artifacts with path test/test-reports/distributed.fsdp.test_fsdp_state_dict_1.2_f864b6fe160d675b_.log 2025-12-04T10:40:21.2702426Z Running 78 items in this shard: test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload0_fp16_False_state_dict_rank0_and_offload_False_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload0_fp16_False_state_dict_rank0_and_offload_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload0_fp16_True_state_dict_rank0_and_offload_False_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload0_fp16_True_state_dict_rank0_and_offload_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload0_fp16_True_state_dict_rank0_and_offload_True_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload1_fp16_False_state_dict_rank0_and_offload_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload1_fp16_True_state_dict_rank0_and_offload_False_use_orig_params_False, 
test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload1_fp16_True_state_dict_rank0_and_offload_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload0_fp16_False_state_dict_rank0_and_offload_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload0_fp16_False_state_dict_rank0_and_offload_False_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload0_fp16_False_state_dict_rank0_and_offload_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload0_fp16_True_state_dict_rank0_and_offload_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload1_fp16_True_state_dict_rank0_and_offload_False_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload0_fp16_False_state_dict_rank0_and_offload_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload0_fp16_False_state_dict_rank0_and_offload_False_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload0_fp16_False_state_dict_rank0_and_offload_True_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload0_fp16_True_state_dict_rank0_and_offload_False_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload0_fp16_True_state_dict_rank0_and_offload_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload1_fp16_False_state_dict_rank0_and_offload_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload1_fp16_False_state_dict_rank0_and_offload_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload1_fp16_False_state_dict_rank0_and_offload_True_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload1_fp16_True_state_dict_rank0_and_offload_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload1_fp16_True_state_dict_rank0_and_offload_False_use_orig_params_True, 
test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload1_fp16_True_state_dict_rank0_and_offload_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload0_mixed_precision_False_state_dict_rank0_and_offload_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload0_mixed_precision_False_state_dict_rank0_and_offload_False_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload0_mixed_precision_True_state_dict_rank0_and_offload_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload1_mixed_precision_False_state_dict_rank0_and_offload_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload1_mixed_precision_False_state_dict_rank0_and_offload_False_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload1_mixed_precision_True_state_dict_rank0_and_offload_False_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload1_mixed_precision_True_state_dict_rank0_and_offload_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload0_mixed_precision_False_state_dict_rank0_and_offload_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload0_mixed_precision_False_state_dict_rank0_and_offload_False_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload1_mixed_precision_True_state_dict_rank0_and_offload_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload0_mixed_precision_False_state_dict_rank0_and_offload_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload0_mixed_precision_True_state_dict_rank0_and_offload_False_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload0_mixed_precision_True_state_dict_rank0_and_offload_True_use_orig_params_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_buffers_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload1_mixed_precision_False_state_dict_rank0_and_offload_True_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_keys_state_dict_type_local_state_dict, 
test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_keys_state_dict_type_state_dict, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_sharded_state_dict_checkpoint_wrap_both_after_wrap_rank0_only_and_offload_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_sharded_state_dict_checkpoint_wrap_dest_rank0_only_and_offload_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_sharded_state_dict_checkpoint_wrap_source_after_wrap_rank0_only_and_offload_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_sharded_state_dict_checkpoint_wrap_source_rank0_only_and_offload_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_state_dict_checkpoint_wrap_both_after_wrap_rank0_only_and_offload_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_state_dict_checkpoint_wrap_both_after_wrap_rank0_only_and_offload_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_state_dict_checkpoint_wrap_both_rank0_only_and_offload_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_state_dict_checkpoint_wrap_both_rank0_only_and_offload_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_state_dict_checkpoint_wrap_dest_rank0_only_and_offload_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_state_dict_checkpoint_wrap_dest_rank0_only_and_offload_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_state_dict_checkpoint_wrap_source_after_wrap_rank0_only_and_offload_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_fsdp_state_dict_with_activation_checkpoint_state_dict_type_state_dict_checkpoint_wrap_source_rank0_only_and_offload_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_full_state_dict_missing_unexpected_keys_cleaned, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_save_and_load_after_forward_state_dict_state_dict_type_local_state_dict_mixed_precision_False_state_dict_rank0_and_offload_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_save_and_load_after_forward_state_dict_state_dict_type_local_state_dict_mixed_precision_False_state_dict_rank0_and_offload_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_save_and_load_after_forward_state_dict_state_dict_type_sharded_state_dict_mixed_precision_False_state_dict_rank0_and_offload_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_save_and_load_after_forward_state_dict_state_dict_type_sharded_state_dict_mixed_precision_True_state_dict_rank0_and_offload_True, 
test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_save_and_load_after_forward_state_dict_state_dict_type_state_dict_mixed_precision_True_state_dict_rank0_and_offload_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_save_and_load_after_forward_state_dict_state_dict_type_state_dict_mixed_precision_True_state_dict_rank0_and_offload_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_load_into_local_module_state_dict_type_sharded_state_dict_state_dict_rank0_and_offload_False_fsdp_root_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_load_into_local_module_state_dict_type_sharded_state_dict_state_dict_rank0_and_offload_True_fsdp_root_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_load_into_local_module_state_dict_type_state_dict_state_dict_rank0_and_offload_False_fsdp_root_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_load_into_local_module_state_dict_type_state_dict_state_dict_rank0_and_offload_False_fsdp_root_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_load_into_local_module_state_dict_type_state_dict_state_dict_rank0_and_offload_True_fsdp_root_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_rank0_offload_save_load_flow_use_orig_params_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_save_load_flow_state_dict_type_sharded_state_dict, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_type, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_with_ignored_modules_state_dict_type_sharded_state_dict_prefix_False_ignore_inner_False_mixed_precision_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_with_ignored_modules_state_dict_type_sharded_state_dict_prefix_False_ignore_inner_True_mixed_precision_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_with_ignored_modules_state_dict_type_sharded_state_dict_prefix_True_ignore_inner_False_mixed_precision_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_with_ignored_modules_state_dict_type_sharded_state_dict_prefix_True_ignore_inner_False_mixed_precision_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_with_ignored_modules_state_dict_type_sharded_state_dict_prefix_True_ignore_inner_True_mixed_precision_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_with_ignored_modules_state_dict_type_state_dict_prefix_False_ignore_inner_False_mixed_precision_False, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_with_ignored_modules_state_dict_type_state_dict_prefix_False_ignore_inner_True_mixed_precision_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_with_ignored_modules_state_dict_type_state_dict_prefix_True_ignore_inner_True_mixed_precision_True, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_state_dict_with_shared_parameters_state_dict_type_state_dict, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict::test_torch_save_load, test/distributed/fsdp/test_fsdp_state_dict.py::TestFSDPStateDict4GPUs::test_local_state_dict_reshard 2025-12-04T10:40:21.2773120Z 
2025-12-04T10:40:21.2773532Z Finished distributed/fsdp/test_fsdp_state_dict 1/2 ... [2025-12-04 10:40:21.264033][6452.871945679], took 6.57min 2025-12-04T10:40:21.2899405Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_state_dict/distributed.fsdp.test_fsdp_state_dict-3c13b82ce7076bc1.xml 2025-12-04T10:40:21.8294585Z Uploading artifacts took 0.46 seconds 2025-12-04T10:40:21.8295569Z Running distributed/_pycute/test_typing 1/1 ... [2025-12-04 10:40:21.829407][6453.437324358] 2025-12-04T10:40:21.8296190Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:40:21.8300008Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_pycute/test_typing.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:40:21.829761] 2025-12-04T10:40:25.6539434Z 2025-12-04T10:40:25.6540568Z distributed/_pycute/test_typing 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._pycute.test_typing_1.1_70d9a252095d6a68_.log 2025-12-04T10:40:25.6542184Z Running 1 items in this shard: test/distributed/_pycute/test_typing.py::TestTyping::test_typing 2025-12-04T10:40:25.6542788Z 2025-12-04T10:40:25.6543171Z Finished distributed/_pycute/test_typing 1/1 ... [2025-12-04 10:40:25.653533][6457.261448532], took 0.06min 2025-12-04T10:40:25.6780599Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._pycute.test_typing/distributed._pycute.test_typing-1c9aabc95fed14a1.xml 2025-12-04T10:40:25.7141421Z Running distributed/test_distributed_spawn 1/9 ... [2025-12-04 10:40:25.713926][6457.321842852] 2025-12-04T10:40:25.7143805Z Running distributed tests for the test backend with env init_method 2025-12-04T10:40:25.7145738Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:40:25.7149473Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=1', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:40:25.714759] 2025-12-04T10:40:29.2914232Z 2025-12-04T10:40:29.2915325Z distributed/test_distributed_spawn 1/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_1.9_8732ec05eb19aa05_.log 2025-12-04T10:40:29.2916396Z Running 0 items in this shard: 2025-12-04T10:40:29.2916610Z 2025-12-04T10:40:29.2919067Z Running distributed tests for the test backend with file init_method 2025-12-04T10:40:29.2921055Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:40:29.2925506Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=1', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:40:29.292326] 2025-12-04T10:40:32.8707319Z 2025-12-04T10:40:32.8708479Z distributed/test_distributed_spawn 1/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_1.9_28ca104a37c9a833_.log 2025-12-04T10:40:32.8709650Z Running 0 items in this shard: 2025-12-04T10:40:32.8709879Z 2025-12-04T10:40:32.8714906Z Running distributed tests for the mpi backend with env init_method 2025-12-04T10:40:33.0015557Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:40:33.0018791Z Executing ['mpiexec', '-n', '3', '--noprefix', '--allow-run-as-root', '/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=1', '--num-shards=9', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:40:33.001512] 2025-12-04T10:40:37.1935200Z 2025-12-04T10:40:37.1936330Z distributed/test_distributed_spawn 1/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_1.9_4a0940f8014b8eef_.log 2025-12-04T10:40:37.1937772Z Running 0 items in this shard: 2025-12-04T10:40:37.1938107Z Running 0 items in this shard: 2025-12-04T10:40:37.1938452Z Running 0 items in this shard: 2025-12-04T10:40:37.1938667Z 2025-12-04T10:40:37.1941745Z Running distributed tests for the mpi backend with file init_method 2025-12-04T10:40:37.3200301Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:40:37.3201846Z Executing ['mpiexec', '-n', '3', '--noprefix', '--allow-run-as-root', '/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=1', '--num-shards=9', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:40:37.319871] 2025-12-04T10:40:41.5207221Z 2025-12-04T10:40:41.5208395Z distributed/test_distributed_spawn 1/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_1.9_dc17769dd5c2239f_.log 2025-12-04T10:40:41.5209443Z Running 0 items in this shard: 2025-12-04T10:40:41.5209783Z Running 0 items in this shard: 2025-12-04T10:40:41.5210333Z Running 0 items in this shard: 2025-12-04T10:40:41.5210644Z 2025-12-04T10:40:41.5210921Z Running distributed tests for the nccl backend with env init_method 2025-12-04T10:40:41.5212686Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:40:41.5216532Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=1', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:40:41.521454] 2025-12-04T10:44:28.6748240Z 2025-12-04T10:44:28.6749878Z distributed/test_distributed_spawn 1/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_1.9_3cbdf0379e4c6767_.log 2025-12-04T10:44:28.6770739Z Running 35 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_3_level_hierarchical_model_averager, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallelCPU, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Channels_Last, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_into_cat_tensor_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_object_default_pg, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_max_complex_unsupported, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_async, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_cuda_async, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_cuda_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_gloo_tags, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_build_debug_param_to_name_mapping, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_compile_static_graph, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_logging_data_cpu, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_native_mixed_precision_ignored_params, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_profiling_execution_trace, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_inputs, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_irecv, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_gloo_subgroup, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_world_size_not_divisible_by_group_size, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_output_unused_in_loss_tuple_module, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_static_graph_api_cpu, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_verify_model_across_rank_without_logger 2025-12-04T10:44:28.6790836Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_3_level_hierarchical_model_averager 2025-12-04T10:44:28.6792116Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallelCPU 2025-12-04T10:44:28.6793467Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Channels_Last 2025-12-04T10:44:28.6794836Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync 2025-12-04T10:44:28.6796087Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_into_cat_tensor_cuda 2025-12-04T10:44:28.6797337Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_object_default_pg 2025-12-04T10:44:28.6798576Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_max 2025-12-04T10:44:28.6799830Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_min 2025-12-04T10:44:28.6801239Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_max_complex_unsupported 2025-12-04T10:44:28.6802559Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_max 2025-12-04T10:44:28.6803762Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_min 2025-12-04T10:44:28.6804894Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_max 2025-12-04T10:44:28.6806011Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_async 2025-12-04T10:44:28.6807215Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_cuda_async 2025-12-04T10:44:28.6808441Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split 2025-12-04T10:44:28.6809740Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_cuda_complex 2025-12-04T10:44:28.6811104Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_full_group 2025-12-04T10:44:28.6812470Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_full_group 2025-12-04T10:44:28.6813784Z Running 
1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_gloo_tags 2025-12-04T10:44:28.6815052Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_build_debug_param_to_name_mapping 2025-12-04T10:44:28.6816404Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_compile_static_graph 2025-12-04T10:44:28.6817837Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_logging_data_cpu 2025-12-04T10:44:28.6819144Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_native_mixed_precision_ignored_params 2025-12-04T10:44:28.6820488Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_profiling_execution_trace 2025-12-04T10:44:28.6822054Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_inputs 2025-12-04T10:44:28.6823222Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_full_group 2025-12-04T10:44:28.6824335Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_irecv 2025-12-04T10:44:28.6825502Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_gloo_subgroup 2025-12-04T10:44:28.6826728Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups 2025-12-04T10:44:28.6828045Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_world_size_not_divisible_by_group_size 2025-12-04T10:44:28.6829475Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_output_unused_in_loss_tuple_module 2025-12-04T10:44:28.6830714Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_product 2025-12-04T10:44:28.6831829Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl 2025-12-04T10:44:28.6833303Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_static_graph_api_cpu 2025-12-04T10:44:28.6834525Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_verify_model_across_rank_without_logger 2025-12-04T10:44:28.6835420Z 2025-12-04T10:44:28.6835683Z Running distributed tests for the nccl backend with file init_method 2025-12-04T10:44:28.6836177Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:44:28.6837508Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=1', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:44:28.676613] 2025-12-04T10:48:15.5931548Z 2025-12-04T10:48:15.5932698Z distributed/test_distributed_spawn 1/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_1.9_25c7f8918b3d0b51_.log 2025-12-04T10:48:15.5955689Z Running 35 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_3_level_hierarchical_model_averager, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallelCPU, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Channels_Last, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_into_cat_tensor_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_object_default_pg, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_max_complex_unsupported, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_async, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_cuda_async, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_cuda_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_gloo_tags, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_build_debug_param_to_name_mapping, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_compile_static_graph, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_logging_data_cpu, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_native_mixed_precision_ignored_params, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_profiling_execution_trace, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_inputs, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_irecv, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_gloo_subgroup, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_world_size_not_divisible_by_group_size, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_output_unused_in_loss_tuple_module, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_static_graph_api_cpu, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_verify_model_across_rank_without_logger 2025-12-04T10:48:15.5974998Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_3_level_hierarchical_model_averager 2025-12-04T10:48:15.5976359Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallelCPU 2025-12-04T10:48:15.5977936Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Channels_Last 2025-12-04T10:48:15.5979339Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync 2025-12-04T10:48:15.5988275Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_into_cat_tensor_cuda 2025-12-04T10:48:15.5990302Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_object_default_pg 2025-12-04T10:48:15.5991640Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_max 2025-12-04T10:48:15.5992904Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_min 2025-12-04T10:48:15.5994220Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_max_complex_unsupported 2025-12-04T10:48:15.5995609Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_max 2025-12-04T10:48:15.5996798Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_min 2025-12-04T10:48:15.5997947Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_max 2025-12-04T10:48:15.5999074Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_async 2025-12-04T10:48:15.6000253Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_cuda_async 2025-12-04T10:48:15.6001477Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split 2025-12-04T10:48:15.6002796Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_cuda_complex 2025-12-04T10:48:15.6004171Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_full_group 2025-12-04T10:48:15.6005536Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_full_group 2025-12-04T10:48:15.6006833Z Running 
1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_gloo_tags 2025-12-04T10:48:15.6008115Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_build_debug_param_to_name_mapping 2025-12-04T10:48:15.6009463Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_compile_static_graph 2025-12-04T10:48:15.6010636Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_logging_data_cpu 2025-12-04T10:48:15.6011880Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_native_mixed_precision_ignored_params 2025-12-04T10:48:15.6013190Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_profiling_execution_trace 2025-12-04T10:48:15.6014379Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_inputs 2025-12-04T10:48:15.6015539Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_full_group 2025-12-04T10:48:15.6017011Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_irecv 2025-12-04T10:48:15.6018247Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_gloo_subgroup 2025-12-04T10:48:15.6019461Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups 2025-12-04T10:48:15.6020986Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_world_size_not_divisible_by_group_size 2025-12-04T10:48:15.6022433Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_output_unused_in_loss_tuple_module 2025-12-04T10:48:15.6023662Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_product 2025-12-04T10:48:15.6024795Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl 2025-12-04T10:48:15.6025958Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_static_graph_api_cpu 2025-12-04T10:48:15.6027342Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_verify_model_across_rank_without_logger 2025-12-04T10:48:15.6028108Z 2025-12-04T10:48:15.6028361Z Running distributed tests for the gloo backend with env init_method 2025-12-04T10:48:15.6028880Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:48:15.6030255Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=1', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
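The "Executing [...]" entry above records the exact per-shard command the runner uses for this test file. A minimal sketch of replaying that same shard 1/9 invocation outside CI, assuming the same interpreter path and that it is run from the repository's test/ directory as in this log; backend and init-method selection are handled by the surrounding harness (not by these flags), so this only reproduces the argument list as printed:

```python
# Sketch only: replay the shard 1/9 command printed in the log above.
# Assumes /opt/conda/envs/py_3.10/bin/python exists and that the current
# working directory is the repository's test/ directory. The gloo/nccl/mpi
# backend and env/file init_method are chosen by the wrapper that emits
# these log lines, which this snippet does not reproduce.
import subprocess

cmd = [
    "/opt/conda/envs/py_3.10/bin/python", "-bb",
    "distributed/test_distributed_spawn.py",
    "--shard-id=1", "--num-shards=9",
    "-v", "--subprocess", "-vv", "-rfEX",
    "-p", "no:xdist", "--use-pytest", "-x", "--reruns=0",
    "--import-slow-tests", "--import-disabled-tests",
]
result = subprocess.run(cmd, check=False)  # non-zero exit means test failures
print("exit code:", result.returncode)
```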
[2025-12-04 10:48:15.595174] 2025-12-04T10:52:53.4845422Z 2025-12-04T10:52:53.4848674Z distributed/test_distributed_spawn 1/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_1.9_6f55519eb0301937_.log 2025-12-04T10:52:53.4868641Z Running 35 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_3_level_hierarchical_model_averager, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallelCPU, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Channels_Last, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_into_cat_tensor_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_object_default_pg, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_max_complex_unsupported, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_async, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_cuda_async, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_cuda_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_gloo_tags, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_build_debug_param_to_name_mapping, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_compile_static_graph, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_logging_data_cpu, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_native_mixed_precision_ignored_params, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_profiling_execution_trace, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_inputs, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_irecv, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_gloo_subgroup, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_world_size_not_divisible_by_group_size, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_output_unused_in_loss_tuple_module, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_static_graph_api_cpu, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_verify_model_across_rank_without_logger 2025-12-04T10:52:53.4887916Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_3_level_hierarchical_model_averager 2025-12-04T10:52:53.4889186Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallelCPU 2025-12-04T10:52:53.4890549Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Channels_Last 2025-12-04T10:52:53.4891913Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync 2025-12-04T10:52:53.4893167Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_into_cat_tensor_cuda 2025-12-04T10:52:53.4894422Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_object_default_pg 2025-12-04T10:52:53.4895665Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_max 2025-12-04T10:52:53.4897288Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_min 2025-12-04T10:52:53.4898676Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_max_complex_unsupported 2025-12-04T10:52:53.4900022Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_max 2025-12-04T10:52:53.4901251Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_min 2025-12-04T10:52:53.4902453Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_max 2025-12-04T10:52:53.4903641Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_async 2025-12-04T10:52:53.4904858Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_cuda_async 2025-12-04T10:52:53.4906123Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split 2025-12-04T10:52:53.4907462Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_cuda_complex 2025-12-04T10:52:53.4908962Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_full_group 2025-12-04T10:52:53.4910323Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_full_group 2025-12-04T10:52:53.4911622Z Running 
1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_gloo_tags 2025-12-04T10:52:53.4912881Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_build_debug_param_to_name_mapping 2025-12-04T10:52:53.4914145Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_compile_static_graph 2025-12-04T10:52:53.4915349Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_logging_data_cpu 2025-12-04T10:52:53.4916609Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_native_mixed_precision_ignored_params 2025-12-04T10:52:53.4917918Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_profiling_execution_trace 2025-12-04T10:52:53.4919113Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_inputs 2025-12-04T10:52:53.4920232Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_full_group 2025-12-04T10:52:53.4921723Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_irecv 2025-12-04T10:52:53.4922895Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_gloo_subgroup 2025-12-04T10:52:53.4924118Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups 2025-12-04T10:52:53.4925439Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_world_size_not_divisible_by_group_size 2025-12-04T10:52:53.4926875Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_output_unused_in_loss_tuple_module 2025-12-04T10:52:53.4928099Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_product 2025-12-04T10:52:53.4929347Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl 2025-12-04T10:52:53.4930507Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_static_graph_api_cpu 2025-12-04T10:52:53.4931799Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_verify_model_across_rank_without_logger 2025-12-04T10:52:53.4932559Z 2025-12-04T10:52:53.4932828Z Running distributed tests for the gloo backend with file init_method 2025-12-04T10:52:53.4933435Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:52:53.4934762Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=1', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:52:53.485987] 2025-12-04T10:57:31.0735284Z 2025-12-04T10:57:31.0736530Z distributed/test_distributed_spawn 1/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_1.9_c42c9aaca0d3f434_.log 2025-12-04T10:57:31.0757354Z Running 35 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_3_level_hierarchical_model_averager, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallelCPU, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Channels_Last, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_into_cat_tensor_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_object_default_pg, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_max_complex_unsupported, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_async, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_cuda_async, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_cuda_complex, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_gloo_tags, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_build_debug_param_to_name_mapping, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_compile_static_graph, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_logging_data_cpu, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_native_mixed_precision_ignored_params, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_profiling_execution_trace, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_inputs, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_irecv, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_gloo_subgroup, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_world_size_not_divisible_by_group_size, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_output_unused_in_loss_tuple_module, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_static_graph_api_cpu, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_verify_model_across_rank_without_logger 2025-12-04T10:57:31.0776557Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_3_level_hierarchical_model_averager 2025-12-04T10:57:31.0778059Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallelCPU 2025-12-04T10:57:31.0779464Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Channels_Last 2025-12-04T10:57:31.0780848Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync 2025-12-04T10:57:31.0782145Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_into_cat_tensor_cuda 2025-12-04T10:57:31.0783435Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_object_default_pg 2025-12-04T10:57:31.0784718Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_max 2025-12-04T10:57:31.0786005Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_group_min 2025-12-04T10:57:31.0787410Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_max_complex_unsupported 2025-12-04T10:57:31.0788864Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_max 2025-12-04T10:57:31.0790067Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_full_group_min 2025-12-04T10:57:31.0791212Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_max 2025-12-04T10:57:31.0792328Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_async 2025-12-04T10:57:31.0793507Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum_cuda_async 2025-12-04T10:57:31.0794738Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split 2025-12-04T10:57:31.0796048Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_cuda_complex 2025-12-04T10:57:31.0797403Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_full_group 2025-12-04T10:57:31.0798761Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_full_group 2025-12-04T10:57:31.0800061Z Running 
1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_gloo_tags 2025-12-04T10:57:31.0801398Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_build_debug_param_to_name_mapping 2025-12-04T10:57:31.0802649Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_compile_static_graph 2025-12-04T10:57:31.0803818Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_logging_data_cpu 2025-12-04T10:57:31.0805082Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_native_mixed_precision_ignored_params 2025-12-04T10:57:31.0806395Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_profiling_execution_trace 2025-12-04T10:57:31.0807606Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_inputs 2025-12-04T10:57:31.0808732Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_full_group 2025-12-04T10:57:31.0809806Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_irecv 2025-12-04T10:57:31.0813748Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_gloo_subgroup 2025-12-04T10:57:31.0814919Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups 2025-12-04T10:57:31.0816200Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_new_subgroups_world_size_not_divisible_by_group_size 2025-12-04T10:57:31.0817874Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_output_unused_in_loss_tuple_module 2025-12-04T10:57:31.0819108Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_product 2025-12-04T10:57:31.0820227Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl 2025-12-04T10:57:31.0821688Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_static_graph_api_cpu 2025-12-04T10:57:31.0823018Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_verify_model_across_rank_without_logger 2025-12-04T10:57:31.0823904Z 2025-12-04T10:57:31.0824311Z Finished distributed/test_distributed_spawn 1/9 ... 
[2025-12-04 10:57:31.074722][7482.682638476], took 17.09min 2025-12-04T10:57:31.1014396Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-db161ee1d414a014.xml 2025-12-04T10:57:31.1780421Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-aee66205f8817bd7.xml 2025-12-04T10:57:31.2112887Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f4fea7b2e6cf3a65.xml 2025-12-04T10:57:31.2355017Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e43b258f943c7149.xml 2025-12-04T10:57:31.2613019Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ed8ce545db3785b0.xml 2025-12-04T10:57:31.2877035Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-51bd71d27c2db4f0.xml 2025-12-04T10:57:31.3114370Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-eeb723e5683986dd.xml 2025-12-04T10:57:31.3418713Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7dd0923a385a5b44.xml 2025-12-04T10:57:31.3699749Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-875b3394fe6124ff.xml 2025-12-04T10:57:31.4005907Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a01719010801f0eb.xml 2025-12-04T10:57:31.4293721Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-abb38b8b64296782.xml 2025-12-04T10:57:31.4580201Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-35d5d4bfe910714e.xml 2025-12-04T10:57:31.4884101Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-fcdbe5c8d6246957.xml 2025-12-04T10:57:31.5140063Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4f2d32d76cd9ea4c.xml 2025-12-04T10:57:31.5606600Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8d01dd7848e58726.xml 2025-12-04T10:57:31.5894737Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b37ec36150974cdc.xml 2025-12-04T10:57:31.6190636Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a5c97ba7476f9699.xml 2025-12-04T10:57:31.6453560Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9f7bc9881e047dd1.xml 2025-12-04T10:57:31.6812066Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0d8492641a4c3af3.xml 2025-12-04T10:57:31.7124116Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1a118777d82e8d7e.xml 2025-12-04T10:57:31.7375311Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6f1779e409eaf9fb.xml 2025-12-04T10:57:31.7694808Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5a2c564c0db133fb.xml 2025-12-04T10:57:31.7952543Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c4e9ae811cf30c32.xml 2025-12-04T10:57:31.8265633Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1a0ffda73db67d0e.xml 2025-12-04T10:57:31.8547188Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b10091684b37c862.xml 2025-12-04T10:57:31.8836260Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-362536b218c78604.xml 2025-12-04T10:57:31.9138572Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e2a2b6d5dc912ba1.xml 2025-12-04T10:57:31.9431382Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2bfa612f1908806e.xml 2025-12-04T10:57:31.9715140Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7c241632c1bd2254.xml 2025-12-04T10:57:31.9983056Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-300d15ebe169a67d.xml 2025-12-04T10:57:32.0255956Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2664154f3bddb6ff.xml 2025-12-04T10:57:32.0553565Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b262143f686a88dd.xml 2025-12-04T10:57:32.0857135Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c004db07f7b0860b.xml 2025-12-04T10:57:32.1180345Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bc18c93bde07fa33.xml 2025-12-04T10:57:32.1500347Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d33e44b619f43cc1.xml 2025-12-04T10:57:32.1804811Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c44272ce3d4ac199.xml 2025-12-04T10:57:32.2069718Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ea07358affb5e144.xml 2025-12-04T10:57:32.2366600Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2c57c7620876639a.xml 2025-12-04T10:57:32.2636307Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-eede0e2726c06cab.xml 2025-12-04T10:57:32.2935109Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a276c210ef7f6689.xml 2025-12-04T10:57:32.3332043Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bd59825a029f8f8b.xml 2025-12-04T10:57:32.3653069Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5046dc8bfb623fa3.xml 2025-12-04T10:57:32.3964651Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4878dd0838c676b7.xml 2025-12-04T10:57:32.4255160Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-66566e960af2b7cd.xml 2025-12-04T10:57:32.4524068Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9252bf6025e90d42.xml 2025-12-04T10:57:32.4820716Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5b920f5d1c4972a5.xml 2025-12-04T10:57:32.5116750Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-41378464ce08003d.xml 2025-12-04T10:57:32.5415206Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ee4c603fd47011fa.xml 2025-12-04T10:57:32.5895778Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9973927e7b530617.xml 2025-12-04T10:57:32.6234208Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-faddb0db331380df.xml 2025-12-04T10:57:32.6523608Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-babf9f26b0f01a05.xml 2025-12-04T10:57:32.6814338Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-682bb4a108ba0cff.xml 2025-12-04T10:57:32.7084664Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d0185f9ec4d4c49f.xml 2025-12-04T10:57:32.7368208Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-011699f09fdd352f.xml 2025-12-04T10:57:32.7677675Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7c6b066059948ead.xml 2025-12-04T10:57:32.7940299Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-22fab5f0e190ff66.xml 2025-12-04T10:57:32.8235448Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-55702aa5023cfcc5.xml 2025-12-04T10:57:32.8593528Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ccae7814a1c4777f.xml 2025-12-04T10:57:32.8886757Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5bd848f11487517d.xml 2025-12-04T10:57:32.9195127Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-27d68b49187eba1f.xml 2025-12-04T10:57:32.9482195Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cf1bc9411dde71e0.xml 2025-12-04T10:57:32.9803068Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-445a5d7115d23df5.xml 2025-12-04T10:57:33.0060276Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-44a168cde9f7a829.xml 2025-12-04T10:57:33.0354873Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1ba388d3de704172.xml 2025-12-04T10:57:33.0642342Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bd986c0befb813c2.xml 2025-12-04T10:57:33.0941243Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4610efe5376dfca1.xml 2025-12-04T10:57:33.1221282Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8b4358fed50c59f1.xml 2025-12-04T10:57:33.1535132Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-526a02721a1ba5da.xml 2025-12-04T10:57:33.1819733Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3c0978e54cc6fc10.xml 2025-12-04T10:57:33.2181329Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bf5a35496e65d5e4.xml 2025-12-04T10:57:33.2579714Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ee9c4c3ca48fe737.xml 2025-12-04T10:57:33.2876980Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d5ca791415d7ead2.xml 2025-12-04T10:57:33.3499728Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4b280a14c5b58c7c.xml 2025-12-04T10:57:33.3823938Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9f1e7a55058f0a18.xml 2025-12-04T10:57:33.4124614Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c9d23e4c6bbfd6d1.xml 2025-12-04T10:57:33.4444608Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d04adc5353a474ef.xml 2025-12-04T10:57:33.4854499Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2c31ce4d4db4e93a.xml 2025-12-04T10:57:33.5155601Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-714862760bd05954.xml 2025-12-04T10:57:33.5447398Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-16429bc307938d70.xml 2025-12-04T10:57:33.5754687Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-92f77f3d8cd66053.xml 2025-12-04T10:57:33.6064886Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-deed4e34c84ee498.xml 2025-12-04T10:57:33.6379852Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-425b9693fd331423.xml 2025-12-04T10:57:33.6642066Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9149f9baa8d84141.xml 2025-12-04T10:57:33.6924244Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-78d5cc488c73d225.xml 2025-12-04T10:57:33.7713095Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-017a63f22f7a2e26.xml 2025-12-04T10:57:33.8137953Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3e6391f21f8fa7c0.xml 2025-12-04T10:57:33.8415082Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9e8b675076ef3915.xml 2025-12-04T10:57:33.8695498Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b8d64d4666fb6c9d.xml 2025-12-04T10:57:33.9112828Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0dee982caae0bf52.xml 2025-12-04T10:57:33.9413704Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0df7122c519ced4f.xml 2025-12-04T10:57:33.9846206Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2827e400085e914f.xml 2025-12-04T10:57:34.0140725Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7d39e0b557433741.xml 2025-12-04T10:57:34.0484083Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e6c5067f69c5dc42.xml 2025-12-04T10:57:34.0775952Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d40c5c296523fcf4.xml 2025-12-04T10:57:34.1084368Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e19c088745912810.xml 2025-12-04T10:57:34.1395487Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-21b633b88362af20.xml 2025-12-04T10:57:34.1684888Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f1d69885e8023d73.xml 2025-12-04T10:57:34.2092687Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-76455ff9fe96f12c.xml 2025-12-04T10:57:34.2408905Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9224f6b7ff8b973c.xml 2025-12-04T10:57:34.2696501Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-64019cd840b5ae37.xml 2025-12-04T10:57:34.3005868Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c52c688cda6423d1.xml 2025-12-04T10:57:34.3794331Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-56aae62a7e88ec0a.xml 2025-12-04T10:57:34.4057338Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-126517b1e280f193.xml 2025-12-04T10:57:34.4324262Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2d346d213506e58a.xml 2025-12-04T10:57:34.4619701Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-093f4d1e23acb10f.xml 2025-12-04T10:57:34.4939887Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-810e1605bd5350e8.xml 2025-12-04T10:57:34.5340922Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-43db9cfa18063736.xml 2025-12-04T10:57:34.5678427Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3d256d1cc46d8d8d.xml 2025-12-04T10:57:34.5976015Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a0174602e3f0dc49.xml 2025-12-04T10:57:34.6284476Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9d15167d0a9773e6.xml 2025-12-04T10:57:34.6574848Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2a355bd7e8aa2084.xml 2025-12-04T10:57:34.6855574Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d86e179dbef96adf.xml 2025-12-04T10:57:34.7154456Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a6abc3b994eecaab.xml 2025-12-04T10:57:34.7461784Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f8fe4b288348a5e8.xml 2025-12-04T10:57:34.7747668Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e1865fe4cd352327.xml 2025-12-04T10:57:34.8296410Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2d135dba3284d9dd.xml 2025-12-04T10:57:34.8576963Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8ce519dd6997621a.xml 2025-12-04T10:57:34.8861667Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6d25b88aa16186c5.xml 2025-12-04T10:57:34.9133810Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2b545a8cfb56682b.xml 2025-12-04T10:57:34.9437653Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-96320154d0a3f580.xml 2025-12-04T10:57:34.9891888Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d58d0eb09203fc2c.xml 2025-12-04T10:57:35.0195023Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-76e7132ba7ac5de0.xml 2025-12-04T10:57:35.0503185Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a537f0ef8ed460d9.xml 2025-12-04T10:57:35.0802826Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3c40fad651035635.xml 2025-12-04T10:57:35.1083553Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-68c5b031d9a5ae9e.xml 2025-12-04T10:57:35.1396331Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-712b0b28be8414a0.xml 2025-12-04T10:57:35.1881202Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7eca96992921c511.xml 2025-12-04T10:57:35.2175347Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7834531011d91518.xml 2025-12-04T10:57:35.2453888Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-68f03a926c8d2bd9.xml 2025-12-04T10:57:35.2735517Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e49faae68d1ac0d9.xml 2025-12-04T10:57:35.2996286Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cc4d026c52898da8.xml 2025-12-04T10:57:35.3375807Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-03eaa4726076d233.xml 2025-12-04T10:57:35.3807979Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6d471afa2e27428d.xml 2025-12-04T10:57:35.4062375Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-065a466bb3b41d27.xml 2025-12-04T10:57:35.4346736Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f328e482896672aa.xml 2025-12-04T10:57:35.4645001Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ee7ee7e277bba08f.xml 2025-12-04T10:57:35.5020633Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e55ae93852ba5a41.xml 2025-12-04T10:57:35.5335305Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6750ff7d9a08403d.xml 2025-12-04T10:57:35.5622981Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d85fe03caf11b880.xml 2025-12-04T10:57:35.5898084Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f90e1eb29ec7a7eb.xml 2025-12-04T10:57:35.6167227Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5c515ad73db9ec0f.xml 2025-12-04T10:57:35.6476625Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-be5d3342961d1397.xml 2025-12-04T10:57:35.6762701Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-81a8ca35b73b2608.xml 2025-12-04T10:57:35.7017759Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6eb3b25e1011068f.xml 2025-12-04T10:57:35.7301924Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-16ab3c0f531a2710.xml 2025-12-04T10:57:35.7575260Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4e912af285a88a53.xml 2025-12-04T10:57:35.7844035Z Running distributed/test_distributed_spawn 4/9 ... [2025-12-04 10:57:35.783858][7487.391774661] 2025-12-04T10:57:35.7844801Z Running distributed tests for the test backend with env init_method 2025-12-04T10:57:35.7845311Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:57:35.7847794Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=4', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:57:35.784582] 2025-12-04T10:57:39.3630208Z 2025-12-04T10:57:39.3631346Z distributed/test_distributed_spawn 4/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_4.9_cfb55a01555794b3_.log 2025-12-04T10:57:39.3632426Z Running 0 items in this shard: 2025-12-04T10:57:39.3632766Z 2025-12-04T10:57:39.3637986Z Running distributed tests for the test backend with file init_method 2025-12-04T10:57:39.3639753Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:57:39.3643537Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=4', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:57:39.364165] 2025-12-04T10:57:42.9325367Z 2025-12-04T10:57:42.9326508Z distributed/test_distributed_spawn 4/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_4.9_5d1f467e5bbdaff2_.log 2025-12-04T10:57:42.9327604Z Running 0 items in this shard: 2025-12-04T10:57:42.9327838Z 2025-12-04T10:57:42.9332760Z Running distributed tests for the mpi backend with env init_method 2025-12-04T10:57:43.0577988Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:57:43.0579750Z Executing ['mpiexec', '-n', '3', '--noprefix', '--allow-run-as-root', '/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=4', '--num-shards=9', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
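The long run of "Parsing testcases for test report: ...xml" entries above corresponds to reading the per-process JUnit-style XML reports written under test/test-reports/dist-*-init-*/. The actual CI parser is not shown in this log; a minimal, hypothetical sketch of summarising one such report with the standard library looks like this (the path in the usage comment is illustrative):

```python
# Hypothetical sketch: tally tests/failures/errors/skips from one JUnit-style
# XML report such as the dist-*-init-* files listed above. This is not the
# tool the CI job itself uses; it only illustrates the report format.
import xml.etree.ElementTree as ET

def summarize_report(path: str) -> dict:
    root = ET.parse(path).getroot()
    totals = {"tests": 0, "failures": 0, "errors": 0, "skipped": 0}
    # iter() matches the root itself when the file has a bare <testsuite>,
    # and all children when it is wrapped in <testsuites>.
    for suite in root.iter("testsuite"):
        for key in totals:
            totals[key] += int(suite.get(key, 0))
    return totals

# Example usage (illustrative path):
# print(summarize_report("test/test-reports/dist-gloo-init-env/report.xml"))
```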
[2025-12-04 10:57:43.057584] 2025-12-04T10:57:47.2123115Z 2025-12-04T10:57:47.2124282Z distributed/test_distributed_spawn 4/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_4.9_b5a10ee12046d5b9_.log 2025-12-04T10:57:47.2125386Z Running 0 items in this shard: 2025-12-04T10:57:47.2125737Z Running 0 items in this shard: 2025-12-04T10:57:47.2126073Z Running 0 items in this shard: 2025-12-04T10:57:47.2126282Z 2025-12-04T10:57:47.2128378Z Running distributed tests for the mpi backend with file init_method 2025-12-04T10:57:47.3386971Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:57:47.3390828Z Executing ['mpiexec', '-n', '3', '--noprefix', '--allow-run-as-root', '/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=4', '--num-shards=9', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:57:47.338839] 2025-12-04T10:57:51.5222397Z 2025-12-04T10:57:51.5223570Z distributed/test_distributed_spawn 4/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_4.9_de48cc4d8d8e3c13_.log 2025-12-04T10:57:51.5224892Z Running 0 items in this shard: 2025-12-04T10:57:51.5225233Z Running 0 items in this shard: 2025-12-04T10:57:51.5225565Z Running 0 items in this shard: 2025-12-04T10:57:51.5225772Z 2025-12-04T10:57:51.5230203Z Running distributed tests for the nccl backend with env init_method 2025-12-04T10:57:51.5231869Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T10:57:51.5235956Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=4', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:57:51.523410] 2025-12-04T11:01:37.4701414Z 2025-12-04T11:01:37.4705336Z distributed/test_distributed_spawn 4/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_4.9_5fb338ab863a3c8f_.log 2025-12-04T11:01:37.4723028Z Running 31 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_1_level_hierarchical_model_averager_equivalent_to_periodic_model_averager, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_2D_Input, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_Running_Value, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync_grad_is_view, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_object_subgroup, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_full_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_op_list_err, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_coalescing_manager, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_build_debug_param_to_name_mapping_requires_grad, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_create_graph, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_forward_backward_hook, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_pickling_powerSGD, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_ignore_params_arg, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_shared_grad_acc_unused_params, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_sync_module_states, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_input_join_disable, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_rank, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_gloo, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_group_min, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum_cuda 2025-12-04T11:01:37.4741116Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_1_level_hierarchical_model_averager_equivalent_to_periodic_model_averager 2025-12-04T11:01:37.4742723Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_2D_Input 2025-12-04T11:01:37.4744310Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_Running_Value 2025-12-04T11:01:37.4745944Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync_grad_is_view 2025-12-04T11:01:37.4747289Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_object_subgroup 2025-12-04T11:01:37.4748704Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_max 2025-12-04T11:01:37.4750000Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_sum 2025-12-04T11:01:37.4751251Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_max 2025-12-04T11:01:37.4752452Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_min 2025-12-04T11:01:37.4753619Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_max 2025-12-04T11:01:37.4754741Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_product 2025-12-04T11:01:37.4755848Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum 2025-12-04T11:01:37.4756940Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_group 2025-12-04T11:01:37.4758176Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_group_cuda 2025-12-04T11:01:37.4759453Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_full_group_cuda 2025-12-04T11:01:37.4760733Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_op_list_err 2025-12-04T11:01:37.4761876Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast 2025-12-04T11:01:37.4762966Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_coalescing_manager 2025-12-04T11:01:37.4764260Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_build_debug_param_to_name_mapping_requires_grad 2025-12-04T11:01:37.4765530Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_create_graph 2025-12-04T11:01:37.4766689Z Running 1 items in this shard: 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_forward_backward_hook 2025-12-04T11:01:37.4767908Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_pickling_powerSGD 2025-12-04T11:01:37.4769104Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_ignore_params_arg 2025-12-04T11:01:37.4770365Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_shared_grad_acc_unused_params 2025-12-04T11:01:37.4771590Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_sync_module_states 2025-12-04T11:01:37.4772798Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_input_join_disable 2025-12-04T11:01:37.4773932Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_rank 2025-12-04T11:01:37.4775023Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_gloo 2025-12-04T11:01:37.4776209Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_product 2025-12-04T11:01:37.4777681Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_group_min 2025-12-04T11:01:37.4778818Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum_cuda 2025-12-04T11:01:37.4779446Z 2025-12-04T11:01:37.4779699Z Running distributed tests for the nccl backend with file init_method 2025-12-04T11:01:37.4780216Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:01:37.4781584Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=4', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:01:37.472191] 2025-12-04T11:05:23.4625233Z 2025-12-04T11:05:23.4626380Z distributed/test_distributed_spawn 4/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_4.9_024341bf790fe69a_.log 2025-12-04T11:05:23.4644091Z Running 31 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_1_level_hierarchical_model_averager_equivalent_to_periodic_model_averager, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_2D_Input, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_Running_Value, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync_grad_is_view, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_object_subgroup, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_full_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_op_list_err, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_coalescing_manager, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_build_debug_param_to_name_mapping_requires_grad, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_create_graph, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_forward_backward_hook, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_pickling_powerSGD, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_ignore_params_arg, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_shared_grad_acc_unused_params, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_sync_module_states, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_input_join_disable, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_rank, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_gloo, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_group_min, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum_cuda 2025-12-04T11:05:23.4661858Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_1_level_hierarchical_model_averager_equivalent_to_periodic_model_averager 2025-12-04T11:05:23.4663469Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_2D_Input 2025-12-04T11:05:23.4665055Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_Running_Value 2025-12-04T11:05:23.4666626Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync_grad_is_view 2025-12-04T11:05:23.4667967Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_object_subgroup 2025-12-04T11:05:23.4669360Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_max 2025-12-04T11:05:23.4670659Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_sum 2025-12-04T11:05:23.4671908Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_max 2025-12-04T11:05:23.4673095Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_min 2025-12-04T11:05:23.4674343Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_max 2025-12-04T11:05:23.4675500Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_product 2025-12-04T11:05:23.4676597Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum 2025-12-04T11:05:23.4677697Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_group 2025-12-04T11:05:23.4678941Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_group_cuda 2025-12-04T11:05:23.4680223Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_full_group_cuda 2025-12-04T11:05:23.4681428Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_op_list_err 2025-12-04T11:05:23.4682571Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast 2025-12-04T11:05:23.4683697Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_coalescing_manager 2025-12-04T11:05:23.4685004Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_build_debug_param_to_name_mapping_requires_grad 2025-12-04T11:05:23.4686291Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_create_graph 2025-12-04T11:05:23.4687453Z Running 1 items in this shard: 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_forward_backward_hook 2025-12-04T11:05:23.4688670Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_pickling_powerSGD 2025-12-04T11:05:23.4689871Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_ignore_params_arg 2025-12-04T11:05:23.4691145Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_shared_grad_acc_unused_params 2025-12-04T11:05:23.4692382Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_sync_module_states 2025-12-04T11:05:23.4693579Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_input_join_disable 2025-12-04T11:05:23.4694724Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_rank 2025-12-04T11:05:23.4695828Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_gloo 2025-12-04T11:05:23.4697294Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_product 2025-12-04T11:05:23.4698485Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_group_min 2025-12-04T11:05:23.4699635Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum_cuda 2025-12-04T11:05:23.4700279Z 2025-12-04T11:05:23.4700531Z Running distributed tests for the gloo backend with env init_method 2025-12-04T11:05:23.4701050Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:05:23.4702404Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=4', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:05:23.464541] 2025-12-04T11:09:43.3178282Z 2025-12-04T11:09:43.3179725Z distributed/test_distributed_spawn 4/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_4.9_807ef3b254ee9578_.log 2025-12-04T11:09:43.3197233Z Running 31 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_1_level_hierarchical_model_averager_equivalent_to_periodic_model_averager, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_2D_Input, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_Running_Value, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync_grad_is_view, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_object_subgroup, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_full_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_op_list_err, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_coalescing_manager, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_build_debug_param_to_name_mapping_requires_grad, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_create_graph, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_forward_backward_hook, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_pickling_powerSGD, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_ignore_params_arg, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_shared_grad_acc_unused_params, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_sync_module_states, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_input_join_disable, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_rank, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_gloo, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_group_min, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum_cuda 2025-12-04T11:09:43.3214392Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_1_level_hierarchical_model_averager_equivalent_to_periodic_model_averager 2025-12-04T11:09:43.3215958Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_2D_Input 2025-12-04T11:09:43.3217838Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_Running_Value 2025-12-04T11:09:43.3219568Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync_grad_is_view 2025-12-04T11:09:43.3221129Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_object_subgroup 2025-12-04T11:09:43.3222429Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_max 2025-12-04T11:09:43.3223780Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_sum 2025-12-04T11:09:43.3225070Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_max 2025-12-04T11:09:43.3226308Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_min 2025-12-04T11:09:43.3227503Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_max 2025-12-04T11:09:43.3228680Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_product 2025-12-04T11:09:43.3229895Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum 2025-12-04T11:09:43.3231028Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_group 2025-12-04T11:09:43.3232300Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_group_cuda 2025-12-04T11:09:43.3233705Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_full_group_cuda 2025-12-04T11:09:43.3234921Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_op_list_err 2025-12-04T11:09:43.3236072Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast 2025-12-04T11:09:43.3237198Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_coalescing_manager 2025-12-04T11:09:43.3238501Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_build_debug_param_to_name_mapping_requires_grad 2025-12-04T11:09:43.3239786Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_create_graph 2025-12-04T11:09:43.3240950Z Running 1 items in this shard: 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_forward_backward_hook 2025-12-04T11:09:43.3242151Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_pickling_powerSGD 2025-12-04T11:09:43.3243352Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_ignore_params_arg 2025-12-04T11:09:43.3244584Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_shared_grad_acc_unused_params 2025-12-04T11:09:43.3245821Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_sync_module_states 2025-12-04T11:09:43.3247013Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_input_join_disable 2025-12-04T11:09:43.3248150Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_rank 2025-12-04T11:09:43.3249262Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_gloo 2025-12-04T11:09:43.3250546Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_product 2025-12-04T11:09:43.3251711Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_group_min 2025-12-04T11:09:43.3252813Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum_cuda 2025-12-04T11:09:43.3253443Z 2025-12-04T11:09:43.3253691Z Running distributed tests for the gloo backend with file init_method 2025-12-04T11:09:43.3254196Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:09:43.3255525Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=4', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:09:43.319508] 2025-12-04T11:14:03.4124625Z 2025-12-04T11:14:03.4125764Z distributed/test_distributed_spawn 4/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_4.9_a98bc48b8a2bbb0a_.log 2025-12-04T11:14:03.4143800Z Running 31 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_1_level_hierarchical_model_averager_equivalent_to_periodic_model_averager, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_2D_Input, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_Running_Value, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync_grad_is_view, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_object_subgroup, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_max, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_full_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_op_list_err, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_coalescing_manager, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_build_debug_param_to_name_mapping_requires_grad, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_create_graph, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_forward_backward_hook, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_pickling_powerSGD, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_ignore_params_arg, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_shared_grad_acc_unused_params, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_sync_module_states, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_input_join_disable, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_rank, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_gloo, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_group_min, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum_cuda 2025-12-04T11:14:03.4161359Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_1_level_hierarchical_model_averager_equivalent_to_periodic_model_averager 2025-12-04T11:14:03.4162924Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_2D_Input 2025-12-04T11:14:03.4164465Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_Diff_Input_Sizes_Running_Value 2025-12-04T11:14:03.4165991Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync_grad_is_view 2025-12-04T11:14:03.4167294Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_object_subgroup 2025-12-04T11:14:03.4168614Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_max 2025-12-04T11:14:03.4169911Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_sum 2025-12-04T11:14:03.4171158Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_max 2025-12-04T11:14:03.4172359Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_min 2025-12-04T11:14:03.4173552Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_group_max 2025-12-04T11:14:03.4174694Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_product 2025-12-04T11:14:03.4175823Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_sum 2025-12-04T11:14:03.4177186Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_group 2025-12-04T11:14:03.4178465Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_unequal_split_group_cuda 2025-12-04T11:14:03.4179791Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_full_group_cuda 2025-12-04T11:14:03.4181034Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_op_list_err 2025-12-04T11:14:03.4182222Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast 2025-12-04T11:14:03.4183349Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_coalescing_manager 2025-12-04T11:14:03.4184685Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_build_debug_param_to_name_mapping_requires_grad 2025-12-04T11:14:03.4186001Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_create_graph 2025-12-04T11:14:03.4187194Z Running 1 items in this shard: 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_forward_backward_hook 2025-12-04T11:14:03.4188580Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_hook_pickling_powerSGD 2025-12-04T11:14:03.4189921Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_ignore_params_arg 2025-12-04T11:14:03.4191107Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_shared_grad_acc_unused_params 2025-12-04T11:14:03.4192302Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_sync_module_states 2025-12-04T11:14:03.4193475Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_input_join_disable 2025-12-04T11:14:03.4194582Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_rank 2025-12-04T11:14:03.4195637Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_gloo 2025-12-04T11:14:03.4196790Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_full_group_product 2025-12-04T11:14:03.4197913Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_group_min 2025-12-04T11:14:03.4199020Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum_cuda 2025-12-04T11:14:03.4199610Z 2025-12-04T11:14:03.4200093Z Finished distributed/test_distributed_spawn 4/9 ... 
[2025-12-04 11:14:03.413916][8475.021831248], took 16.46min 2025-12-04T11:14:03.4412464Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-422b22169e3a08f1.xml 2025-12-04T11:14:03.5241724Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8ec15082b412f697.xml 2025-12-04T11:14:03.5512497Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a2eda26248d83b8e.xml 2025-12-04T11:14:03.5755671Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-72f602b330e606cb.xml 2025-12-04T11:14:03.6024232Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-94537227bc12f698.xml 2025-12-04T11:14:03.6290074Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f7368dd24235350f.xml 2025-12-04T11:14:03.6596516Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1f5a9742e1242440.xml 2025-12-04T11:14:03.6917072Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6b0873e59b83bf9a.xml 2025-12-04T11:14:03.7286908Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-64bbf1c836e72a15.xml 2025-12-04T11:14:03.7625371Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f83300f2b97b0a07.xml 2025-12-04T11:14:03.8368399Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-46e1a3ccabb4ea53.xml 2025-12-04T11:14:03.8700116Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-52cd579e7fe5892c.xml 2025-12-04T11:14:03.9031137Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cb876d9d148638c4.xml 2025-12-04T11:14:03.9321492Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-419043608d870248.xml 2025-12-04T11:14:03.9627186Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-03caaef3ff0396d9.xml 2025-12-04T11:14:03.9937664Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a49158b49188737a.xml 2025-12-04T11:14:04.0469642Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9371e4128a3ac8fe.xml 2025-12-04T11:14:04.0859312Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bf7e7c630fc800f5.xml 2025-12-04T11:14:04.1217466Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f263367a9b8ff205.xml 2025-12-04T11:14:04.1538723Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9da5cc1abf82fc88.xml 2025-12-04T11:14:04.1925121Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-17270d7c5dcce82d.xml 2025-12-04T11:14:04.2289882Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-52a8a0406f3c10fb.xml 2025-12-04T11:14:04.2568105Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8955835fa53fe405.xml 2025-12-04T11:14:04.2857537Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-41e8000da4470974.xml 2025-12-04T11:14:04.3577806Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-17b82ffe3c62718d.xml 2025-12-04T11:14:04.3946937Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-550a077945687423.xml 2025-12-04T11:14:04.4337658Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-97658b25492d180c.xml 2025-12-04T11:14:04.4739132Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5ba6b434230b8a31.xml 2025-12-04T11:14:04.5127826Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8ab85cfcce385bb9.xml 2025-12-04T11:14:04.5499042Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-205c67b3e9ea2006.xml 2025-12-04T11:14:04.6179301Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a7727ff60499e455.xml 2025-12-04T11:14:04.7318032Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5545774781103441.xml 2025-12-04T11:14:04.7707780Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-69b99129eec5d274.xml 2025-12-04T11:14:05.0124581Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-71229775f4c708c6.xml 2025-12-04T11:14:05.0498716Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ef94932e8a93743e.xml 2025-12-04T11:14:05.0809207Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-830e1894dcf5c994.xml 2025-12-04T11:14:05.1099357Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d6ec9fe8576de151.xml 2025-12-04T11:14:05.1619870Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-dd5c3fba431f03e3.xml 2025-12-04T11:14:05.1904994Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-23246ae737e62ded.xml 2025-12-04T11:14:05.2226548Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8aa7ae0f58f2813b.xml 2025-12-04T11:14:05.2559649Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cd7e251b7cd67b87.xml 2025-12-04T11:14:05.2924541Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3ffef4b2a54e0ec6.xml 2025-12-04T11:14:05.3272367Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f47719c8fab0f3fd.xml 2025-12-04T11:14:05.3631861Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7f97df23e3af62b7.xml 2025-12-04T11:14:05.3961006Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7d9b569377c5e6b5.xml 2025-12-04T11:14:05.4299075Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e79d7fc843c87404.xml 2025-12-04T11:14:05.4979085Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0b4908c887012bf3.xml 2025-12-04T11:14:05.5290087Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-15d9380e1c9a62c7.xml 2025-12-04T11:14:05.5597070Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-89d48b8548171ec2.xml 2025-12-04T11:14:05.5933577Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e87d273ae3e5c7f4.xml 2025-12-04T11:14:05.6224684Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5becb9fcc2b2a740.xml 2025-12-04T11:14:05.6658882Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e50500c3a0076f9a.xml 2025-12-04T11:14:05.7009910Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c28f45efdfac39c4.xml 2025-12-04T11:14:05.7378961Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d9fcea5b98362b6a.xml 2025-12-04T11:14:05.7703269Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-23763de39322c899.xml 2025-12-04T11:14:05.8057504Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f7a5837d4cf564eb.xml 2025-12-04T11:14:05.8390073Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f6098aefa2030078.xml 2025-12-04T11:14:06.0779837Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9d3b389690949ffc.xml 2025-12-04T11:14:06.1125932Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-00c0b12dc56300ed.xml 2025-12-04T11:14:06.3550489Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-875462dd555a5412.xml 2025-12-04T11:14:06.3842497Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5da26e78fc052180.xml 2025-12-04T11:14:06.4159366Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-705b7a3606470644.xml 2025-12-04T11:14:06.4831303Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3996750239d4977f.xml 2025-12-04T11:14:06.5157593Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b1bfbeb9b34c8574.xml 2025-12-04T11:14:06.5488068Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6c5cc720d34bebc6.xml 2025-12-04T11:14:06.5802502Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b5eb76bc9735e309.xml 2025-12-04T11:14:06.6119264Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1a28d2b8c4bb8b97.xml 2025-12-04T11:14:06.6478311Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f2fa0ff1a8410ed4.xml 2025-12-04T11:14:06.6820508Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a694586bb28814d4.xml 2025-12-04T11:14:06.7156653Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-91f11f0cc30a0889.xml 2025-12-04T11:14:06.7550159Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cc882534d0c7ac9e.xml 2025-12-04T11:14:06.7898884Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3576431fa0a79154.xml 2025-12-04T11:14:06.8195691Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-85e1893ad67dccf3.xml 2025-12-04T11:14:06.8517498Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-148510b891c749c6.xml 2025-12-04T11:14:06.8809813Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e6549972a7efaf11.xml 2025-12-04T11:14:07.1196057Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0ea6ea860d10e295.xml 2025-12-04T11:14:07.1498662Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-83ab4f7124e50996.xml 2025-12-04T11:14:07.1818879Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a6c1a924e8712f89.xml 2025-12-04T11:14:07.2127352Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0bec6d0d6dd273b2.xml 2025-12-04T11:14:07.2456786Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ce5c2131a079a118.xml 2025-12-04T11:14:07.2778047Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9aa0d7a04a1b05f2.xml 2025-12-04T11:14:07.3088288Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-85e0e890e418ce3a.xml 2025-12-04T11:14:07.3431351Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4cffe073269e4f0a.xml 2025-12-04T11:14:07.3761953Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-fb78beccd38dd26e.xml 2025-12-04T11:14:07.4081509Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c24763a200436369.xml 2025-12-04T11:14:07.4474987Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-95f84fd6ea33eee0.xml 2025-12-04T11:14:07.4769950Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-88fe6d3cec93de32.xml 2025-12-04T11:14:07.5088674Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0260bf01f397061e.xml 2025-12-04T11:14:07.5378889Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bc07ca8676eed412.xml 2025-12-04T11:14:07.5757749Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c73c9ddbbd799146.xml 2025-12-04T11:14:07.6068441Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d73e4a124891508d.xml 2025-12-04T11:14:07.6457111Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e44eef95a4d81dc3.xml 2025-12-04T11:14:07.6787467Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-78d0f5373874b1c4.xml 2025-12-04T11:14:07.7070243Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4c88483e90b04648.xml 2025-12-04T11:14:07.7365479Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ccf199cbc8b611ab.xml 2025-12-04T11:14:07.7717624Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6a4daccc9da30cdb.xml 2025-12-04T11:14:07.8011002Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d983aecef8c58dfb.xml 2025-12-04T11:14:07.8269054Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-746325984b31e17e.xml 2025-12-04T11:14:07.8618201Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0b8591cc84ef2a6a.xml 2025-12-04T11:14:07.9044831Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-043dda7312ce02a9.xml 2025-12-04T11:14:07.9578679Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3cf2335721c75edb.xml 2025-12-04T11:14:08.0869922Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ed68ee99b507df29.xml 2025-12-04T11:14:08.1230571Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-afe3aa9ea643db5b.xml 2025-12-04T11:14:08.2219034Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-706ef1f553cb8cca.xml 2025-12-04T11:14:08.2578117Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a98124b8f8d7b3ef.xml 2025-12-04T11:14:08.2968321Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ee37bb64a8e84ec5.xml 2025-12-04T11:14:08.3850291Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e2af230e2fec6d35.xml 2025-12-04T11:14:08.4269051Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3008545966a2ad5b.xml 2025-12-04T11:14:08.4626237Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-53870facd803211b.xml 2025-12-04T11:14:08.4979816Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4eca7697caf90c2a.xml 2025-12-04T11:14:08.5432749Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2c4554d604268fb5.xml 2025-12-04T11:14:08.5807953Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c6b52be0b4531e90.xml 2025-12-04T11:14:08.6187791Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c63a3f0987273dba.xml 2025-12-04T11:14:08.6596825Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b58af3771e34dd96.xml 2025-12-04T11:14:08.7018682Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-587b09149e6cc83f.xml 2025-12-04T11:14:08.7488590Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e3786dc33e6abd50.xml 2025-12-04T11:14:08.7950522Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-dfce7e92d72e48a2.xml 2025-12-04T11:14:08.8769233Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-627617d506ff1d2f.xml 2025-12-04T11:14:08.9133698Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-64530dfd24199eb7.xml 2025-12-04T11:14:08.9459024Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0ddc33c5ddc10dde.xml 2025-12-04T11:14:08.9749561Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d0632db0896072cf.xml 2025-12-04T11:14:09.0137244Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-edeb0bbc0394ec67.xml 2025-12-04T11:14:09.0539116Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e515d47fe2e6fb9c.xml 2025-12-04T11:14:09.0928735Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7c4f0278f004bb5c.xml 2025-12-04T11:14:09.1288986Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c0d3bae257da8444.xml 2025-12-04T11:14:09.1792631Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7025af433f00efbb.xml 2025-12-04T11:14:09.2098757Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-49fd198402d5c655.xml 2025-12-04T11:14:09.2459603Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5277c0b0a803851c.xml 2025-12-04T11:14:09.2858962Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3d4c61b2ce73c677.xml 2025-12-04T11:14:09.3225643Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cb0710cc3c031aa2.xml 2025-12-04T11:14:09.8424042Z Uploading artifacts took 0.49 seconds 2025-12-04T11:14:09.8431540Z Running distributed/test_distributed_spawn 7/9 ... [2025-12-04 11:14:09.842757][8481.450673212] 2025-12-04T11:14:09.8432253Z Running distributed tests for the test backend with env init_method 2025-12-04T11:14:09.8433083Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:14:09.8436922Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=7', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:14:09.843494] 2025-12-04T11:14:13.4195652Z 2025-12-04T11:14:13.4197032Z distributed/test_distributed_spawn 7/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_7.9_e6318e4f5e3f044b_.log 2025-12-04T11:14:13.4198182Z Running 0 items in this shard: 2025-12-04T11:14:13.4198407Z 2025-12-04T11:14:13.4199734Z Running distributed tests for the test backend with file init_method 2025-12-04T11:14:13.4201378Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:14:13.4205121Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=7', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:14:13.420322] 2025-12-04T11:14:16.9927627Z 2025-12-04T11:14:16.9928797Z distributed/test_distributed_spawn 7/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_7.9_7d14db48d459fad6_.log 2025-12-04T11:14:16.9929868Z Running 0 items in this shard: 2025-12-04T11:14:16.9930083Z 2025-12-04T11:14:16.9935457Z Running distributed tests for the mpi backend with env init_method 2025-12-04T11:14:17.1213488Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:14:17.1215617Z Executing ['mpiexec', '-n', '3', '--noprefix', '--allow-run-as-root', '/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=7', '--num-shards=9', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:14:17.121189] 2025-12-04T11:14:21.2925923Z 2025-12-04T11:14:21.2927042Z distributed/test_distributed_spawn 7/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_7.9_867e6ca715844bef_.log 2025-12-04T11:14:21.2928370Z Running 0 items in this shard: 2025-12-04T11:14:21.2928780Z Running 0 items in this shard: 2025-12-04T11:14:21.2929124Z Running 0 items in this shard: 2025-12-04T11:14:21.2929350Z 2025-12-04T11:14:21.2934207Z Running distributed tests for the mpi backend with file init_method 2025-12-04T11:14:21.4214525Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:14:21.4216169Z Executing ['mpiexec', '-n', '3', '--noprefix', '--allow-run-as-root', '/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=7', '--num-shards=9', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:14:21.421275] 2025-12-04T11:14:25.6263743Z 2025-12-04T11:14:25.6264872Z distributed/test_distributed_spawn 7/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_7.9_e3e9b753abf00510_.log 2025-12-04T11:14:25.6265938Z Running 0 items in this shard: 2025-12-04T11:14:25.6266297Z Running 0 items in this shard: 2025-12-04T11:14:25.6266636Z Running 0 items in this shard: 2025-12-04T11:14:25.6266857Z 2025-12-04T11:14:25.6271115Z Running distributed tests for the nccl backend with env init_method 2025-12-04T11:14:25.6272844Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:14:25.6276842Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=7', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
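
Note: the "Executing [...]" entries above record the exact per-shard command line. As a rough way to replay the nccl/env run for this shard outside CI, assuming a PyTorch source checkout with a built install, the same interpreter path, and that distributed/test_distributed_spawn.py selects its backend and init method from BACKEND and INIT_METHOD environment variables (an assumption inferred from the log, not taken from the CI scripts), one could do something like:

import os
import subprocess

# Hypothetical local replay of the shard 7/9 nccl + env:// run logged above.
# The working directory and the BACKEND / INIT_METHOD variables are assumptions
# based on the log text; the argv is copied verbatim from the "Executing" entry.
env = dict(os.environ, BACKEND="nccl", INIT_METHOD="env://")
cmd = [
    "/opt/conda/envs/py_3.10/bin/python", "-bb",
    "distributed/test_distributed_spawn.py",
    "--shard-id=7", "--num-shards=9",
    "-v", "--subprocess", "-vv", "-rfEX",
    "-p", "no:xdist", "--use-pytest", "-x", "--reruns=0",
    "--import-slow-tests", "--import-disabled-tests",
]
subprocess.run(cmd, cwd="/var/lib/jenkins/workspace/test", env=env, check=True)
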
[2025-12-04 11:14:25.627494] 2025-12-04T11:18:40.5506209Z 2025-12-04T11:18:40.5507479Z distributed/test_distributed_spawn 7/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_7.9_57c28f64236fb5f7_.log 2025-12-04T11:18:40.5528307Z Running 33 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_Backend_enum_class, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_No_Affine, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_requires_grad, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync_allreduce_hook, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_full_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_full_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_timeout_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_mixed_backend_err, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast_object_list, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_coalescing_manager_async, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_compute_bucket_assignment_by_size_sparse_error_with_logger, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_python_error_logged, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_inputs_stop_iteration_sync_bn, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_data_parallel_params, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_future, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_invalid_static_graph, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_isend_autograd_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_failure_order, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_nccl_backend_bool_allreduce, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl_autograd_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl_torch_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_torch_profiler, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag_autograd_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_skip_all_reduce_unused_parameters 2025-12-04T11:18:40.5548789Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_Backend_enum_class 2025-12-04T11:18:40.5550083Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_No_Affine 2025-12-04T11:18:40.5551539Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_requires_grad 2025-12-04T11:18:40.5552907Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync_allreduce_hook 2025-12-04T11:18:40.5554155Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_cuda 2025-12-04T11:18:40.5555365Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_min 2025-12-04T11:18:40.5556725Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_product 2025-12-04T11:18:40.5558055Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_full_group 2025-12-04T11:18:40.5559247Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_full_group_cuda 2025-12-04T11:18:40.5560548Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_full_group_cuda 2025-12-04T11:18:40.5561747Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier 2025-12-04T11:18:40.5562801Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_cuda 2025-12-04T11:18:40.5563916Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_timeout_group 2025-12-04T11:18:40.5565148Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_mixed_backend_err 2025-12-04T11:18:40.5566363Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast_object_list 2025-12-04T11:18:40.5567532Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_coalescing_manager_async 2025-12-04T11:18:40.5568877Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_compute_bucket_assignment_by_size_sparse_error_with_logger 2025-12-04T11:18:40.5570300Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_python_error_logged 2025-12-04T11:18:40.5571837Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_inputs_stop_iteration_sync_bn 2025-12-04T11:18:40.5573042Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_group 2025-12-04T11:18:40.5574170Z 
Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_data_parallel_params 2025-12-04T11:18:40.5575290Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_future 2025-12-04T11:18:40.5576459Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_invalid_static_graph 2025-12-04T11:18:40.5577818Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_isend_autograd_profiler 2025-12-04T11:18:40.5579078Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_failure_order 2025-12-04T11:18:40.5580411Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_nccl_backend_bool_allreduce 2025-12-04T11:18:40.5581578Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum 2025-12-04T11:18:40.5582653Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_cuda 2025-12-04T11:18:40.5583862Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl_autograd_profiler 2025-12-04T11:18:40.5585168Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl_torch_profiler 2025-12-04T11:18:40.5586421Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_torch_profiler 2025-12-04T11:18:40.5587730Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag_autograd_profiler 2025-12-04T11:18:40.5589250Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_skip_all_reduce_unused_parameters 2025-12-04T11:18:40.5589950Z 2025-12-04T11:18:40.5590190Z Running distributed tests for the nccl backend with file init_method 2025-12-04T11:18:40.5590675Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:18:40.5591963Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=7', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:18:40.552460] 2025-12-04T11:22:55.3092749Z 2025-12-04T11:22:55.3094091Z distributed/test_distributed_spawn 7/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_7.9_e15417bf2d6aa02d_.log 2025-12-04T11:22:55.3113218Z Running 33 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_Backend_enum_class, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_No_Affine, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_requires_grad, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync_allreduce_hook, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_full_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_full_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_timeout_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_mixed_backend_err, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast_object_list, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_coalescing_manager_async, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_compute_bucket_assignment_by_size_sparse_error_with_logger, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_python_error_logged, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_inputs_stop_iteration_sync_bn, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_data_parallel_params, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_future, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_invalid_static_graph, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_isend_autograd_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_failure_order, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_nccl_backend_bool_allreduce, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl_autograd_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl_torch_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_torch_profiler, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag_autograd_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_skip_all_reduce_unused_parameters 2025-12-04T11:22:55.3131777Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_Backend_enum_class 2025-12-04T11:22:55.3133200Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_No_Affine 2025-12-04T11:22:55.3134591Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_requires_grad 2025-12-04T11:22:55.3135974Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync_allreduce_hook 2025-12-04T11:22:55.3137489Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_cuda 2025-12-04T11:22:55.3138738Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_min 2025-12-04T11:22:55.3140091Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_product 2025-12-04T11:22:55.3141385Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_full_group 2025-12-04T11:22:55.3144159Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_full_group_cuda 2025-12-04T11:22:55.3145515Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_full_group_cuda 2025-12-04T11:22:55.3146792Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier 2025-12-04T11:22:55.3147858Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_cuda 2025-12-04T11:22:55.3149178Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_timeout_group 2025-12-04T11:22:55.3150378Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_mixed_backend_err 2025-12-04T11:22:55.3151578Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast_object_list 2025-12-04T11:22:55.3152707Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_coalescing_manager_async 2025-12-04T11:22:55.3154079Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_compute_bucket_assignment_by_size_sparse_error_with_logger 2025-12-04T11:22:55.3155402Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_python_error_logged 2025-12-04T11:22:55.3156637Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_inputs_stop_iteration_sync_bn 2025-12-04T11:22:55.3157806Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_group 2025-12-04T11:22:55.3158918Z 
Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_data_parallel_params 2025-12-04T11:22:55.3160020Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_future 2025-12-04T11:22:55.3161139Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_invalid_static_graph 2025-12-04T11:22:55.3162259Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_isend_autograd_profiler 2025-12-04T11:22:55.3163449Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_failure_order 2025-12-04T11:22:55.3164660Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_nccl_backend_bool_allreduce 2025-12-04T11:22:55.3165765Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum 2025-12-04T11:22:55.3166859Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_cuda 2025-12-04T11:22:55.3168132Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl_autograd_profiler 2025-12-04T11:22:55.3169359Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl_torch_profiler 2025-12-04T11:22:55.3170541Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_torch_profiler 2025-12-04T11:22:55.3171757Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag_autograd_profiler 2025-12-04T11:22:55.3173023Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_skip_all_reduce_unused_parameters 2025-12-04T11:22:55.3173728Z 2025-12-04T11:22:55.3173995Z Running distributed tests for the gloo backend with env init_method 2025-12-04T11:22:55.3174507Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:22:55.3175803Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=7', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:22:55.310735] 2025-12-04T11:26:59.6765996Z 2025-12-04T11:26:59.6767003Z distributed/test_distributed_spawn 7/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_7.9_7faf7d03bb4df9a2_.log 2025-12-04T11:26:59.6785758Z Running 33 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_Backend_enum_class, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_No_Affine, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_requires_grad, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync_allreduce_hook, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_full_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_full_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_timeout_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_mixed_backend_err, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast_object_list, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_coalescing_manager_async, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_compute_bucket_assignment_by_size_sparse_error_with_logger, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_python_error_logged, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_inputs_stop_iteration_sync_bn, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_data_parallel_params, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_future, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_invalid_static_graph, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_isend_autograd_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_failure_order, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_nccl_backend_bool_allreduce, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl_autograd_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl_torch_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_torch_profiler, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag_autograd_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_skip_all_reduce_unused_parameters 2025-12-04T11:26:59.6804022Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_Backend_enum_class 2025-12-04T11:26:59.6805328Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_No_Affine 2025-12-04T11:26:59.6806728Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_requires_grad 2025-12-04T11:26:59.6808086Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync_allreduce_hook 2025-12-04T11:26:59.6809332Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_cuda 2025-12-04T11:26:59.6810551Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_min 2025-12-04T11:26:59.6811884Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_product 2025-12-04T11:26:59.6813170Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_full_group 2025-12-04T11:26:59.6814359Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_full_group_cuda 2025-12-04T11:26:59.6815664Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_full_group_cuda 2025-12-04T11:26:59.6817137Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier 2025-12-04T11:26:59.6818208Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_cuda 2025-12-04T11:26:59.6819370Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_timeout_group 2025-12-04T11:26:59.6820690Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_mixed_backend_err 2025-12-04T11:26:59.6822188Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast_object_list 2025-12-04T11:26:59.6823391Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_coalescing_manager_async 2025-12-04T11:26:59.6824798Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_compute_bucket_assignment_by_size_sparse_error_with_logger 2025-12-04T11:26:59.6826191Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_python_error_logged 2025-12-04T11:26:59.6827517Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_inputs_stop_iteration_sync_bn 2025-12-04T11:26:59.6828759Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_group 2025-12-04T11:26:59.6829927Z 
Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_data_parallel_params 2025-12-04T11:26:59.6831088Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_future 2025-12-04T11:26:59.6832225Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_invalid_static_graph 2025-12-04T11:26:59.6833492Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_isend_autograd_profiler 2025-12-04T11:26:59.6834792Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_failure_order 2025-12-04T11:26:59.6836080Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_nccl_backend_bool_allreduce 2025-12-04T11:26:59.6837219Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum 2025-12-04T11:26:59.6838279Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_cuda 2025-12-04T11:26:59.6839438Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl_autograd_profiler 2025-12-04T11:26:59.6840695Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl_torch_profiler 2025-12-04T11:26:59.6841912Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_torch_profiler 2025-12-04T11:26:59.6843162Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag_autograd_profiler 2025-12-04T11:26:59.6844454Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_skip_all_reduce_unused_parameters 2025-12-04T11:26:59.6845223Z 2025-12-04T11:26:59.6845471Z Running distributed tests for the gloo backend with file init_method 2025-12-04T11:26:59.6845976Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:26:59.6847300Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_distributed_spawn.py', '--shard-id=7', '--num-shards=9', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
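
Note: the shard listings above ("Running 33 items in this shard" for the nccl and gloo backends, then one "Running 1 items in this shard" entry per test subprocess) come from the --shard-id=7 --num-shards=9 split. The sketch below only illustrates the general idea of bucketing a test list into shards; it is not PyTorch's actual sharding logic, which lives in the repository's test tooling.

# Illustrative only: a naive round-robin shard assignment over a sorted test list.
def shard(items, shard_id, num_shards):
    """Return the subset of `items` that a 1-based `shard_id` would run."""
    ordered = sorted(items)
    return [t for i, t in enumerate(ordered) if i % num_shards == shard_id - 1]

# Toy usage: 300 synthetic test ids split 9 ways, picking shard 7.
tests = [f"test_case_{i:03d}" for i in range(300)]
mine = shard(tests, shard_id=7, num_shards=9)
print(len(mine), mine[:3])
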
[2025-12-04 11:26:59.678112] 2025-12-04T11:31:03.8198200Z 2025-12-04T11:31:03.8199421Z distributed/test_distributed_spawn 7/9 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_distributed_spawn_7.9_99251297b874e698_.log 2025-12-04T11:31:03.8218174Z Running 33 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_Backend_enum_class, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_No_Affine, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_requires_grad, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync_allreduce_hook, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_min, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_product, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_full_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_full_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_full_group_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_timeout_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_mixed_backend_err, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast_object_list, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_coalescing_manager_async, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_compute_bucket_assignment_by_size_sparse_error_with_logger, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_python_error_logged, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_inputs_stop_iteration_sync_bn, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_group, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_data_parallel_params, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_future, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_invalid_static_graph, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_isend_autograd_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_failure_order, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_nccl_backend_bool_allreduce, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_cuda, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl_autograd_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl_torch_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_torch_profiler, 
test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag_autograd_profiler, test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_skip_all_reduce_unused_parameters 2025-12-04T11:31:03.8236775Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_Backend_enum_class 2025-12-04T11:31:03.8238035Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_SyncBatchNorm_No_Affine 2025-12-04T11:31:03.8239395Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_DistributedDataParallel_requires_grad 2025-12-04T11:31:03.8240756Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_accumulate_gradients_no_sync_allreduce_hook 2025-12-04T11:31:03.8242015Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_gather_cuda 2025-12-04T11:31:03.8243192Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_min 2025-12-04T11:31:03.8244486Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_reduce_coalesced_full_group_product 2025-12-04T11:31:03.8245703Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_full_group 2025-12-04T11:31:03.8246857Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_full_group_cuda 2025-12-04T11:31:03.8248122Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_all_to_all_single_equal_split_full_group_cuda 2025-12-04T11:31:03.8249296Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier 2025-12-04T11:31:03.8250315Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_cuda 2025-12-04T11:31:03.8251393Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_barrier_timeout_group 2025-12-04T11:31:03.8252575Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_batch_isend_irecv_mixed_backend_err 2025-12-04T11:31:03.8253777Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast_object_list 2025-12-04T11:31:03.8254914Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_coalescing_manager_async 2025-12-04T11:31:03.8256315Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_compute_bucket_assignment_by_size_sparse_error_with_logger 2025-12-04T11:31:03.8257944Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_python_error_logged 2025-12-04T11:31:03.8259262Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_uneven_inputs_stop_iteration_sync_bn 2025-12-04T11:31:03.8260511Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_gather_group 2025-12-04T11:31:03.8261674Z 
Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_data_parallel_params 2025-12-04T11:31:03.8262814Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_get_future 2025-12-04T11:31:03.8263952Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_invalid_static_graph 2025-12-04T11:31:03.8265152Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_isend_autograd_profiler 2025-12-04T11:31:03.8266476Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_monitored_barrier_failure_order 2025-12-04T11:31:03.8267750Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_nccl_backend_bool_allreduce 2025-12-04T11:31:03.8269026Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_reduce_sum 2025-12-04T11:31:03.8270156Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_scatter_cuda 2025-12-04T11:31:03.8271295Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl_autograd_profiler 2025-12-04T11:31:03.8272508Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_nccl_torch_profiler 2025-12-04T11:31:03.8273720Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_torch_profiler 2025-12-04T11:31:03.8274932Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_send_recv_with_tag_autograd_profiler 2025-12-04T11:31:03.8276201Z Running 1 items in this shard: test/distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_skip_all_reduce_unused_parameters 2025-12-04T11:31:03.8276888Z 2025-12-04T11:31:03.8277284Z Finished distributed/test_distributed_spawn 7/9 ... 
[2025-12-04 11:31:03.820755][9495.428670826], took 16.90min 2025-12-04T11:31:03.8489666Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e12df5e946a2399b.xml 2025-12-04T11:31:03.9270369Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4ab25792bd6780ce.xml 2025-12-04T11:31:03.9577253Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ee61fca4ae363844.xml 2025-12-04T11:31:03.9859236Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-12e19ecac0707a9f.xml 2025-12-04T11:31:04.0212281Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-49aeb17bc0069227.xml 2025-12-04T11:31:04.0517084Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-82678a9127d50625.xml 2025-12-04T11:31:04.0895612Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8ac8ca9bd1994ece.xml 2025-12-04T11:31:04.1280894Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3403d5bb8935cb4e.xml 2025-12-04T11:31:04.1752398Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b0c166deb400ad9d.xml 2025-12-04T11:31:04.2168656Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-60e4e17b51df739f.xml 2025-12-04T11:31:04.2537150Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-22eb7410be2437d9.xml 2025-12-04T11:31:04.2918769Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9ee70791b9debd6c.xml 2025-12-04T11:31:04.3259657Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-81abecf194df2c45.xml 2025-12-04T11:31:04.3570157Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1136154023961765.xml 2025-12-04T11:31:04.3929906Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cfef205e8493de16.xml 2025-12-04T11:31:04.4236731Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bd599f355b8caaeb.xml 2025-12-04T11:31:04.4556543Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-62ca7bd8b65dea10.xml 2025-12-04T11:31:04.4924448Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b3d3e55cfe315fc5.xml 2025-12-04T11:31:04.5404685Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3a45eb631d6c35ef.xml 2025-12-04T11:31:04.6359301Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-aae6fb78854ea6ff.xml 2025-12-04T11:31:04.6789932Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9eef2c9b45729eeb.xml 2025-12-04T11:31:04.7295457Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d106ae3bbe7d9e5c.xml 2025-12-04T11:31:04.7858262Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8ff643138d43dd85.xml 2025-12-04T11:31:04.8257135Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5c72d0c28afc7b8b.xml 2025-12-04T11:31:04.8708287Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8cb6ed13882ace9d.xml 2025-12-04T11:31:04.9107730Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-51d5ea88c29b6ed7.xml 2025-12-04T11:31:04.9627487Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0e2af92baadfb43c.xml 2025-12-04T11:31:05.0037777Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0ee64e4888310471.xml 2025-12-04T11:31:05.0349840Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2124f6a7f1f8a6ad.xml 2025-12-04T11:31:05.0758861Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3a72595ddb271e95.xml 2025-12-04T11:31:05.1103550Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f5a0fd7e9efb76d5.xml 2025-12-04T11:31:05.1509690Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f05ec777ac110fb6.xml 2025-12-04T11:31:05.2032418Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4c4dbe227aaf8cd2.xml 2025-12-04T11:31:05.2355227Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6d8d80edc2b8c69e.xml 2025-12-04T11:31:05.2641231Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-50add8f3174dd7ac.xml 2025-12-04T11:31:05.3040893Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-851cdc069dcc69f7.xml 2025-12-04T11:31:05.3450959Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1acd79e907003b41.xml 2025-12-04T11:31:05.3870301Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a0ff1f71f9283f58.xml 2025-12-04T11:31:05.4189200Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-65237f33092a4b4f.xml 2025-12-04T11:31:05.4614429Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-42750e8459e7d15b.xml 2025-12-04T11:31:05.5089216Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d44ddde7846d301e.xml 2025-12-04T11:31:05.5605305Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d84034c24f131de9.xml 2025-12-04T11:31:05.6047805Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b21382e4a0d075d7.xml 2025-12-04T11:31:05.6619972Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f01856e9a2028bff.xml 2025-12-04T11:31:05.7037542Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d271f82508cdd35e.xml 2025-12-04T11:31:05.7361087Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-602ab3c67d585e00.xml 2025-12-04T11:31:05.7669474Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6c4b4f500cbe46b2.xml 2025-12-04T11:31:05.8037747Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-060bfe393d18a7b7.xml 2025-12-04T11:31:05.8389940Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-08a6cb454dfb3288.xml 2025-12-04T11:31:05.8739291Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-14f8591ab0b18d47.xml 2025-12-04T11:31:05.9071670Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-faf65bc8adad7023.xml 2025-12-04T11:31:05.9390802Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7ab921a38daba1bb.xml 2025-12-04T11:31:05.9748673Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-205a17c445d16b08.xml 2025-12-04T11:31:06.0132299Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-14314f5e6064defd.xml 2025-12-04T11:31:06.0598334Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9a98077fc0a28449.xml 2025-12-04T11:31:06.0967325Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3e2de3e4d8afa5ff.xml 2025-12-04T11:31:06.1325083Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-512586046bd1af6f.xml 2025-12-04T11:31:06.1709561Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1fa69b7512f74eae.xml 2025-12-04T11:31:06.2368980Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-70138f82b180a3f5.xml 2025-12-04T11:31:06.2759250Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b7ed61d0627f9533.xml 2025-12-04T11:31:06.3111917Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-493e10e45797f8fa.xml 2025-12-04T11:31:06.3402046Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-87c65811f60e5e0f.xml 2025-12-04T11:31:06.3769278Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-635f35dfbbc33c85.xml 2025-12-04T11:31:06.4107364Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-355930f4da4ab18f.xml 2025-12-04T11:31:06.4432340Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f6333fa7d0fe5c91.xml 2025-12-04T11:31:06.4741528Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3076e5b00c0eef07.xml 2025-12-04T11:31:06.5139403Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9141798051401a79.xml 2025-12-04T11:31:06.5497179Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d96c5808f2f4d423.xml 2025-12-04T11:31:06.5886573Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-59eca95b80bf15e4.xml 2025-12-04T11:31:06.6231036Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7eeb7f329dcb1625.xml 2025-12-04T11:31:06.6510312Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c438893677b09839.xml 2025-12-04T11:31:06.6801086Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d707ddf229008c6a.xml 2025-12-04T11:31:06.7310143Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c4d97d092b2123a2.xml 2025-12-04T11:31:06.7648634Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1574030634816010.xml 2025-12-04T11:31:06.8125093Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5fa3a6eb60f4eca4.xml 2025-12-04T11:31:06.8432173Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4e754e92f5037c52.xml 2025-12-04T11:31:06.9244130Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-020049def8c5b0a9.xml 2025-12-04T11:31:06.9718144Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d4dd04eda8983093.xml 2025-12-04T11:31:07.0099788Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5a612b5b9d29cdf4.xml 2025-12-04T11:31:07.0622538Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f0f750f594e5734b.xml 2025-12-04T11:31:07.0941011Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7cb1e30e8a2e57ea.xml 2025-12-04T11:31:07.1297118Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bc8052641a24d5dc.xml 2025-12-04T11:31:07.1679192Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d8cbbb1187ec0f64.xml 2025-12-04T11:31:07.2025903Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f83af7e95786df72.xml 2025-12-04T11:31:07.2371894Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a731f1e0a2629b95.xml 2025-12-04T11:31:07.2757585Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3ae47b09c2c50f23.xml 2025-12-04T11:31:07.3097059Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ec880e83b34c8e36.xml 2025-12-04T11:31:07.3490375Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c3833fdae73dbf3c.xml 2025-12-04T11:31:07.3839065Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-86aa7d82374c9e5b.xml 2025-12-04T11:31:07.4148373Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a10e426b5fcbde30.xml 2025-12-04T11:31:07.4486169Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ff35c7e5488dd9ac.xml 2025-12-04T11:31:07.4876497Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-924d345c27601ea8.xml 2025-12-04T11:31:07.5280477Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1681683ab3d327ac.xml 2025-12-04T11:31:07.5655676Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-22e9fd6e5aba0f0d.xml 2025-12-04T11:31:07.6025472Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d9dffcfba1bc1e60.xml 2025-12-04T11:31:07.6379089Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1b652ce23cebda63.xml 2025-12-04T11:31:07.6707092Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b5b9a6fa991ecf1c.xml 2025-12-04T11:31:07.6997796Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1f3a9e9304d25446.xml 2025-12-04T11:31:07.7291372Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0390eeced956f562.xml 2025-12-04T11:31:07.7567284Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-439532956daa54d1.xml 2025-12-04T11:31:07.7911952Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0f977aa3cd3cecaf.xml 2025-12-04T11:31:07.8600189Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-24127363c11860de.xml 2025-12-04T11:31:07.8922388Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0cd422e8a222e606.xml 2025-12-04T11:31:07.9167805Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-27b9de38969ee6f6.xml 2025-12-04T11:31:07.9429677Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-62abfea4d6932c1e.xml 2025-12-04T11:31:07.9805489Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e4cf4d2497acecc4.xml 2025-12-04T11:31:08.0079301Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b0b71a9d976366a8.xml 2025-12-04T11:31:08.0389271Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8c2b944477a517c5.xml 2025-12-04T11:31:08.0748948Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2c7a620380978373.xml 2025-12-04T11:31:08.1071810Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8aaa461eddd2a0f5.xml 2025-12-04T11:31:08.1416446Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d5c5af8107d86770.xml 2025-12-04T11:31:08.1826669Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-629d0d3ddf4c3e06.xml 2025-12-04T11:31:08.2140119Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7350065f0535f01a.xml 2025-12-04T11:31:08.2439693Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-877f842d3f2815af.xml 2025-12-04T11:31:08.2770492Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c391387e4c62daf7.xml 2025-12-04T11:31:08.3070803Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cea6ac435fa81670.xml 2025-12-04T11:31:08.3379545Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-69f0ceb782ba322d.xml 2025-12-04T11:31:08.3667775Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-354a8796ee4ffd32.xml 2025-12-04T11:31:08.3998943Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-52a60b9c4e3ec8c5.xml 2025-12-04T11:31:08.4269764Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-576d152cd04ca1c5.xml 2025-12-04T11:31:08.4567696Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5733f17598591d18.xml 2025-12-04T11:31:08.4858623Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8d06b92a9ae7d27c.xml 2025-12-04T11:31:08.5166242Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ebef8e69977ebea2.xml 2025-12-04T11:31:08.5537835Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ea6c158c65373811.xml 2025-12-04T11:31:08.5855568Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f2ff679811871b4a.xml 2025-12-04T11:31:08.6158631Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cc9e37194800f0d1.xml 2025-12-04T11:31:08.6480297Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5145615a66bd578b.xml 2025-12-04T11:31:08.6759408Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-33b7f705a30ded9f.xml 2025-12-04T11:31:08.7047062Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ca496a8780de69f3.xml 2025-12-04T11:31:08.7310873Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8bec3baffba656ff.xml 2025-12-04T11:31:08.7610490Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c836ef383c971ad8.xml 2025-12-04T11:31:08.7912136Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-deb32df1c36c795c.xml 2025-12-04T11:31:08.8208550Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6dabff71918e7b99.xml 2025-12-04T11:31:08.8478557Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ca39e437f793eab2.xml 2025-12-04T11:31:08.8808329Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6d93f79d5e733c01.xml 2025-12-04T11:31:08.9119692Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2079ea64f821f40e.xml 2025-12-04T11:31:08.9431605Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-eb15a6e33c260556.xml 2025-12-04T11:31:08.9759099Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ae1eb5639088ccd8.xml 2025-12-04T11:31:09.0078944Z Running distributed/test_serialization 1/1 ... [2025-12-04 11:31:09.007392][9500.615309829] 2025-12-04T11:31:09.0079664Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:31:09.0080916Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_serialization.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:31:09.007733] 2025-12-04T11:31:13.2330819Z 2025-12-04T11:31:13.2331927Z distributed/test_serialization 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_serialization_1.1_13a719996bf7ed77_.log 2025-12-04T11:31:13.2337565Z Running 11 items in this shard: test/distributed/test_serialization.py::TestSerialization::test_cuda, test/distributed/test_serialization.py::TestSerialization::test_dtensor, test/distributed/test_serialization.py::TestSerialization::test_empty_tensor, test/distributed/test_serialization.py::TestSerialization::test_nested_tensors, test/distributed/test_serialization.py::TestSerialization::test_python_object, test/distributed/test_serialization.py::TestSerialization::test_scalar_tensor, test/distributed/test_serialization.py::TestSerialization::test_str_utf8, test/distributed/test_serialization.py::TestSerialization::test_strided_tensor, test/distributed/test_serialization.py::TestSerialization::test_tensor_with_offset, test/distributed/test_serialization.py::TestSerialization::test_various_data_types, test/distributed/test_serialization.py::TestSerialization::test_weights_only 2025-12-04T11:31:13.2342400Z 2025-12-04T11:31:13.2342809Z Finished distributed/test_serialization 1/1 ... [2025-12-04 11:31:13.232757][9504.840668469], took 0.07min 2025-12-04T11:31:13.2605521Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_serialization/distributed.test_serialization-5c3790edbaae9c6a.xml 2025-12-04T11:31:13.3366118Z Running distributed/fsdp/test_fsdp_ignored_modules 1/1 ... [2025-12-04 11:31:13.336091][9504.944008631] 2025-12-04T11:31:13.3366791Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:31:13.3368101Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_fsdp_ignored_modules.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:31:13.336425] 2025-12-04T11:31:57.6129849Z 2025-12-04T11:31:57.6132554Z distributed/fsdp/test_fsdp_ignored_modules 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.fsdp.test_fsdp_ignored_modules_1.1_10f1fa8ebe15ff14_.log 2025-12-04T11:31:57.6138902Z Running 8 items in this shard: test/distributed/fsdp/test_fsdp_ignored_modules.py::TestFSDPIgnoredModules::test_diff_ignored_modules_across_ranks, test/distributed/fsdp/test_fsdp_ignored_modules.py::TestFSDPIgnoredModules::test_ignored_modules_invalid, test/distributed/fsdp/test_fsdp_ignored_modules.py::TestFSDPIgnoredModules::test_ignored_modules_nested, test/distributed/fsdp/test_fsdp_ignored_modules.py::TestFSDPIgnoredModules::test_ignored_modules_not_under_wrapped_root_ignore_modules_False, test/distributed/fsdp/test_fsdp_ignored_modules.py::TestFSDPIgnoredModules::test_ignored_modules_not_under_wrapped_root_ignore_modules_True, test/distributed/fsdp/test_fsdp_ignored_modules.py::TestFSDPIgnoredModules::test_ignored_modules_transformer, test/distributed/fsdp/test_fsdp_ignored_modules.py::TestFSDPIgnoredModules::test_ignored_states_auto_wrap, test/distributed/fsdp/test_fsdp_ignored_modules.py::TestFSDPIgnoredModules::test_ignored_states_check 2025-12-04T11:31:57.6143856Z 2025-12-04T11:31:57.6144327Z Finished distributed/fsdp/test_fsdp_ignored_modules 1/1 ... 
[2025-12-04 11:31:57.612826][9549.220738508], took 0.74min 2025-12-04T11:31:57.6410824Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_ignored_modules/distributed.fsdp.test_fsdp_ignored_modules-c4ab0979e06883a2.xml 2025-12-04T11:31:57.7727427Z Running distributed/_composable/fsdp/test_fully_shard_comm 1/1 ... [2025-12-04 11:31:57.772517][9549.380433795] 2025-12-04T11:31:57.7728141Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:31:57.7730517Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_composable/fsdp/test_fully_shard_comm.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:31:57.772861] 2025-12-04T11:34:53.7938441Z 2025-12-04T11:34:53.7939786Z distributed/_composable/fsdp/test_fully_shard_comm 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._composable.fsdp.test_fully_shard_comm_1.1_365cd7de0daee87d_.log 2025-12-04T11:34:53.7966989Z Running 22 items in this shard: test/distributed/_composable/fsdp/test_fully_shard_comm.py::TestFullyShardCollectiveOps::test_all_gather_fp32, test/distributed/_composable/fsdp/test_fully_shard_comm.py::TestFullyShardCollectiveOps::test_reduce_scatter_fp16, test/distributed/_composable/fsdp/test_fully_shard_comm.py::TestFullyShardCollectiveOps::test_reduce_scatter_fp32, test/distributed/_composable/fsdp/test_fully_shard_comm.py::TestFullyShardCommunication::test_fully_shard_communication_count, test/distributed/_composable/fsdp/test_fully_shard_comm.py::TestFullyShardCommunication::test_manual_reshard_with_reshard_after_forward_false, test/distributed/_composable/fsdp/test_fully_shard_comm.py::TestFullyShardCommunication::test_set_reduce_scatter_divide_factor, test/distributed/_composable/fsdp/test_fully_shard_comm.py::TestFullyShardCommunication::test_set_reshard_after_forward, test/distributed/_composable/fsdp/test_fully_shard_comm.py::TestFullyShardPrefetch::test_backward_misprefetch, test/distributed/_composable/fsdp/test_fully_shard_comm.py::TestFullyShardPrefetch::test_fully_shard_backward_prefetch, test/distributed/_composable/fsdp/test_fully_shard_comm.py::TestFullyShardPrefetch::test_fully_shard_multi_module_backward_prefetch, test/distributed/_composable/fsdp/test_fully_shard_comm.py::TestFullyShardPrefetch::test_fully_shard_multi_module_unused_module, test/distributed/_composable/fsdp/test_fully_shard_comm.py::TestFullyShardPrefetch::test_set_modules_to_backward_prefetch, test/distributed/_composable/fsdp/test_fully_shard_comm.py::TestFullyShardPrefetch::test_set_modules_to_backward_prefetch_inside_ac, test/distributed/_composable/fsdp/test_fully_shard_comm.py::TestFullyShardPrefetch::test_set_modules_to_forward_prefetch, test/distributed/_composable/fsdp/test_fully_shard_comm.py::TestFullyShardUnshardMultiProcess::test_unshard_async, test/distributed/_composable/fsdp/test_fully_shard_comm.py::TestFullyShardUnshardMultiThread::test_unshard_no_param_group, test/distributed/_composable/fsdp/test_fully_shard_comm.py::TestFullyShardUnshardMultiThread::test_unshard_without_lazy_init, test/distributed/_composable/fsdp/test_fully_shard_comm.py::TestFullyShardAllocFromPG::test_exception_when_used_together_with_comm_hooks, test/distributed/_composable/fsdp/test_fully_shard_comm.py::TestFullyShardAllocFromPG::test_fully_shard_alloc_from_pg, 
test/distributed/_composable/fsdp/test_fully_shard_comm.py::TestFullyShardForceSumReduction::test_fully_shard_force_sum_both_reductions, test/distributed/_composable/fsdp/test_fully_shard_comm.py::TestFullyShardForceSumReduction::test_fully_shard_force_sum_reduce_scatter, test/distributed/_composable/fsdp/test_fully_shard_comm.py::TestFullyShardReduceOpWorldSize1::test_size1_reduceop 2025-12-04T11:34:53.8018799Z 2025-12-04T11:34:53.8019412Z Finished distributed/_composable/fsdp/test_fully_shard_comm 1/1 ... [2025-12-04 11:34:53.801549][9725.409461872], took 2.93min 2025-12-04T11:34:53.8294070Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_comm/distributed._composable.fsdp.test_fully_shard_comm-b03b971b17f9f8be.xml 2025-12-04T11:34:54.8729148Z Uploading artifacts took 0.91 seconds 2025-12-04T11:34:54.8735071Z Running distributed/fsdp/test_fsdp_sharded_grad_scaler 1/1 ... [2025-12-04 11:34:54.872927][9726.480842753] 2025-12-04T11:34:54.8735730Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:34:54.8737405Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/fsdp/test_fsdp_sharded_grad_scaler.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:34:54.873272] 2025-12-04T11:37:04.4598751Z 2025-12-04T11:37:04.4599997Z distributed/fsdp/test_fsdp_sharded_grad_scaler 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.fsdp.test_fsdp_sharded_grad_scaler_1.1_be49dd131ba0d1a6_.log 2025-12-04T11:37:04.4617118Z Running 20 items in this shard: test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py::TestShardGradScaler::test_grad_scaling, test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py::TestShardGradScaler::test_inf_gradients_skip_optim_step, test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py::TestShardGradScaler::test_scaling_unscaling_sparse, test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py::TestShardedGradScalerParityWithDDP::test_fsdp_ddp_parity_with_grad_scaler_offload_false_none_mixed_precision_none, test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py::TestShardedGradScalerParityWithDDP::test_fsdp_ddp_parity_with_grad_scaler_offload_false_none_mixed_precision_use_orig_params, test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py::TestShardedGradScalerParityWithDDP::test_fsdp_ddp_parity_with_grad_scaler_offload_false_none_none_none, test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py::TestShardedGradScalerParityWithDDP::test_fsdp_ddp_parity_with_grad_scaler_offload_false_none_none_use_orig_params, test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py::TestShardedGradScalerParityWithDDP::test_fsdp_ddp_parity_with_grad_scaler_offload_false_shard_grad_op_mixed_precision_none, test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py::TestShardedGradScalerParityWithDDP::test_fsdp_ddp_parity_with_grad_scaler_offload_false_shard_grad_op_mixed_precision_use_orig_params, test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py::TestShardedGradScalerParityWithDDP::test_fsdp_ddp_parity_with_grad_scaler_offload_false_shard_grad_op_none_none, test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py::TestShardedGradScalerParityWithDDP::test_fsdp_ddp_parity_with_grad_scaler_offload_false_shard_grad_op_none_use_orig_params, 
test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py::TestShardedGradScalerParityWithDDP::test_fsdp_ddp_parity_with_grad_scaler_offload_true_none_mixed_precision_none, test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py::TestShardedGradScalerParityWithDDP::test_fsdp_ddp_parity_with_grad_scaler_offload_true_none_mixed_precision_use_orig_params, test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py::TestShardedGradScalerParityWithDDP::test_fsdp_ddp_parity_with_grad_scaler_offload_true_none_none_none, test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py::TestShardedGradScalerParityWithDDP::test_fsdp_ddp_parity_with_grad_scaler_offload_true_none_none_use_orig_params, test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py::TestShardedGradScalerParityWithDDP::test_fsdp_ddp_parity_with_grad_scaler_offload_true_shard_grad_op_mixed_precision_none, test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py::TestShardedGradScalerParityWithDDP::test_fsdp_ddp_parity_with_grad_scaler_offload_true_shard_grad_op_mixed_precision_use_orig_params, test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py::TestShardedGradScalerParityWithDDP::test_fsdp_ddp_parity_with_grad_scaler_offload_true_shard_grad_op_none_none, test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py::TestShardedGradScalerParityWithDDP::test_fsdp_ddp_parity_with_grad_scaler_offload_true_shard_grad_op_none_use_orig_params, test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py::TestShardedGradScalerParityWithDDP::test_sharded_grad_scaler_found_inf 2025-12-04T11:37:04.4633638Z 2025-12-04T11:37:04.4634090Z Finished distributed/fsdp/test_fsdp_sharded_grad_scaler 1/1 ... [2025-12-04 11:37:04.459611][9856.067525451], took 2.16min 2025-12-04T11:37:04.4883871Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.fsdp.test_fsdp_sharded_grad_scaler/distributed.fsdp.test_fsdp_sharded_grad_scaler-830facc45336217a.xml 2025-12-04T11:37:04.6099175Z Running distributed/_shard/sharding_plan/test_sharding_plan 1/1 ... [2025-12-04 11:37:04.609443][9856.217361009] 2025-12-04T11:37:04.6099901Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:37:04.6101441Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_shard/sharding_plan/test_sharding_plan.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:37:04.609817] 2025-12-04T11:37:26.2317186Z 2025-12-04T11:37:26.2320948Z distributed/_shard/sharding_plan/test_sharding_plan 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._shard.sharding_plan.test_sharding_plan_1.1_abd5760a3cc4b6ac_.log 2025-12-04T11:37:26.2323970Z Running 3 items in this shard: test/distributed/_shard/sharding_plan/test_sharding_plan.py::TestShardingPlan::test_custom_sharding_planner, test/distributed/_shard/sharding_plan/test_sharding_plan.py::TestShardingPlan::test_shard_module_sub_process_group, test/distributed/_shard/sharding_plan/test_sharding_plan.py::TestShardingPlan::test_sharding_plan_errors 2025-12-04T11:37:26.2325804Z 2025-12-04T11:37:26.2326503Z Finished distributed/_shard/sharding_plan/test_sharding_plan 1/1 ... 
[2025-12-04 11:37:26.231209][9877.839119519], took 0.36min 2025-12-04T11:37:26.2596996Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._shard.sharding_plan.test_sharding_plan/distributed._shard.sharding_plan.test_sharding_plan-86fe0d16a378ac71.xml 2025-12-04T11:37:26.3982459Z Running distributed/_shard/sharded_optim/test_sharded_optim 1/1 ... [2025-12-04 11:37:26.398016][9878.005932853] 2025-12-04T11:37:26.3983183Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:37:26.3985333Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_shard/sharded_optim/test_sharded_optim.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:37:26.398349] 2025-12-04T11:37:42.3032431Z 2025-12-04T11:37:42.3033728Z distributed/_shard/sharded_optim/test_sharded_optim 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._shard.sharded_optim.test_sharded_optim_1.1_eb895e054ba35bc4_.log 2025-12-04T11:37:42.3035995Z Running 2 items in this shard: test/distributed/_shard/sharded_optim/test_sharded_optim.py::TestShardedOptimizer::test_named_params_with_sharded_tensor, test/distributed/_shard/sharded_optim/test_sharded_optim.py::TestShardedOptimizer::test_sharded_optim 2025-12-04T11:37:42.3037277Z 2025-12-04T11:37:42.3037757Z Finished distributed/_shard/sharded_optim/test_sharded_optim 1/1 ... [2025-12-04 11:37:42.302901][9893.910799964], took 0.27min 2025-12-04T11:37:42.3306243Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._shard.sharded_optim.test_sharded_optim/distributed._shard.sharded_optim.test_sharded_optim-a8d576a6cb5a21e5.xml 2025-12-04T11:37:42.4701159Z Running distributed/_composable/fsdp/test_fully_shard_state_dict 1/1 ... [2025-12-04 11:37:42.469882][9894.077799811] 2025-12-04T11:37:42.4701917Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:37:42.4703923Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_composable/fsdp/test_fully_shard_state_dict.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:37:42.470208] 2025-12-04T11:38:27.4483600Z 2025-12-04T11:38:27.4487924Z distributed/_composable/fsdp/test_fully_shard_state_dict 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._composable.fsdp.test_fully_shard_state_dict_1.1_b527545a7e0cfc76_.log 2025-12-04T11:38:27.4493847Z Running 7 items in this shard: test/distributed/_composable/fsdp/test_fully_shard_state_dict.py::TestFullyShardStateDictMultiProcess::test_2d_state_dict_correctness, test/distributed/_composable/fsdp/test_fully_shard_state_dict.py::TestFullyShardStateDictMultiProcess::test_cached_state_dict, test/distributed/_composable/fsdp/test_fully_shard_state_dict.py::TestFullyShardStateDictMultiProcess::test_dp_state_dict_cpu_offload, test/distributed/_composable/fsdp/test_fully_shard_state_dict.py::TestFullyShardStateDictMultiProcess::test_dp_state_dict_save_load, test/distributed/_composable/fsdp/test_fully_shard_state_dict.py::TestFullyShardStateDictMultiProcess::test_dp_tp_state_dict_save_load, test/distributed/_composable/fsdp/test_fully_shard_state_dict.py::TestFullyShardStateDictMultiProcess::test_hsdp_tp_state_dict_save_load, test/distributed/_composable/fsdp/test_fully_shard_state_dict.py::TestFullyShardStateDictMultiThread::test_rank0_offload_full_state_dict 2025-12-04T11:38:27.4499160Z 2025-12-04T11:38:27.4499685Z Finished distributed/_composable/fsdp/test_fully_shard_state_dict 1/1 ... [2025-12-04 11:38:27.447744][9939.055660093], took 0.75min 2025-12-04T11:38:27.4763732Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_state_dict/distributed._composable.fsdp.test_fully_shard_state_dict-7cd1746803ec2a8b.xml 2025-12-04T11:38:27.6015080Z Running distributed/tensor/test_utils 1/1 ... [2025-12-04 11:38:27.601012][9939.20893012] 2025-12-04T11:38:27.6015678Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:38:27.6017214Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/tensor/test_utils.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:38:27.601353] 2025-12-04T11:40:00.5524253Z 2025-12-04T11:40:00.5525333Z distributed/tensor/test_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.tensor.test_utils_1.1_adf864a1b1c1212f_.log 2025-12-04T11:40:00.5538470Z Running 24 items in this shard: test/distributed/tensor/test_utils.py::LocalTest::test_compute_local_shape_and_global_offset_uneven, test/distributed/tensor/test_utils.py::UtilTest::test_compute_global_tensor_shape_1D, test/distributed/tensor/test_utils.py::UtilTest::test_compute_global_tensor_shape_1D_invalid_shape, test/distributed/tensor/test_utils.py::UtilTest::test_compute_global_tensor_shape_failure_2D, test/distributed/tensor/test_utils.py::UtilTest::test_compute_local_shape_and_global_offset_1D, test/distributed/tensor/test_utils.py::UtilTest::test_compute_local_shape_and_global_offset_2D, test/distributed/tensor/test_utils.py::UtilTest::test_compute_local_shape_and_global_offset_3D, test/distributed/tensor/test_utils.py::UtilTest::test_compute_local_shape_and_global_offset_4D, test/distributed/tensor/test_utils.py::UtilTest::test_fsdp_tp_meta_compute, test/distributed/tensor/test_utils.py::UtilTest::test_hsdp_tp_meta_compute, test/distributed/tensor/test_utils.py::UtilTest::test_uneven_fsdp_tp_meta_compute, test/distributed/tensor/test_utils.py::UtilSingleDeviceTest::test_compute_global_tensor_info_non_shard_placements, test/distributed/tensor/test_utils.py::UtilSingleDeviceTest::test_compute_global_tensor_info_shard_placement, test/distributed/tensor/test_utils.py::UtilSingleDeviceTest::test_compute_global_tensor_info_unsupported_placement, test/distributed/tensor/test_utils.py::UtilSingleDeviceTest::test_compute_tensor_info, test/distributed/tensor/test_utils.py::TestStridedSharding::test_1d_mesh_strided_sharding, test/distributed/tensor/test_utils.py::TestStridedSharding::test_2d_mesh_2d_tensor_strided_sharding, test/distributed/tensor/test_utils.py::TestStridedSharding::test_2d_mesh_strided_sharding, test/distributed/tensor/test_utils.py::TestStridedSharding::test_2d_mesh_uneven_strided_shard, test/distributed/tensor/test_utils.py::Test_StridedShard_with_shard_order::test_StridedShard_not_convertible_to_shard_order, test/distributed/tensor/test_utils.py::Test_StridedShard_with_shard_order::test_StridedShard_to_shard_order, test/distributed/tensor/test_utils.py::Test2DStridedLocalShard::test_fsdp1_tp_2d_dtensor_local_shards_and_offsets, test/distributed/tensor/test_utils.py::Test2DStridedLocalShard::test_fsdp2_tp_2d_dtensor_local_shards_and_offsets, test/distributed/tensor/test_utils.py::TestExplicitRedistribute::test_explicit_matmul 2025-12-04T11:40:00.5550685Z 2025-12-04T11:40:00.5551075Z Finished distributed/tensor/test_utils 1/1 ... [2025-12-04 11:40:00.552363][10032.160276342], took 1.55min 2025-12-04T11:40:00.5814276Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.tensor.test_utils/distributed.tensor.test_utils-ce4dc3e67348c080.xml 2025-12-04T11:40:00.7173345Z Running distributed/_composable/fsdp/test_fully_shard_memory 1/1 ... 
[2025-12-04 11:40:00.716666][10032.324584443] 2025-12-04T11:40:00.7174085Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:40:00.7175648Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_composable/fsdp/test_fully_shard_memory.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:40:00.717008] 2025-12-04T11:40:15.5191075Z 2025-12-04T11:40:15.5192733Z distributed/_composable/fsdp/test_fully_shard_memory 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._composable.fsdp.test_fully_shard_memory_1.1_49e4cc8ab7bdec96_.log 2025-12-04T11:40:15.5195092Z Running 2 items in this shard: test/distributed/_composable/fsdp/test_fully_shard_memory.py::TestFullyShardMemory::test_fully_shard_del_memory, test/distributed/_composable/fsdp/test_fully_shard_memory.py::TestFullyShardMemory::test_fully_shard_training_memory 2025-12-04T11:40:15.5196432Z 2025-12-04T11:40:15.5196941Z Finished distributed/_composable/fsdp/test_fully_shard_memory 1/1 ... [2025-12-04 11:40:15.518759][10047.126668625], took 0.25min 2025-12-04T11:40:15.5476384Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_memory/distributed._composable.fsdp.test_fully_shard_memory-bd84ca434b9abee9.xml 2025-12-04T11:40:15.6736960Z Running distributed/checkpoint/test_state_dict 1/1 ... [2025-12-04 11:40:15.673182][10047.28109898] 2025-12-04T11:40:15.6737845Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:40:15.6739170Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/test_state_dict.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:40:15.673525] 2025-12-04T11:43:11.5121375Z 2025-12-04T11:43:11.5122831Z distributed/checkpoint/test_state_dict 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.test_state_dict_1.1_211422b52eb9ecc9_.log 2025-12-04T11:43:11.5136539Z Running 25 items in this shard: test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_activation_ckpt_fqns_ddp, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_activation_ckpt_fqns_fsdp1, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_broadcast_from_rank0, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_compiled_fsdp, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_cpu_offload_full_state_dict, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_ddp, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_deprecate_api, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_extra_state, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_flattened_osd, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_fsdp, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_fsdp2, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_fsdp_ddp, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_fsdp_root_not_initialized, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_multi_device_load_model_state_dict, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_multi_param_groups, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_non_persistent_buffers, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_optim_state_dict_param_matching, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_set_cpu_model_state_dict_broadcast_from_rank0, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_setting_meta_device_model, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_setting_meta_device_model_broadcasting_and_memory, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_shared_weight, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_single_gpu, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_state_dict_with_hook_on_keys, test/distributed/checkpoint/test_state_dict.py::TestStateDict::test_strict, test/distributed/checkpoint/test_state_dict.py::TestNoComm::test_no_dist 2025-12-04T11:43:11.5148389Z 2025-12-04T11:43:11.5149046Z Finished distributed/checkpoint/test_state_dict 1/1 ... [2025-12-04 11:43:11.514244][10223.122157065], took 2.93min 2025-12-04T11:43:11.5433319Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.checkpoint.test_state_dict/distributed.checkpoint.test_state_dict-82ab38e24fe889c8.xml 2025-12-04T11:43:11.6687028Z Running distributed/checkpoint/test_state_dict_utils 1/1 ... [2025-12-04 11:43:11.668031][10223.275949038] 2025-12-04T11:43:11.6687721Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:43:11.6689056Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/checkpoint/test_state_dict_utils.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:43:11.668370] 2025-12-04T11:43:56.8999016Z 2025-12-04T11:43:56.9000256Z distributed/checkpoint/test_state_dict_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.checkpoint.test_state_dict_utils_1.1_53a76f3501a79ced_.log 2025-12-04T11:43:56.9005206Z Running 7 items in this shard: test/distributed/checkpoint/test_state_dict_utils.py::TestStateDictUtils::test_complicated_dict, test/distributed/checkpoint/test_state_dict_utils.py::TestStateDictUtils::test_cpu_and_ranks_only, test/distributed/checkpoint/test_state_dict_utils.py::TestStateDictUtils::test_cpu_offload_for_dtensor, test/distributed/checkpoint/test_state_dict_utils.py::TestStateDictUtils::test_create_cpu_state_dict, test/distributed/checkpoint/test_state_dict_utils.py::TestStateDictUtils::test_gather_state_dict_dtensor, test/distributed/checkpoint/test_state_dict_utils.py::TestStateDictUtils::test_gather_with_cpu_and_ranks_only, test/distributed/checkpoint/test_state_dict_utils.py::TestStateDictUtils::test_state_dict_util_distribute_tensors 2025-12-04T11:43:56.9008950Z 2025-12-04T11:43:56.9009403Z Finished distributed/checkpoint/test_state_dict_utils 1/1 ... [2025-12-04 11:43:56.899410][10268.507325425], took 0.75min 2025-12-04T11:43:56.9284531Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.checkpoint.test_state_dict_utils/distributed.checkpoint.test_state_dict_utils-a19642af8d31d778.xml 2025-12-04T11:43:57.0464158Z Running distributed/rpc/test_faulty_agent 1/1 ... [2025-12-04 11:43:57.046192][10268.654108426] 2025-12-04T11:43:57.0464785Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:43:57.0467226Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/rpc/test_faulty_agent.py', '--shard-id=1', '--num-shards=1', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:43:57.046531] 2025-12-04T11:44:00.8263906Z 2025-12-04T11:44:00.8265036Z distributed/rpc/test_faulty_agent 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.rpc.test_faulty_agent_1.1_9f30efe05bf109e0_.log 2025-12-04T11:44:00.8266372Z Running 0 items in this shard: 2025-12-04T11:44:00.8266604Z 2025-12-04T11:44:00.8267007Z Finished distributed/rpc/test_faulty_agent 1/1 ... [2025-12-04 11:44:00.826215][10272.434131207], took 0.06min 2025-12-04T11:44:00.8970487Z Running distributed/_shard/sharded_tensor/ops/test_embedding 1/1 ... [2025-12-04 11:44:00.896508][10272.504424945] 2025-12-04T11:44:00.8971190Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:44:00.8972574Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_shard/sharded_tensor/ops/test_embedding.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:44:00.896884] 2025-12-04T11:44:16.7032218Z 2025-12-04T11:44:16.7034609Z distributed/_shard/sharded_tensor/ops/test_embedding 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._shard.sharded_tensor.ops.test_embedding_1.1_94d647ccb113bbd0_.log 2025-12-04T11:44:16.7038253Z Running 2 items in this shard: test/distributed/_shard/sharded_tensor/ops/test_embedding.py::TestShardedEmbedding::test_sharded_embedding_colwise, test/distributed/_shard/sharded_tensor/ops/test_embedding.py::TestShardedEmbedding::test_sharded_embedding_rowwise 2025-12-04T11:44:16.7040350Z 2025-12-04T11:44:16.7040935Z Finished distributed/_shard/sharded_tensor/ops/test_embedding 1/1 ... [2025-12-04 11:44:16.702789][10288.310704722], took 0.26min 2025-12-04T11:44:16.7403957Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._shard.sharded_tensor.ops.test_embedding/distributed._shard.sharded_tensor.ops.test_embedding-fd33e5d9c41f35fb.xml 2025-12-04T11:44:16.8706644Z Running distributed/_shard/sharded_tensor/test_sharded_tensor_reshard 1/1 ... [2025-12-04 11:44:16.870425][10288.478342458] 2025-12-04T11:44:16.8707432Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:44:16.8710044Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_shard/sharded_tensor/test_sharded_tensor_reshard.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:44:16.870795] 2025-12-04T11:44:32.5738637Z 2025-12-04T11:44:32.5740063Z distributed/_shard/sharded_tensor/test_sharded_tensor_reshard 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._shard.sharded_tensor.test_sharded_tensor_reshard_1.1_41e70f878ccc4095_.log 2025-12-04T11:44:32.5742762Z Running 2 items in this shard: test/distributed/_shard/sharded_tensor/test_sharded_tensor_reshard.py::TestReshard::test_sharded_tensor_reshard, test/distributed/_shard/sharded_tensor/test_sharded_tensor_reshard.py::TestReshard::test_sharded_tensor_reshard_errors 2025-12-04T11:44:32.5744274Z 2025-12-04T11:44:32.5744804Z Finished distributed/_shard/sharded_tensor/test_sharded_tensor_reshard 1/1 ... [2025-12-04 11:44:32.573494][10304.181410122], took 0.26min 2025-12-04T11:44:32.6029920Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._shard.sharded_tensor.test_sharded_tensor_reshard/distributed._shard.sharded_tensor.test_sharded_tensor_reshard-e6bc79067fb0604d.xml 2025-12-04T11:44:32.7219270Z Running distributed/test_c10d_spawn_nccl 1/1 ... [2025-12-04 11:44:32.721658][10304.329575497] 2025-12-04T11:44:32.7219904Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:44:32.7222104Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_c10d_spawn_nccl.py', '--shard-id=1', '--num-shards=1', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:44:32.722004] 2025-12-04T11:46:03.5481274Z 2025-12-04T11:46:03.5482461Z distributed/test_c10d_spawn_nccl 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_c10d_spawn_nccl_1.1_1bf221cec02d55ca_.log 2025-12-04T11:46:03.5488670Z Running 10 items in this shard: test/distributed/test_c10d_spawn_nccl.py::TestDistributedNNFunctionsNccl::test_all_gather, test/distributed/test_c10d_spawn_nccl.py::TestDistributedNNFunctionsNccl::test_all_gather_base, test/distributed/test_c10d_spawn_nccl.py::TestDistributedNNFunctionsNccl::test_all_reduce_non_contiguous, test/distributed/test_c10d_spawn_nccl.py::TestDistributedNNFunctionsNccl::test_all_to_all, test/distributed/test_c10d_spawn_nccl.py::TestDistributedNNFunctionsNccl::test_all_to_all_single, test/distributed/test_c10d_spawn_nccl.py::TestDistributedNNFunctionsNccl::test_allreduce, test/distributed/test_c10d_spawn_nccl.py::TestDistributedNNFunctionsNccl::test_broadcast, test/distributed/test_c10d_spawn_nccl.py::TestDistributedNNFunctionsNccl::test_reduce, test/distributed/test_c10d_spawn_nccl.py::TestDistributedNNFunctionsNccl::test_reduce_scatter, test/distributed/test_c10d_spawn_nccl.py::TestDistributedNNFunctionsNccl::test_reduce_scatter_non_contiguous 2025-12-04T11:46:03.5494046Z Running 1 items in this shard: test/distributed/test_c10d_spawn_nccl.py::TestDistributedNNFunctionsNccl::test_all_gather 2025-12-04T11:46:03.5495145Z Running 1 items in this shard: test/distributed/test_c10d_spawn_nccl.py::TestDistributedNNFunctionsNccl::test_all_gather_base 2025-12-04T11:46:03.5496339Z Running 1 items in this shard: test/distributed/test_c10d_spawn_nccl.py::TestDistributedNNFunctionsNccl::test_all_reduce_non_contiguous 2025-12-04T11:46:03.5497846Z Running 1 items in this shard: test/distributed/test_c10d_spawn_nccl.py::TestDistributedNNFunctionsNccl::test_all_to_all 2025-12-04T11:46:03.5499011Z Running 1 items in this shard: test/distributed/test_c10d_spawn_nccl.py::TestDistributedNNFunctionsNccl::test_all_to_all_single 2025-12-04T11:46:03.5500139Z Running 1 items in this shard: test/distributed/test_c10d_spawn_nccl.py::TestDistributedNNFunctionsNccl::test_allreduce 2025-12-04T11:46:03.5501242Z Running 1 items in this shard: test/distributed/test_c10d_spawn_nccl.py::TestDistributedNNFunctionsNccl::test_broadcast 2025-12-04T11:46:03.5502330Z Running 1 items in this shard: test/distributed/test_c10d_spawn_nccl.py::TestDistributedNNFunctionsNccl::test_reduce 2025-12-04T11:46:03.5503431Z Running 1 items in this shard: test/distributed/test_c10d_spawn_nccl.py::TestDistributedNNFunctionsNccl::test_reduce_scatter 2025-12-04T11:46:03.5504662Z Running 1 items in this shard: test/distributed/test_c10d_spawn_nccl.py::TestDistributedNNFunctionsNccl::test_reduce_scatter_non_contiguous 2025-12-04T11:46:03.5505403Z 2025-12-04T11:46:03.5505795Z Finished distributed/test_c10d_spawn_nccl 1/1 ... 
[2025-12-04 11:46:03.547749][10395.155665929], took 1.51min 2025-12-04T11:46:03.5777297Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-2ef4942791579d03.xml 2025-12-04T11:46:03.6602735Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-d882aa7ed351d2b7.xml 2025-12-04T11:46:03.6881287Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-e41d47243c13be74.xml 2025-12-04T11:46:03.7251079Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-2ed2ccb680132309.xml 2025-12-04T11:46:03.7583284Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-a86d7398eb9ff93b.xml 2025-12-04T11:46:03.7859693Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-50f191d4627fdfd2.xml 2025-12-04T11:46:03.8189611Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-8cb70355957e1b4b.xml 2025-12-04T11:46:03.8439672Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-bbde3500be39702b.xml 2025-12-04T11:46:03.8773214Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-1805de606cf78685.xml 2025-12-04T11:46:03.9077832Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-8a898c87fa4f8fd3.xml 2025-12-04T11:46:03.9768869Z Running distributed/test_c10d_spawn_ucc 1/1 ... [2025-12-04 11:46:03.976294][10395.584211275] 2025-12-04T11:46:03.9769499Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:46:03.9770948Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_c10d_spawn_ucc.py', '--shard-id=1', '--num-shards=1', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:46:03.976640] 2025-12-04T11:46:27.5323971Z 2025-12-04T11:46:27.5325051Z distributed/test_c10d_spawn_ucc 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_c10d_spawn_ucc_1.1_5521268884e60126_.log 2025-12-04T11:46:27.5328753Z Running 6 items in this shard: test/distributed/test_c10d_spawn_ucc.py::TestDistributedNNFunctionsUcc::test_all_gather, test/distributed/test_c10d_spawn_ucc.py::TestDistributedNNFunctionsUcc::test_all_to_all, test/distributed/test_c10d_spawn_ucc.py::TestDistributedNNFunctionsUcc::test_all_to_all_single, test/distributed/test_c10d_spawn_ucc.py::TestDistributedNNFunctionsUcc::test_allreduce, test/distributed/test_c10d_spawn_ucc.py::TestDistributedNNFunctionsUcc::test_broadcast, test/distributed/test_c10d_spawn_ucc.py::TestDistributedNNFunctionsUcc::test_reduce 2025-12-04T11:46:27.5332173Z Running 1 items in this shard: test/distributed/test_c10d_spawn_ucc.py::TestDistributedNNFunctionsUcc::test_all_gather 2025-12-04T11:46:27.5333259Z Running 1 items in this shard: test/distributed/test_c10d_spawn_ucc.py::TestDistributedNNFunctionsUcc::test_all_to_all 2025-12-04T11:46:27.5334380Z Running 1 items in this shard: test/distributed/test_c10d_spawn_ucc.py::TestDistributedNNFunctionsUcc::test_all_to_all_single 2025-12-04T11:46:27.5335479Z Running 1 items in this shard: test/distributed/test_c10d_spawn_ucc.py::TestDistributedNNFunctionsUcc::test_allreduce 2025-12-04T11:46:27.5336986Z Running 1 items in this shard: test/distributed/test_c10d_spawn_ucc.py::TestDistributedNNFunctionsUcc::test_broadcast 2025-12-04T11:46:27.5338075Z Running 1 items in this shard: test/distributed/test_c10d_spawn_ucc.py::TestDistributedNNFunctionsUcc::test_reduce 2025-12-04T11:46:27.5338671Z 2025-12-04T11:46:27.5339065Z Finished distributed/test_c10d_spawn_ucc 1/1 ... [2025-12-04 11:46:27.532007][10419.139924091], took 0.39min 2025-12-04T11:46:27.5623900Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_spawn_ucc/distributed.test_c10d_spawn_ucc-41764b12ccdf212e.xml 2025-12-04T11:46:27.6584472Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_spawn_ucc/distributed.test_c10d_spawn_ucc-aee5aa2ded024d85.xml 2025-12-04T11:46:27.6922138Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_spawn_ucc/distributed.test_c10d_spawn_ucc-8800a2e7b955ab16.xml 2025-12-04T11:46:27.7332807Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_spawn_ucc/distributed.test_c10d_spawn_ucc-3a092f5472894a7f.xml 2025-12-04T11:46:27.7637922Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_spawn_ucc/distributed.test_c10d_spawn_ucc-f628509e7e3f2a1f.xml 2025-12-04T11:46:27.7979750Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_spawn_ucc/distributed.test_c10d_spawn_ucc-c1a78b733abc6caa.xml 2025-12-04T11:46:27.8726936Z Running distributed/test_c10d_gloo 1/2 ... 
[2025-12-04 11:46:27.872488][10419.480405833] 2025-12-04T11:46:27.8727539Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T11:46:27.8729955Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_c10d_gloo.py', '--shard-id=1', '--num-shards=2', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:46:27.872806] 2025-12-04T12:03:45.7298560Z 2025-12-04T12:03:45.7299957Z distributed/test_c10d_gloo 1/2 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_c10d_gloo_1.2_d5d0e2b1d744a982_.log 2025-12-04T12:03:45.7360290Z Running 127 items in this shard: test/distributed/test_c10d_gloo.py::RendezvousTCPTest::test_tcp_init, test/distributed/test_c10d_gloo.py::TimeoutTest::test_default_store_timeout_gloo, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_coalesced_async, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_coalesced_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_into_tensor_coalesced, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_stress_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_coalesced_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_coalesced_checks_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_overall_timeout, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_barrier_implies_wait, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_broadcast_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_broadcast_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_broadcast_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_broadcast_stress_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_empty_tensors, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_gather_stress_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_multi_device_constructor, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_reduce_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_reduce_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_reduce_stress_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_scatter_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_scatter_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_scatter_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_send_recv_all_to_all, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_set_gloo_pg_timeout, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_sparse_allreduce_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_sparse_allreduce_basics_cuda, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_dataclass_output, 
test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_dynamic_module, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_dynamic_weight_sharing, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_once_use_reentrant_False, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_once_use_reentrant_True, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_static_graph_use_reentrant_True, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_use_reentrant_False, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_use_reentrant_True, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_unused_params_use_reentrant_True, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_weight_sharing_use_reentrant_True, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_comm_hook_future_passing_cpu, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_comm_hook_future_passing_gpu_gloo, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_comm_hook_sparse_gradients, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_global_local_unused_params_grad, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_global_local_unused_params_grad_with_grad_is_view, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_gloo_backend_cpu_module, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_gloo_backend_cpu_module_grad_is_view, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ignored_output, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ignored_sharded_tensor, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_sparse_gradients, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_sparse_gradients_grad_is_view, test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_sync_batch_norm_empty_input, test/distributed/test_c10d_gloo.py::ReducerTest::test_forward_backward, test/distributed/test_c10d_gloo.py::ReducerTest::test_multi_dtype_single_bucket, test/distributed/test_c10d_gloo.py::ReducerTest::test_single_dtype_single_bucket, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_coalesced_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_inference_mode, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_into_tensor_coalesced, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_coalesced_async, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_coalesced_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_coalesced_checks_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_coalesced_stress, 
test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_op_timeout, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_overall_timeout, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_stress_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_block_current_stream_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_broadcast_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_broadcast_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_broadcast_stress_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_empty_tensors, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_gather_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_gather_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_gather_noncontiguous_input, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_gather_stress_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_reduce_scatter_tensor, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_reduce_scatter_tensor_coalesced, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_reduce_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_scatter_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_scatter_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_send_recv_all_to_all, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_set_gloo_pg_timeout, test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_sparse_allreduce_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_basics, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_coalesced_async, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_coalesced_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_inference_mode, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_stress_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_coalesced_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_op_timeout, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_overall_timeout, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_block_current_stream_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_broadcast_stress, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_broadcast_stress_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_gather_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_gather_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_gather_stress_cuda, 
test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_reduce_checks, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_reduce_scatter, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_reduce_scatter_tensor_coalesced, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_scatter_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_scatter_stress_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_send_recv_all_to_all, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_send_recv_complex, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_set_gloo_pg_timeout, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_short_json, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_sparse_allreduce_basics_cuda, test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_sparse_allreduce_checks, test/distributed/test_c10d_gloo.py::CommTest::test_broadcast_coalesced_gloo_cpu, test/distributed/test_c10d_gloo.py::CommTest::test_broadcast_coalesced_gloo_cuda, test/distributed/test_c10d_gloo.py::CommTest::test_gloo_rank_membership, test/distributed/test_c10d_gloo.py::CommTest::test_sequence_num_set_default_pg_gloo, test/distributed/test_c10d_gloo.py::CommTest::test_sequence_num_set_gloo_new_group, test/distributed/test_c10d_gloo.py::CommTest::test_tensor_dtype_complex, test/distributed/test_c10d_gloo.py::GlooProcessGroupWithDispatchedCollectivesTests::test_allgather_coalesced, test/distributed/test_c10d_gloo.py::GlooProcessGroupWithDispatchedCollectivesTests::test_init_process_group_for_all_backends, test/distributed/test_c10d_gloo.py::LargeCommTest::test_new_group_local_sync_duplicate_pg 2025-12-04T12:03:45.7418140Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::RendezvousTCPTest::test_tcp_init 2025-12-04T12:03:45.7419107Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::TimeoutTest::test_default_store_timeout_gloo 2025-12-04T12:03:45.7420167Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_coalesced_async 2025-12-04T12:03:45.7421521Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_coalesced_checks 2025-12-04T12:03:45.7422656Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_into_tensor_coalesced 2025-12-04T12:03:45.7423739Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_stress 2025-12-04T12:03:45.7424767Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allgather_stress_cuda 2025-12-04T12:03:45.7425794Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_basics 2025-12-04T12:03:45.7426807Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_basics_cuda 2025-12-04T12:03:45.7427835Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_checks 2025-12-04T12:03:45.7428886Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_coalesced_basics 2025-12-04T12:03:45.7430023Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_coalesced_checks_cuda 2025-12-04T12:03:45.7431141Z Running 1 items in this shard: 
test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_overall_timeout 2025-12-04T12:03:45.7432191Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_allreduce_stress 2025-12-04T12:03:45.7433372Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_barrier_implies_wait 2025-12-04T12:03:45.7434282Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_broadcast_basics 2025-12-04T12:03:45.7435277Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_broadcast_basics_cuda 2025-12-04T12:03:45.7436196Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_broadcast_stress 2025-12-04T12:03:45.7437106Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_broadcast_stress_cuda 2025-12-04T12:03:45.7438008Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_empty_tensors 2025-12-04T12:03:45.7438885Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_gather_stress_cuda 2025-12-04T12:03:45.7439997Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_multi_device_constructor 2025-12-04T12:03:45.7440964Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_reduce_basics 2025-12-04T12:03:45.7441891Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_reduce_stress 2025-12-04T12:03:45.7442816Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_reduce_stress_cuda 2025-12-04T12:03:45.7443812Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_scatter_basics 2025-12-04T12:03:45.7444758Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_scatter_basics_cuda 2025-12-04T12:03:45.7445708Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_scatter_stress 2025-12-04T12:03:45.7446643Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_send_recv_all_to_all 2025-12-04T12:03:45.7447625Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_set_gloo_pg_timeout 2025-12-04T12:03:45.7448615Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_sparse_allreduce_basics 2025-12-04T12:03:45.7449648Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooTest::test_sparse_allreduce_basics_cuda 2025-12-04T12:03:45.7450705Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_dataclass_output 2025-12-04T12:03:45.7451809Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_dynamic_module 2025-12-04T12:03:45.7453112Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_dynamic_weight_sharing 2025-12-04T12:03:45.7454326Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_once_use_reentrant_False 2025-12-04T12:03:45.7455538Z Running 1 items in this shard: 
test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_once_use_reentrant_True 2025-12-04T12:03:45.7457055Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_static_graph_use_reentrant_True 2025-12-04T12:03:45.7458499Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_use_reentrant_False 2025-12-04T12:03:45.7459881Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_use_reentrant_True 2025-12-04T12:03:45.7461290Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_unused_params_use_reentrant_True 2025-12-04T12:03:45.7462724Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_checkpointing_weight_sharing_use_reentrant_True 2025-12-04T12:03:45.7464121Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_comm_hook_future_passing_cpu 2025-12-04T12:03:45.7465450Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_comm_hook_future_passing_gpu_gloo 2025-12-04T12:03:45.7466729Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ddp_comm_hook_sparse_gradients 2025-12-04T12:03:45.7467969Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_global_local_unused_params_grad 2025-12-04T12:03:45.7469391Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_global_local_unused_params_grad_with_grad_is_view 2025-12-04T12:03:45.7470644Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_gloo_backend_cpu_module 2025-12-04T12:03:45.7471842Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_gloo_backend_cpu_module_grad_is_view 2025-12-04T12:03:45.7472984Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ignored_output 2025-12-04T12:03:45.7474093Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_ignored_sharded_tensor 2025-12-04T12:03:45.7475175Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_sparse_gradients 2025-12-04T12:03:45.7476299Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_sparse_gradients_grad_is_view 2025-12-04T12:03:45.7477472Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::DistributedDataParallelTest::test_sync_batch_norm_empty_input 2025-12-04T12:03:45.7478511Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ReducerTest::test_forward_backward 2025-12-04T12:03:45.7479443Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ReducerTest::test_multi_dtype_single_bucket 2025-12-04T12:03:45.7480420Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ReducerTest::test_single_dtype_single_bucket 2025-12-04T12:03:45.7481652Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_checks 2025-12-04T12:03:45.7482678Z Running 1 items in this shard: 
test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_coalesced_checks 2025-12-04T12:03:45.7483749Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_inference_mode 2025-12-04T12:03:45.7484834Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_into_tensor_coalesced 2025-12-04T12:03:45.7485865Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allgather_stress 2025-12-04T12:03:45.7486864Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_basics_cuda 2025-12-04T12:03:45.7487855Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_checks 2025-12-04T12:03:45.7488872Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_coalesced_async 2025-12-04T12:03:45.7489932Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_coalesced_checks 2025-12-04T12:03:45.7491029Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_coalesced_checks_cuda 2025-12-04T12:03:45.7492127Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_coalesced_stress 2025-12-04T12:03:45.7493166Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_op_timeout 2025-12-04T12:03:45.7494241Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_overall_timeout 2025-12-04T12:03:45.7495266Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_stress 2025-12-04T12:03:45.7496267Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_allreduce_stress_cuda 2025-12-04T12:03:45.7497629Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_block_current_stream_cuda 2025-12-04T12:03:45.7498757Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_broadcast_basics 2025-12-04T12:03:45.7499846Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_broadcast_stress 2025-12-04T12:03:45.7500967Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_broadcast_stress_cuda 2025-12-04T12:03:45.7502078Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_empty_tensors 2025-12-04T12:03:45.7503167Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_gather_basics 2025-12-04T12:03:45.7504228Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_gather_checks 2025-12-04T12:03:45.7505361Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_gather_noncontiguous_input 2025-12-04T12:03:45.7506527Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_gather_stress_cuda 2025-12-04T12:03:45.7507641Z Running 1 items in this shard: 
test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_reduce_scatter_tensor 2025-12-04T12:03:45.7508956Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_reduce_scatter_tensor_coalesced 2025-12-04T12:03:45.7510108Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_reduce_stress 2025-12-04T12:03:45.7511091Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_scatter_basics 2025-12-04T12:03:45.7512041Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_scatter_checks 2025-12-04T12:03:45.7513019Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_send_recv_all_to_all 2025-12-04T12:03:45.7514016Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_set_gloo_pg_timeout 2025-12-04T12:03:45.7515042Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooLazyInitTest::test_sparse_allreduce_basics 2025-12-04T12:03:45.7516011Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_basics 2025-12-04T12:03:45.7516948Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_basics_cuda 2025-12-04T12:03:45.7517929Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_coalesced_async 2025-12-04T12:03:45.7518921Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_coalesced_checks 2025-12-04T12:03:45.7519902Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_inference_mode 2025-12-04T12:03:45.7520992Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_stress 2025-12-04T12:03:45.7522191Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allgather_stress_cuda 2025-12-04T12:03:45.7523373Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_basics_cuda 2025-12-04T12:03:45.7524463Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_coalesced_stress 2025-12-04T12:03:45.7525557Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_op_timeout 2025-12-04T12:03:45.7526646Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_overall_timeout 2025-12-04T12:03:45.7527719Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_allreduce_stress 2025-12-04T12:03:45.7528775Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_block_current_stream_cuda 2025-12-04T12:03:45.7529850Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_broadcast_stress 2025-12-04T12:03:45.7530911Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_broadcast_stress_cuda 2025-12-04T12:03:45.7531965Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_gather_basics_cuda 2025-12-04T12:03:45.7533014Z Running 1 items in 
this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_gather_checks 2025-12-04T12:03:45.7534085Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_gather_stress_cuda 2025-12-04T12:03:45.7534996Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_reduce_checks 2025-12-04T12:03:45.7535882Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_reduce_scatter 2025-12-04T12:03:45.7537083Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_reduce_scatter_tensor_coalesced 2025-12-04T12:03:45.7538214Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_scatter_basics_cuda 2025-12-04T12:03:45.7539267Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_scatter_stress_cuda 2025-12-04T12:03:45.7540371Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_send_recv_all_to_all 2025-12-04T12:03:45.7541405Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_send_recv_complex 2025-12-04T12:03:45.7542446Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_set_gloo_pg_timeout 2025-12-04T12:03:45.7543452Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_short_json 2025-12-04T12:03:45.7544514Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_sparse_allreduce_basics_cuda 2025-12-04T12:03:45.7545636Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::ProcessGroupGlooFRTest::test_sparse_allreduce_checks 2025-12-04T12:03:45.7546679Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::CommTest::test_broadcast_coalesced_gloo_cpu 2025-12-04T12:03:45.7547681Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::CommTest::test_broadcast_coalesced_gloo_cuda 2025-12-04T12:03:45.7548776Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::CommTest::test_gloo_rank_membership 2025-12-04T12:03:45.7549764Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::CommTest::test_sequence_num_set_default_pg_gloo 2025-12-04T12:03:45.7550674Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::CommTest::test_sequence_num_set_gloo_new_group 2025-12-04T12:03:45.7551547Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::CommTest::test_tensor_dtype_complex 2025-12-04T12:03:45.7552627Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::GlooProcessGroupWithDispatchedCollectivesTests::test_allgather_coalesced 2025-12-04T12:03:45.7553952Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::GlooProcessGroupWithDispatchedCollectivesTests::test_init_process_group_for_all_backends 2025-12-04T12:03:45.7555144Z Running 1 items in this shard: test/distributed/test_c10d_gloo.py::LargeCommTest::test_new_group_local_sync_duplicate_pg 2025-12-04T12:03:45.7555701Z 2025-12-04T12:03:45.7556021Z Finished distributed/test_c10d_gloo 1/2 ... 
[2025-12-04 12:03:45.732891][11457.340804599], took 17.30min 2025-12-04T12:03:45.7651137Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0991bf72558fb22b.xml 2025-12-04T12:03:45.8517106Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-aa6ce215ba96a24c.xml 2025-12-04T12:03:45.8847153Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-16fe1d620732710b.xml 2025-12-04T12:03:45.9138685Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-3fe1795a5d3e5b88.xml 2025-12-04T12:03:45.9434528Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-6c7276bb9fa9eee2.xml 2025-12-04T12:03:45.9737274Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-cd50578f9742b761.xml 2025-12-04T12:03:46.0177834Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-5e60172a210dc8b6.xml 2025-12-04T12:03:46.0542866Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-873ae68d43267ac9.xml 2025-12-04T12:03:46.1335597Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-34c50e4612c9fea4.xml 2025-12-04T12:03:46.1667362Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d54fb6be7a931b62.xml 2025-12-04T12:03:46.1976109Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2259b8bd184524fc.xml 2025-12-04T12:03:46.2363983Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-8f01caa16144b040.xml 2025-12-04T12:03:46.2700167Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-31de274c3cb59c01.xml 2025-12-04T12:03:46.3393837Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-db19637423ab0dbc.xml 2025-12-04T12:03:46.3745257Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-b23ea90304491b65.xml 2025-12-04T12:03:46.4157594Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-eaee01f734bb6504.xml 2025-12-04T12:03:46.4515812Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0fa860b184f8ddb6.xml 2025-12-04T12:03:46.4817096Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-33cbbe588c8f840c.xml 2025-12-04T12:03:46.5168477Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-de8dc85b62067611.xml 2025-12-04T12:03:46.5476040Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0f2cd4f378b677f0.xml 2025-12-04T12:03:46.5796862Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e35b0454119a9f51.xml 2025-12-04T12:03:46.6146863Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d98cd20152af5d53.xml 2025-12-04T12:03:46.6475309Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-3982ee850d6ce795.xml 2025-12-04T12:03:46.6794948Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-08455987c8f710af.xml 2025-12-04T12:03:46.7146492Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e90446a7a06b5b78.xml 2025-12-04T12:03:46.7497986Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-3abd929020861bdc.xml 2025-12-04T12:03:46.7787204Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d79cb42da7e54a79.xml 2025-12-04T12:03:46.8116168Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-1a14244d1e7f6bb2.xml 2025-12-04T12:03:46.8456755Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-a80b6bac28c5c972.xml 2025-12-04T12:03:46.8847128Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-bf45f3c093461361.xml 2025-12-04T12:03:46.9169088Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-81160b788c5abcc2.xml 2025-12-04T12:03:46.9538634Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2242d642afc7f886.xml 2025-12-04T12:03:46.9821386Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-327f840cbb3f5094.xml 2025-12-04T12:03:47.0162806Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-724f786ab432a45b.xml 2025-12-04T12:03:47.0477211Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-aae15a76989ce46a.xml 2025-12-04T12:03:47.0795821Z Parsing testcases for 
test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-4ee273f849859fe9.xml 2025-12-04T12:03:47.1123280Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-93baf128de560649.xml 2025-12-04T12:03:47.1425057Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-1f85ec05eddb726d.xml 2025-12-04T12:03:47.1749805Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-c9eb752317a73e18.xml 2025-12-04T12:03:47.2089061Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-cedb520e520b4782.xml 2025-12-04T12:03:47.2415208Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e69dd1a2e9fba2dc.xml 2025-12-04T12:03:47.2724155Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-47c9021380160661.xml 2025-12-04T12:03:47.3037066Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-681adc1d59f04282.xml 2025-12-04T12:03:47.3338021Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-1755a27e81246495.xml 2025-12-04T12:03:47.3624999Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-b2036226275eb311.xml 2025-12-04T12:03:47.3924171Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-3f50e0fff8c24c86.xml 2025-12-04T12:03:47.4254766Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d908f57090f2acd6.xml 2025-12-04T12:03:47.4565730Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-ac7a92e764fd2c8b.xml 2025-12-04T12:03:47.4886782Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2f80e6d84c47c0a7.xml 2025-12-04T12:03:47.5219996Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2042e0d50243da8a.xml 2025-12-04T12:03:47.5543189Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-bb9adcd8663666ac.xml 2025-12-04T12:03:47.5938694Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-246370ceca8d8d8b.xml 2025-12-04T12:03:47.6269277Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f75c8f9699a93e6a.xml 2025-12-04T12:03:47.6538596Z Parsing 
testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-830d90348309a50c.xml 2025-12-04T12:03:47.6856341Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-257d76299fdbf250.xml 2025-12-04T12:03:47.7138780Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-fa0b0b810d894be9.xml 2025-12-04T12:03:47.7437334Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-b713da153aca8219.xml 2025-12-04T12:03:47.7779214Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-812da336a80f282a.xml 2025-12-04T12:03:47.8089863Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2be07987a59e5da5.xml 2025-12-04T12:03:47.8375499Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0d952f420fed2de5.xml 2025-12-04T12:03:47.8676875Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d29bf39728651f67.xml 2025-12-04T12:03:47.9018276Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-01e88d26c5e6aa85.xml 2025-12-04T12:03:47.9313655Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-25efe3194372b4e6.xml 2025-12-04T12:03:47.9650377Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-4ccf063a53847c36.xml 2025-12-04T12:03:47.9965912Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-72be92db0e827d7f.xml 2025-12-04T12:03:48.0338836Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-84f86de4e3aa962a.xml 2025-12-04T12:03:48.0607900Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e5c4d09fb827cb7f.xml 2025-12-04T12:03:48.0938840Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-165d83ae78886ff8.xml 2025-12-04T12:03:48.1275456Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-76f6fcd9346eff0a.xml 2025-12-04T12:03:48.1581243Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e84bdf3d05666f91.xml 2025-12-04T12:03:48.1907586Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-a357bf2b1c694c62.xml 
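Each "Parsing testcases for test report" entry above refers to a pytest JUnit-style XML file under test/test-reports/python-pytest/. As a rough illustration of what that step amounts to (this is not the CI's actual parser; the glob pattern and attribute names below are assumptions based on standard pytest --junitxml output), a minimal Python sketch that tallies results across those report files could look like this:

# Sketch (assumption, not the CI tooling): tally results from the junit-xml
# reports listed in the log above. Standard pytest --junitxml files carry
# per-suite "tests" / "failures" / "errors" / "skipped" counts as attributes.
import glob
import xml.etree.ElementTree as ET

def summarize_reports(pattern="test/test-reports/python-pytest/**/*.xml"):
    totals = {"tests": 0, "failures": 0, "errors": 0, "skipped": 0}
    for path in glob.glob(pattern, recursive=True):
        root = ET.parse(path).getroot()
        # The root may be <testsuites> or a bare <testsuite>; iter() handles both.
        for suite in root.iter("testsuite"):
            for key in totals:
                totals[key] += int(suite.get(key, 0))
    return totals

if __name__ == "__main__":
    print(summarize_reports())

The reported durations are consistent with the bracketed monotonic stamps in the log: for distributed/test_c10d_gloo 1/2, the run starts at 10419.480 and finishes at 11457.341, i.e. roughly 1037.9 s or about 17.30 min, matching the "took 17.30min" line.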
2025-12-04T12:03:48.2244767Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-b1b5f73bcb8b828f.xml 2025-12-04T12:03:48.2558464Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e742397162ed9e3d.xml 2025-12-04T12:03:48.2887921Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f3a1c05a7b5c0fa8.xml 2025-12-04T12:03:48.3166071Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-fcd37833b58d4bea.xml 2025-12-04T12:03:48.3508328Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e22bb2e46b3ab636.xml 2025-12-04T12:03:48.3899002Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d319014b034c95bf.xml 2025-12-04T12:03:48.4176851Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-393bf6208ab91711.xml 2025-12-04T12:03:48.4540937Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-bb9e40b9771000a0.xml 2025-12-04T12:03:48.4894757Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d597ca27d8328fc4.xml 2025-12-04T12:03:48.5222679Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-ff18cf4d50e44f39.xml 2025-12-04T12:03:48.5614685Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0be906a8969ec101.xml 2025-12-04T12:03:48.5937640Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-158f1ad05ae2a64b.xml 2025-12-04T12:03:48.6298983Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-87453a67a1ebaea6.xml 2025-12-04T12:03:48.6597307Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-94f3fac53aec8990.xml 2025-12-04T12:03:48.7480230Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-93576123b2405b32.xml 2025-12-04T12:03:48.7894987Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f6666d1683ab3f1d.xml 2025-12-04T12:03:48.8182589Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-54b039aca43fe5b7.xml 2025-12-04T12:03:48.8489570Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-8eea24e340cd482b.xml 2025-12-04T12:03:48.8846090Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-abf845b544fb7d20.xml 2025-12-04T12:03:48.9208830Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f27d8d563aeff333.xml 2025-12-04T12:03:48.9505653Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-b98a8d5dfa728efd.xml 2025-12-04T12:03:48.9860286Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f9a146a8fac2af4d.xml 2025-12-04T12:03:49.0178728Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d8bb6ca9e3ae378b.xml 2025-12-04T12:03:49.0466096Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-604db34ae5cbb6b2.xml 2025-12-04T12:03:49.0849833Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-6d6d34df2e34630b.xml 2025-12-04T12:03:49.1158760Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-520dfe050df69b4b.xml 2025-12-04T12:03:49.1475655Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2074cd035f8dc8fc.xml 2025-12-04T12:03:49.1765777Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-468dffdf4603fb37.xml 2025-12-04T12:03:49.2077001Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-fb8500504162f453.xml 2025-12-04T12:03:49.2378972Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-56d2f4c749889dbc.xml 2025-12-04T12:03:49.3238505Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-8cef0d6061a45be8.xml 2025-12-04T12:03:49.3538233Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-93d1d438aff7bb95.xml 2025-12-04T12:03:49.3843108Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-5c11159a66fb94a9.xml 2025-12-04T12:03:49.4166155Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-c1ea079cea0d8e56.xml 2025-12-04T12:03:49.4486236Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f25b64af298ca601.xml 2025-12-04T12:03:49.4796217Z Parsing testcases for 
test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-87383ac3904bfe89.xml 2025-12-04T12:03:49.5096740Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d793a1fedd0d4f15.xml 2025-12-04T12:03:49.5418730Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-b67795a049190b1d.xml 2025-12-04T12:03:49.5757555Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-bde1923c97f63381.xml 2025-12-04T12:03:49.6157520Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2540c713fc68453d.xml 2025-12-04T12:03:49.6471275Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-8d1d058689da62ff.xml 2025-12-04T12:03:49.6785367Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0c93a8978347968a.xml 2025-12-04T12:03:49.7069485Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-18641772917d69fc.xml 2025-12-04T12:03:49.7379022Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-6a77c9a2c337df36.xml 2025-12-04T12:03:49.7695565Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-25efbb19e469ebb7.xml 2025-12-04T12:03:49.8019536Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-eac363af2c24f931.xml 2025-12-04T12:03:49.8302975Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-33bf8b4540a40636.xml 2025-12-04T12:03:49.8618007Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-45778cf420dbd19f.xml 2025-12-04T12:03:49.8917062Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-7dfffc535a3e90f1.xml 2025-12-04T12:03:49.9224500Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-4b2795b0e7efac26.xml 2025-12-04T12:03:49.9516655Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2b369bec34855654.xml 2025-12-04T12:03:49.9807734Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d6b15d261538e27e.xml 2025-12-04T12:03:50.0138843Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-4ef76d7bc1711751.xml 2025-12-04T12:03:50.0445582Z Parsing 
testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0343427a5558824f.xml 2025-12-04T12:03:50.0723784Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-3f70a63e56a4848b.xml 2025-12-04T12:03:50.1037366Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-821ac567b5ed63bc.xml 2025-12-04T12:03:50.6885198Z Uploading artifacts took 0.51 seconds 2025-12-04T12:03:50.6887332Z Running distributed/_shard/sharded_tensor/test_sharded_tensor 1/1 ... [2025-12-04 12:03:50.688442][11462.296357453] 2025-12-04T12:03:50.6888116Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:03:50.6889789Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/_shard/sharded_tensor/test_sharded_tensor.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:03:50.688788] 2025-12-04T12:13:17.7419520Z 2025-12-04T12:13:17.7421157Z distributed/_shard/sharded_tensor/test_sharded_tensor 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributed._shard.sharded_tensor.test_sharded_tensor_1.1_24bd8bcdd0ba69c1_.log 2025-12-04T12:13:17.7467835Z Running 74 items in this shard: test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorMetadata::test_serialize_and_deserialize, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestCreateTensorFromParams::test_empty, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardParameter::test_shard_parameter, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardParameter::test_shard_parameter_errors, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardTensor::test_shard_tensor, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardTensor::test_shard_tensor_errors, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardTensor::test_shard_tensor_with_empty_shard, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestModuleHookApi::test_collect_local_shard, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestModuleHookApi::test_reshard_output, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestLocalTensor::test_local_tensor, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestLocalTensor::test_local_tensor_error, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_cleanup, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_complete_world_size, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_create_sharded_tensor_like, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_create_sharded_tensor_with_full, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_create_sharded_tensor_with_ones, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_create_sharded_tensor_with_rand, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_create_sharded_tensor_with_zeros, 
test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_gather_even, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_gather_uneven, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_insufficient_sharding_dims, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_invalid_pg_rpc_ranks, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_invalid_sharding, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_load_state_dict_errors, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_multiple_local_shards, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_new_group, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_partial_world_size, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_sharded_tensor_metadata, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_sharded_tensor_sizes, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_sharding_columns, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_state_dict, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_state_dict_new_group, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorChunked::test_state_dict_no_sharded_tensors, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorEnumerable::test_create_sharded_tensor_with_ones, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorEnumerable::test_gather_even, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorEnumerable::test_gather_uneven, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorEnumerable::test_grid_sharding, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorEnumerable::test_multiple_local_shards, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorEnumerable::test_new_group, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorEnumerable::test_partial_world_size, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorEnumerable::test_sharded_tensor_device, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorEnumerable::test_sharded_tensor_metadata, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorEnumerable::test_sharded_tensor_to_cpu, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorEnumerable::test_sharded_tensor_to_cuda, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorEnumerable::test_sharded_tensor_to_test, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorEnumerable::test_uneven_shards, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorEnumerable::test_with_rpc_names, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalTensor::test_init_from_local_tensor, 
test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalTensor::test_init_from_local_tensor_errors, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_init_from_local_shards, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_init_from_local_shards_and_global_metadata, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_init_from_local_shards_and_global_metadata_invalid_shards, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_init_from_local_shards_and_global_metadata_with_all_zeros, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_init_from_local_shards_and_global_metadata_with_local_view, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_init_from_local_shards_invalid_local_shards, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_init_from_local_shards_invalid_pin_memory, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_init_from_local_shards_invalid_property_cross_ranks, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_init_from_local_shards_invalid_shards_gaps, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_init_from_local_shards_invalid_shards_overlap, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_init_from_local_shards_new_group, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_init_from_local_shards_with_different_glb_size, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_local_shards, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_non_rw_sharded_recalc_for_metadata, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_recalc_for_metadata, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorFromLocalShards::test_st_base_init_from_local_shards_and_global_metadata, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorCustomOps::test_custom_op, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorCustomOps::test_custom_op_errors, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorCustomOps::test_custom_op_override, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardMetadata::test_create_shard_with_no_placement, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardMetadata::test_shard_metadata_init, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorSubGroupInit::test_sub_process_group_placement_validation, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestShardedTensorSubGroupInit::test_sub_process_group_sharded_tensor_init, test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestCreateTensorNoProcessGroupMode::test_init_from_local_shards_and_global_metadata, 
test/distributed/_shard/sharded_tensor/test_sharded_tensor.py::TestCreateTensorNoProcessGroupMode::test_non_contiguous_local_shards 2025-12-04T12:13:17.7513214Z 2025-12-04T12:13:17.7513694Z Finished distributed/_shard/sharded_tensor/test_sharded_tensor 1/1 ... [2025-12-04 12:13:17.743251][12029.351163338], took 9.45min 2025-12-04T12:13:17.7883215Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed._shard.sharded_tensor.test_sharded_tensor/distributed._shard.sharded_tensor.test_sharded_tensor-ae33be926ad38292.xml 2025-12-04T12:13:17.9119303Z Running distributed/test_c10d_nccl 3/3 ... [2025-12-04 12:13:17.911328][12029.51924529] 2025-12-04T12:13:17.9119875Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set 2025-12-04T12:13:17.9121556Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributed/test_c10d_nccl.py', '--shard-id=3', '--num-shards=3', '-v', '--subprocess', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=0', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:13:17.911668] 2025-12-04T12:24:58.9905622Z 2025-12-04T12:24:58.9907019Z distributed/test_c10d_nccl 3/3 was successful, full logs can be found in artifacts with path test/test-reports/distributed.test_c10d_nccl_3.3_41c01794b25a1cc6_.log 2025-12-04T12:24:58.9944605Z Running 72 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLInitTest::test_init_wo_backend_str, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_abort_in_destroy_mixed_empty_pgs, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_comm_eager_init_subgroup, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_comm_split_group_mixed_backend, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_extra_cuda_context, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_extra_cuda_context_sync_ops, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_get_uid, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_nan_assert_float32, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_nccl_dist_backend_error, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_non_blocking_with_eager_init, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_restart_pg, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_set_nccl_pg_timeout_backend_nccl, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_shrink_group_flags, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_shrink_group_nccl_config, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_shrink_group_performance, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_shrink_group_vs_abort_reinit_performance, test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_subgroup_p2p_eager_init_False, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_arbitrary_forward_return_value_grad_is_view, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_builtin_ddp_comm_hooks_nccl, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_channels_last_contig, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_dataclass_output_unused_param, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_checkpointing_once_use_reentrant_True, 
test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_static_graph_use_reentrant_False, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_static_graph_use_reentrant_True, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_use_reentrant_True, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_weight_sharing, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_checkpointing_unused_params_use_reentrant_True, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_checkpointing_weight_sharing_use_reentrant_True, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_comm_hook_allreduce_hook_nccl, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_complex_params_and_grads, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_packed_sequence, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_failure_recovery, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_find_unused_parameters_kwarg_grad_is_view_debug_info, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_fp16_compress_wrapper_is_view, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_multiple_outputs_multiple_backward_grad_is_view, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_nccl_backend_multi_device_ids_not_allowed, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_sync_batch_norm_empty_input, test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_sync_batch_norm_only_empty_input, test/distributed/test_c10d_nccl.py::NcclErrorHandlingTest::test_nccl_blocking_wait_with_barrier, test/distributed/test_c10d_nccl.py::NcclErrorHandlingTest::test_nccl_errors_blocking, test/distributed/test_c10d_nccl.py::NcclErrorHandlingTest::test_send_recv_non_dense_tensor, test/distributed/test_c10d_nccl.py::NcclUserBufferRegistrationTest::test_nccl_user_buffer_registration, test/distributed/test_c10d_nccl.py::CommTest::test_intra_node_comm_all_reduce, test/distributed/test_c10d_nccl.py::CommTest::test_nccl_warn_not_in_group_debug_detail, test/distributed/test_c10d_nccl.py::CommTest::test_nccl_warn_not_in_group_debug_info, test/distributed/test_c10d_nccl.py::CommTest::test_sequence_num_set_default_pg_nccl, test/distributed/test_c10d_nccl.py::CommTest::test_tensor_dtype_complex, test/distributed/test_c10d_nccl.py::CommTest::test_tensor_dtype_mismatch, test/distributed/test_c10d_nccl.py::CommTest::test_time_estimate_nccl, test/distributed/test_c10d_nccl.py::NcclProcessGroupWithDispatchedCollectivesTests::test_all_to_all_single, test/distributed/test_c10d_nccl.py::NcclProcessGroupWithDispatchedCollectivesTests::test_allgather_float8_float8_e4m3fn, test/distributed/test_c10d_nccl.py::NcclProcessGroupWithDispatchedCollectivesTests::test_allreduce_coalesced, test/distributed/test_c10d_nccl.py::NcclProcessGroupWithDispatchedCollectivesTests::test_init_process_group_for_all_backends, test/distributed/test_c10d_nccl.py::LargeCommTest::test_broadcast_object_list_subgroup_set_device1_group_rank_False, test/distributed/test_c10d_nccl.py::LargeCommTest::test_gather_object_subgroup_group_rank_True, test/distributed/test_c10d_nccl.py::LargeCommTest::test_new_group_local_sync_duplicated_pg, 
test/distributed/test_c10d_nccl.py::LargeCommTest::test_new_group_local_sync_sanity_check, test/distributed/test_c10d_nccl.py::LargeCommTest::test_reduce_subgroup_group_rank_True, test/distributed/test_c10d_nccl.py::LargeCommTest::test_scatter_subgroup_group_rank_False, test/distributed/test_c10d_nccl.py::LargeCommTest::test_send_recv_subgroup_group_rank_False_async_op_True, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_allgather_uneven_timing_enabled_True, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_barrier_profiling, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_batched_send_recv_op_sizes_per_coalesce0_timing_enabled_True, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_coalescing_manager_collective_timing_enabled_False, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_coalescing_manager_collective_timing_enabled_True, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_fr_record_reset_timing_enabled_True, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_individual_send_recv_op_sizes1_timing_enabled_False, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_short_json_timing_enabled_False_include_collectives_False, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_short_json_timing_enabled_True_include_collectives_True, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_trace_while_active_timing_enabled_False_only_active_True, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_trace_while_stuck_timing_enabled_False, test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_trace_while_stuck_timing_enabled_True 2025-12-04T12:24:58.9980482Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLInitTest::test_init_wo_backend_str 2025-12-04T12:24:58.9981657Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_abort_in_destroy_mixed_empty_pgs 2025-12-04T12:24:58.9982843Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_comm_eager_init_subgroup 2025-12-04T12:24:58.9984014Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_comm_split_group_mixed_backend 2025-12-04T12:24:58.9985159Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_extra_cuda_context 2025-12-04T12:24:58.9986318Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_extra_cuda_context_sync_ops 2025-12-04T12:24:58.9987427Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_get_uid 2025-12-04T12:24:58.9988448Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_nan_assert_float32 2025-12-04T12:24:58.9989623Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_nccl_dist_backend_error 2025-12-04T12:24:58.9990775Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_non_blocking_with_eager_init 2025-12-04T12:24:58.9991835Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_restart_pg 2025-12-04T12:24:58.9992920Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_set_nccl_pg_timeout_backend_nccl 2025-12-04T12:24:58.9994031Z Running 1 items in this shard: 
test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_shrink_group_flags 2025-12-04T12:24:58.9995109Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_shrink_group_nccl_config 2025-12-04T12:24:58.9996260Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_shrink_group_performance 2025-12-04T12:24:58.9997465Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_shrink_group_vs_abort_reinit_performance 2025-12-04T12:24:58.9998685Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::ProcessGroupNCCLGroupTest::test_subgroup_p2p_eager_init_False 2025-12-04T12:24:58.9999933Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_arbitrary_forward_return_value_grad_is_view 2025-12-04T12:24:59.0001183Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_builtin_ddp_comm_hooks_nccl 2025-12-04T12:24:59.0002307Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_channels_last_contig 2025-12-04T12:24:59.0003458Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_dataclass_output_unused_param 2025-12-04T12:24:59.0004715Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_checkpointing_once_use_reentrant_True 2025-12-04T12:24:59.0006096Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_static_graph_use_reentrant_False 2025-12-04T12:24:59.0007540Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_static_graph_use_reentrant_True 2025-12-04T12:24:59.0008904Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_use_reentrant_True 2025-12-04T12:24:59.0010215Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_checkpointing_twice_weight_sharing 2025-12-04T12:24:59.0011553Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_checkpointing_unused_params_use_reentrant_True 2025-12-04T12:24:59.0012946Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_checkpointing_weight_sharing_use_reentrant_True 2025-12-04T12:24:59.0014255Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_comm_hook_allreduce_hook_nccl 2025-12-04T12:24:59.0015443Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_complex_params_and_grads 2025-12-04T12:24:59.0016648Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_ddp_packed_sequence 2025-12-04T12:24:59.0017976Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_failure_recovery 2025-12-04T12:24:59.0019238Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_find_unused_parameters_kwarg_grad_is_view_debug_info 2025-12-04T12:24:59.0020562Z Running 1 items in this shard: 
test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_fp16_compress_wrapper_is_view 2025-12-04T12:24:59.0022062Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_multiple_outputs_multiple_backward_grad_is_view 2025-12-04T12:24:59.0023444Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_nccl_backend_multi_device_ids_not_allowed 2025-12-04T12:24:59.0024722Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_sync_batch_norm_empty_input 2025-12-04T12:24:59.0025950Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::DistributedDataParallelTest::test_sync_batch_norm_only_empty_input 2025-12-04T12:24:59.0027158Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NcclErrorHandlingTest::test_nccl_blocking_wait_with_barrier 2025-12-04T12:24:59.0028344Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NcclErrorHandlingTest::test_nccl_errors_blocking 2025-12-04T12:24:59.0029422Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NcclErrorHandlingTest::test_send_recv_non_dense_tensor 2025-12-04T12:24:59.0030594Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NcclUserBufferRegistrationTest::test_nccl_user_buffer_registration 2025-12-04T12:24:59.0031705Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::CommTest::test_intra_node_comm_all_reduce 2025-12-04T12:24:59.0032816Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::CommTest::test_nccl_warn_not_in_group_debug_detail 2025-12-04T12:24:59.0033853Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::CommTest::test_nccl_warn_not_in_group_debug_info 2025-12-04T12:24:59.0034790Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::CommTest::test_sequence_num_set_default_pg_nccl 2025-12-04T12:24:59.0035661Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::CommTest::test_tensor_dtype_complex 2025-12-04T12:24:59.0036485Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::CommTest::test_tensor_dtype_mismatch 2025-12-04T12:24:59.0037307Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::CommTest::test_time_estimate_nccl 2025-12-04T12:24:59.0038300Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NcclProcessGroupWithDispatchedCollectivesTests::test_all_to_all_single 2025-12-04T12:24:59.0039565Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NcclProcessGroupWithDispatchedCollectivesTests::test_allgather_float8_float8_e4m3fn 2025-12-04T12:24:59.0040844Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NcclProcessGroupWithDispatchedCollectivesTests::test_allreduce_coalesced 2025-12-04T12:24:59.0042148Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NcclProcessGroupWithDispatchedCollectivesTests::test_init_process_group_for_all_backends 2025-12-04T12:24:59.0043427Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::LargeCommTest::test_broadcast_object_list_subgroup_set_device1_group_rank_False 2025-12-04T12:24:59.0044527Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::LargeCommTest::test_gather_object_subgroup_group_rank_True 2025-12-04T12:24:59.0045520Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::LargeCommTest::test_new_group_local_sync_duplicated_pg 
2025-12-04T12:24:59.0046488Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::LargeCommTest::test_new_group_local_sync_sanity_check 2025-12-04T12:24:59.0047526Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::LargeCommTest::test_reduce_subgroup_group_rank_True 2025-12-04T12:24:59.0048675Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::LargeCommTest::test_scatter_subgroup_group_rank_False 2025-12-04T12:24:59.0049777Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::LargeCommTest::test_send_recv_subgroup_group_rank_False_async_op_True 2025-12-04T12:24:59.0050890Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_allgather_uneven_timing_enabled_True 2025-12-04T12:24:59.0051854Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_barrier_profiling 2025-12-04T12:24:59.0052931Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_batched_send_recv_op_sizes_per_coalesce0_timing_enabled_True 2025-12-04T12:24:59.0054162Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_coalescing_manager_collective_timing_enabled_False 2025-12-04T12:24:59.0055348Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_coalescing_manager_collective_timing_enabled_True 2025-12-04T12:24:59.0056573Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_fr_record_reset_timing_enabled_True 2025-12-04T12:24:59.0057900Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_individual_send_recv_op_sizes1_timing_enabled_False 2025-12-04T12:24:59.0059191Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_short_json_timing_enabled_False_include_collectives_False 2025-12-04T12:24:59.0060512Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_short_json_timing_enabled_True_include_collectives_True 2025-12-04T12:24:59.0061816Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_trace_while_active_timing_enabled_False_only_active_True 2025-12-04T12:24:59.0063041Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_trace_while_stuck_timing_enabled_False 2025-12-04T12:24:59.0064205Z Running 1 items in this shard: test/distributed/test_c10d_nccl.py::NCCLTraceTest::test_trace_while_stuck_timing_enabled_True 2025-12-04T12:24:59.0064834Z 2025-12-04T12:24:59.0065206Z Finished distributed/test_c10d_nccl 3/3 ... 
[2025-12-04 12:24:58.991818][12730.599734204], took 11.68min 2025-12-04T12:24:59.0374462Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-4e483f68cef17162.xml 2025-12-04T12:24:59.1208017Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-05f5b130753b2983.xml 2025-12-04T12:24:59.1549683Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-7e16e53ef8db6995.xml 2025-12-04T12:24:59.1918954Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-1e281dcef1930575.xml 2025-12-04T12:24:59.2253147Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-2b466e71a200bcdc.xml 2025-12-04T12:24:59.2546588Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-325c8a002e1c83a2.xml 2025-12-04T12:24:59.2849425Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-3c0b6a576b76efd0.xml 2025-12-04T12:24:59.3174216Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-e47f2e15272edbaf.xml 2025-12-04T12:24:59.3485265Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-a9e19469eb1a06d4.xml 2025-12-04T12:24:59.3816129Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-df7444533096a1d8.xml 2025-12-04T12:24:59.4127239Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-d87d87bc823f3dba.xml 2025-12-04T12:24:59.4477674Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-4a50a5ac8cd03017.xml 2025-12-04T12:24:59.4741692Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-0ae50f0e1c874ad8.xml 2025-12-04T12:24:59.5055831Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-7dbf8411ea4b6ce3.xml 2025-12-04T12:24:59.5410094Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-2a6114c53cde50d7.xml 2025-12-04T12:24:59.5735906Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-d109d91d9cd820a7.xml 2025-12-04T12:24:59.6046974Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-7e589af2daee12d3.xml 2025-12-04T12:24:59.6539816Z Parsing testcases for test report: 
/var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-ff536a30913e6717.xml 2025-12-04T12:24:59.6824432Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-16e8bb0ec51136f2.xml 2025-12-04T12:24:59.7142589Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-688fcf4f5f0deff2.xml 2025-12-04T12:24:59.7495275Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-c2f4984a060c2ce4.xml 2025-12-04T12:24:59.7835568Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-4874c9e324e6599b.xml 2025-12-04T12:24:59.8196763Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-81b232fd98a6eda2.xml 2025-12-04T12:24:59.8524451Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-dbedd4dfa730b471.xml 2025-12-04T12:24:59.8977195Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-e94fe5aed063a3e7.xml 2025-12-04T12:24:59.9299529Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-191142456fb777f7.xml 2025-12-04T12:24:59.9646329Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-d909bdccb7ddf2c0.xml 2025-12-04T12:24:59.9956648Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-2e3a4388e42e1415.xml 2025-12-04T12:25:00.0278645Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-3c5f42a263385a17.xml 2025-12-04T12:25:00.0578581Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-a6537375079d62ca.xml 2025-12-04T12:25:00.0923255Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-515a3b961a30c93e.xml 2025-12-04T12:25:00.1276141Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-247b406154c62e2b.xml 2025-12-04T12:25:00.1565573Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-54fc92777b10ce8b.xml 2025-12-04T12:25:00.2074938Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-07a5e82fccbcefb0.xml 2025-12-04T12:25:00.2407347Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-98372eb164ddb8a6.xml 2025-12-04T12:25:00.2754616Z Parsing testcases for 
test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-9a91f2cdfa9f567b.xml 2025-12-04T12:25:00.3058690Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-578f1554447ed157.xml 2025-12-04T12:25:00.3357162Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-cba9e46262707896.xml 2025-12-04T12:25:00.3714411Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-b5cc6836ef1a3879.xml 2025-12-04T12:25:00.4022641Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-1a086feba79f79de.xml 2025-12-04T12:25:00.4460983Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-fd712f2413b91025.xml 2025-12-04T12:25:00.4820041Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-2e275020a83607d9.xml 2025-12-04T12:25:00.5165930Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-32cb996256d67719.xml 2025-12-04T12:25:00.5516629Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-281110f64c593b33.xml 2025-12-04T12:25:00.5820084Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-ab551cc6e4b8fc0e.xml 2025-12-04T12:25:00.6154665Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-bb4b38110c51be7b.xml 2025-12-04T12:25:00.6455314Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-d76cceb106b5a87a.xml 2025-12-04T12:25:00.6797191Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-f5087c7fb2c85ea4.xml 2025-12-04T12:25:00.7115367Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-5bf92e22e16000ae.xml 2025-12-04T12:25:00.7477404Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-a2df2e6eff7daa02.xml 2025-12-04T12:25:00.7955799Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-62cf8d48558e6611.xml 2025-12-04T12:25:00.8273054Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-008b4e727f5be082.xml 2025-12-04T12:25:00.8581012Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-0b38d08cedf93968.xml 2025-12-04T12:25:00.8900905Z Parsing 
testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-0615767c47cb824b.xml 2025-12-04T12:25:00.9217067Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-3a85b82e41e52e7b.xml 2025-12-04T12:25:00.9564656Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-670c4eb9ad8ac35a.xml 2025-12-04T12:25:00.9886131Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-1ae993f40739468a.xml 2025-12-04T12:25:01.0169713Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-1379655e313056b3.xml 2025-12-04T12:25:01.0496585Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-17d32ccc8ec15e49.xml 2025-12-04T12:25:01.0814165Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-3c5afe3c6d472874.xml 2025-12-04T12:25:01.1117985Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-71d8c77dbd2b6cd3.xml 2025-12-04T12:25:01.1593562Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-9e93da4b49ea34dc.xml 2025-12-04T12:25:01.1956773Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-09fe633d76933c88.xml 2025-12-04T12:25:01.2287189Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-4db84368319deb77.xml 2025-12-04T12:25:01.2604569Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-867c58ec01067ba4.xml 2025-12-04T12:25:01.2923631Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-f4ea20dbc7c23240.xml 2025-12-04T12:25:01.3275510Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-197b01c054eb8425.xml 2025-12-04T12:25:01.3610929Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-5f78ef08e5f67618.xml 2025-12-04T12:25:01.3937949Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-5dd09e666c5e73ac.xml 2025-12-04T12:25:01.4258729Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-8d5b24102af3938b.xml 2025-12-04T12:25:01.4588569Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-7ed88178415e82af.xml 
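The "Parsing testcases for test report" entries above refer to the pytest JUnit-style XML files written under test/test-reports being read back to collect per-test results. The actual parser used by the CI scripts is not shown in this log; below is only a minimal sketch, assuming the standard testsuite/testcase layout that pytest's --junitxml output produces, with an illustrative file path.

```python
# Minimal sketch: count testcases and failures in a pytest junitxml report.
# Assumes the standard <testsuite>/<testcase> layout; the path is illustrative
# and not taken from this job. This is not the workflow's actual parser.
import xml.etree.ElementTree as ET

def summarize_report(path: str) -> dict:
    root = ET.parse(path).getroot()
    # Reports may have a <testsuites> wrapper or a bare <testsuite> root.
    total = failures = errors = skipped = 0
    for suite in root.iter("testsuite"):
        for case in suite.iter("testcase"):
            total += 1
            failures += len(case.findall("failure"))
            errors += len(case.findall("error"))
            skipped += len(case.findall("skipped"))
    return {"tests": total, "failures": failures, "errors": errors, "skipped": skipped}

if __name__ == "__main__":
    print(summarize_report("test-reports/example-report.xml"))
```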
2025-12-04T12:25:01.4923372Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-17ddadec6a584fc8.xml
2025-12-04T12:25:02.5381993Z Uploading artifacts took 0.97 seconds
2025-12-04T12:25:06.6694900Z Running test batch 'tests to run' cost 11917.7 seconds
2025-12-04T12:25:06.6701052Z Emitting td_test_failure_stats_v2
2025-12-04T12:25:06.6704361Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764851106_44734394d10c11f08e600242ac110002
2025-12-04T12:25:06.7645302Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764851106_44734394d10c11f08e600242ac110002
2025-12-04T12:25:06.7653655Z Emitting td_test_failure_stats_v2
2025-12-04T12:25:06.7654989Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764851106_4481b262d10c11f08e600242ac110002
2025-12-04T12:25:06.8042925Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764851106_4481b262d10c11f08e600242ac110002
2025-12-04T12:25:06.8049373Z Emitting td_test_failure_stats_v2
2025-12-04T12:25:06.8050279Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764851106_4487bca2d10c11f08e600242ac110002
2025-12-04T12:25:06.8365591Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764851106_4487bca2d10c11f08e600242ac110002
2025-12-04T12:25:06.8371485Z Emitting td_test_failure_stats_v2
2025-12-04T12:25:06.8372352Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764851106_448caaa0d10c11f08e600242ac110002
2025-12-04T12:25:06.8713486Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764851106_448caaa0d10c11f08e600242ac110002
2025-12-04T12:25:06.8717810Z Emitting td_test_failure_stats_v2
2025-12-04T12:25:06.8718765Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764851106_4491f8c0d10c11f08e600242ac110002
2025-12-04T12:25:06.9043351Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764851106_4491f8c0d10c11f08e600242ac110002
2025-12-04T12:25:06.9047753Z Emitting td_test_failure_stats_v2
2025-12-04T12:25:06.9048566Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764851106_44970374d10c11f08e600242ac110002
2025-12-04T12:25:06.9323463Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764851106_44970374d10c11f08e600242ac110002
2025-12-04T12:25:06.9324509Z distributed/fsdp/test_fsdp_overlap 1/1 failed!
2025-12-04T12:25:06.9324967Z distributed/fsdp/test_fsdp_pure_fp16 1/1 failed!
2025-12-04T12:25:06.9325420Z distributed/fsdp/test_fsdp_exec_order 1/1 failed!
2025-12-04T12:25:06.9326014Z distributed/fsdp/test_hsdp_dtensor_state_dict 1/1 failed!
2025-12-04T12:25:06.9326614Z distributed/fsdp/test_fsdp_clip_grad_norm 1/1 failed!
2025-12-04T12:25:06.9327027Z distributed/fsdp/test_fsdp_core 2/2 failed!
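Each "Emitting td_test_failure_stats_v2" entry above writes a single metrics document to the ossci-raw-job-status S3 bucket, keyed by what appears to be an epoch timestamp plus a unique suffix. A rough sketch of that pattern with boto3 follows; the payload fields and key layout are assumptions for illustration, not the workflow's actual uploader.

```python
# Rough sketch: write one JSON metrics document to S3, mirroring the
# "Writing 1 documents to S3 ossci-raw-job-status/..." entries above.
# Payload fields and key construction are illustrative assumptions.
import json
import time
import uuid
import boto3

def emit_metric(doc: dict, bucket: str = "ossci-raw-job-status") -> str:
    key = f"ossci_uploaded_metrics/td_test_failure_stats_v2_{int(time.time())}_{uuid.uuid4().hex}"
    s3 = boto3.client("s3")
    s3.put_object(Bucket=bucket, Key=key, Body=json.dumps(doc).encode("utf-8"))
    return key

# Hypothetical usage:
# emit_metric({"test_file": "distributed/fsdp/test_fsdp_core", "failed": True})
```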
2025-12-04T12:25:07.7495632Z
2025-12-04T12:25:07.7496127Z real 198m44.462s
2025-12-04T12:25:07.7496656Z user 425m27.491s
2025-12-04T12:25:07.7497098Z sys 243m29.884s
2025-12-04T12:25:07.7497383Z + sccache_epilogue
2025-12-04T12:25:07.7497710Z + echo '::group::Sccache Compilation Log'
2025-12-04T12:25:07.7498407Z ##[group]Sccache Compilation Log
2025-12-04T12:25:07.7498826Z + echo '=================== sccache compilation log ==================='
2025-12-04T12:25:07.7499312Z =================== sccache compilation log ===================
2025-12-04T12:25:07.7500387Z + python /var/lib/jenkins/workspace/.ci/pytorch/print_sccache_log.py /var/lib/jenkins/sccache_error.log
2025-12-04T12:25:07.7628802Z + echo '=========== If your build fails, please take a look at the log above for possible reasons ==========='
2025-12-04T12:25:07.7629644Z =========== If your build fails, please take a look at the log above for possible reasons ===========
2025-12-04T12:25:07.7630219Z + sccache --show-stats
2025-12-04T12:25:07.7656695Z Compile requests 532
2025-12-04T12:25:07.7657192Z Compile requests executed 12
2025-12-04T12:25:07.7657743Z Cache hits 6
2025-12-04T12:25:07.7658120Z Cache hits (C/C++) 6
2025-12-04T12:25:07.7658459Z Cache misses 6
2025-12-04T12:25:07.7658811Z Cache misses (C/C++) 6
2025-12-04T12:25:07.7659173Z Cache hits rate 50.00 %
2025-12-04T12:25:07.7659546Z Cache hits rate (C/C++) 50.00 %
2025-12-04T12:25:07.7659910Z Cache timeouts 0
2025-12-04T12:25:07.7660274Z Cache read errors 0
2025-12-04T12:25:07.7660633Z Forced recaches 0
2025-12-04T12:25:07.7660973Z Cache write errors 0
2025-12-04T12:25:07.7661497Z Cache errors 0
2025-12-04T12:25:07.7661852Z Compilations 6
2025-12-04T12:25:07.7662215Z Compilation failures 0
2025-12-04T12:25:07.7662591Z Non-cacheable compilations 0
2025-12-04T12:25:07.7662951Z Non-cacheable calls 13
2025-12-04T12:25:07.7663318Z Non-compilation calls 507
2025-12-04T12:25:07.7663690Z Unsupported compiler calls 0
2025-12-04T12:25:07.7664068Z Average cache write 0.046 s
2025-12-04T12:25:07.7664434Z Average compiler 3.827 s
2025-12-04T12:25:07.7664809Z Average cache read hit 0.020 s
2025-12-04T12:25:07.7665197Z Failed distributed compilations 0
2025-12-04T12:25:07.7665452Z
2025-12-04T12:25:07.7665566Z Non-cacheable reasons:
2025-12-04T12:25:07.7665874Z -E 7
2025-12-04T12:25:07.7666246Z unknown source language 6
2025-12-04T12:25:07.7666486Z
2025-12-04T12:25:07.7666826Z Cache location s3, name: ossci-compiler-cache-circleci-v2, prefix: /
2025-12-04T12:25:07.7667362Z Version (client) 0.10.0
2025-12-04T12:25:07.7667725Z + sccache --stop-server
2025-12-04T12:25:07.7681902Z Stopping sccache server...
2025-12-04T12:25:07.7682343Z Compile requests 532
2025-12-04T12:25:07.7682725Z Compile requests executed 12
2025-12-04T12:25:07.7683099Z Cache hits 6
2025-12-04T12:25:07.7683448Z Cache hits (C/C++) 6
2025-12-04T12:25:07.7683777Z Cache misses 6
2025-12-04T12:25:07.7684122Z Cache misses (C/C++) 6
2025-12-04T12:25:07.7684477Z Cache hits rate 50.00 %
2025-12-04T12:25:07.7684837Z Cache hits rate (C/C++) 50.00 %
2025-12-04T12:25:07.7685197Z Cache timeouts 0
2025-12-04T12:25:07.7685544Z Cache read errors 0
2025-12-04T12:25:07.7685874Z Forced recaches 0
2025-12-04T12:25:07.7686230Z Cache write errors 0
2025-12-04T12:25:07.7686576Z Cache errors 0
2025-12-04T12:25:07.7686908Z Compilations 6
2025-12-04T12:25:07.7687263Z Compilation failures 0
2025-12-04T12:25:07.7687627Z Non-cacheable compilations 0
2025-12-04T12:25:07.7687989Z Non-cacheable calls 13
2025-12-04T12:25:07.7688330Z Non-compilation calls 507
2025-12-04T12:25:07.7688698Z Unsupported compiler calls 0
2025-12-04T12:25:07.7689068Z Average cache write 0.046 s
2025-12-04T12:25:07.7689425Z Average compiler 3.827 s
2025-12-04T12:25:07.7689793Z Average cache read hit 0.020 s
2025-12-04T12:25:07.7690436Z Failed distributed compilations 0
2025-12-04T12:25:07.7690686Z
2025-12-04T12:25:07.7690797Z Non-cacheable reasons:
2025-12-04T12:25:07.7691095Z -E 7
2025-12-04T12:25:07.7691451Z unknown source language 6
2025-12-04T12:25:07.7691687Z
2025-12-04T12:25:07.7691961Z Cache location s3, name: ossci-compiler-cache-circleci-v2, prefix: /
2025-12-04T12:25:07.7692464Z Version (client) 0.10.0
2025-12-04T12:25:07.7692825Z + echo ::endgroup::
2025-12-04T12:25:07.7693627Z ##[endgroup]
2025-12-04T12:25:07.7693888Z + cleanup_workspace
2025-12-04T12:25:07.7694723Z + echo 'sudo may print the following warning message that can be ignored. The chown command will still run.'
2025-12-04T12:25:07.7695906Z sudo may print the following warning message that can be ignored. The chown command will still run.
2025-12-04T12:25:07.7696875Z + echo ' sudo: setrlimit(RLIMIT_STACK): Operation not permitted'
2025-12-04T12:25:07.7697486Z sudo: setrlimit(RLIMIT_STACK): Operation not permitted
2025-12-04T12:25:07.7698278Z + echo 'For more details refer to https://github.com/sudo-project/sudo/issues/42'
2025-12-04T12:25:07.7699245Z For more details refer to https://github.com/sudo-project/sudo/issues/42
2025-12-04T12:25:07.7700100Z + sudo chown -R 1000 /var/lib/jenkins/workspace
2025-12-04T12:25:08.4372705Z ##[error]Process completed with exit code 1.
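Two quick sanity checks on the epilogue above, using only numbers copied from the log: the sccache counters (6 hits, 6 misses) match the reported 50.00 % hit rate, with most of the 532 compile requests being non-compilation calls that do not count toward it; and the `time` output (real 198m44s versus user 425m27s plus sys 243m30s) implies roughly 3.4 CPU-minutes of work per wall-clock minute during the test batch.

```python
# Sanity checks on the figures printed above (values copied from the log).
hits, misses = 6, 6
hit_rate = 100.0 * hits / (hits + misses)
print(f"Cache hits rate {hit_rate:.2f} %")           # -> 50.00 %

real_min = 198 + 44.462 / 60                          # wall-clock time
cpu_min = (425 + 27.491 / 60) + (243 + 29.884 / 60)   # user + sys
print(f"Average parallelism ~{cpu_min / real_min:.1f}x")  # -> ~3.4x
```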
2025-12-04T12:25:08.4446240Z Prepare all required actions
2025-12-04T12:25:08.4446654Z Getting action download info
2025-12-04T12:25:08.6295324Z ##[group]Run ./.github/actions/pytest-cache-upload
2025-12-04T12:25:08.6296017Z with:
2025-12-04T12:25:08.6296255Z cache_dir: .pytest_cache
2025-12-04T12:25:08.6296617Z shard: 3
2025-12-04T12:25:08.6297052Z sha: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T12:25:08.6297459Z test_config: distributed
2025-12-04T12:25:08.6297858Z job_identifier: trunk_linux-jammy-cuda12.8-py3.10-gcc11
2025-12-04T12:25:08.6298289Z env:
2025-12-04T12:25:08.6298521Z GIT_DEFAULT_BRANCH: main
2025-12-04T12:25:08.6298824Z HAS_NVIDIA_GPU: true
2025-12-04T12:25:08.6299190Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T12:25:08.6299821Z DOCKER_CONTAINER_ID: 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15
2025-12-04T12:25:08.6300406Z ##[endgroup]
2025-12-04T12:25:08.6336515Z ##[group]Run nick-fields/retry@v3.0.0
2025-12-04T12:25:08.6337047Z with:
2025-12-04T12:25:08.6337269Z shell: bash
2025-12-04T12:25:08.6337528Z timeout_minutes: 5
2025-12-04T12:25:08.6337816Z max_attempts: 5
2025-12-04T12:25:08.6338077Z retry_wait_seconds: 30
2025-12-04T12:25:08.6338469Z command: set -eu python3 -m pip install boto3==1.35.42
2025-12-04T12:25:08.6338912Z polling_interval_seconds: 1
2025-12-04T12:25:08.6339237Z warning_on_retry: true
2025-12-04T12:25:08.6339524Z continue_on_error: false
2025-12-04T12:25:08.6339812Z env:
2025-12-04T12:25:08.6340054Z GIT_DEFAULT_BRANCH: main
2025-12-04T12:25:08.6340353Z HAS_NVIDIA_GPU: true
2025-12-04T12:25:08.6340714Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T12:25:08.6341370Z DOCKER_CONTAINER_ID: 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15
2025-12-04T12:25:08.6341939Z ##[endgroup]
2025-12-04T12:25:08.9873936Z Defaulting to user installation because normal site-packages is not writeable
2025-12-04T12:25:10.1962623Z Collecting boto3==1.35.42
2025-12-04T12:25:10.2132753Z Downloading boto3-1.35.42-py3-none-any.whl (139 kB)
2025-12-04T12:25:11.5088208Z Collecting botocore<1.36.0,>=1.35.42
2025-12-04T12:25:11.5131295Z Downloading botocore-1.35.99-py3-none-any.whl (13.3 MB)
2025-12-04T12:25:11.7100541Z Collecting s3transfer<0.11.0,>=0.10.0
2025-12-04T12:25:11.7140500Z Downloading s3transfer-0.10.4-py3-none-any.whl (83 kB)
2025-12-04T12:25:11.7185849Z Requirement already satisfied: jmespath<2.0.0,>=0.7.1 in /usr/lib/python3.9/site-packages (from boto3==1.35.42) (0.10.0)
2025-12-04T12:25:11.7248279Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /usr/lib/python3.9/site-packages (from botocore<1.36.0,>=1.35.42->boto3==1.35.42) (2.8.1)
2025-12-04T12:25:11.7255184Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /usr/lib/python3.9/site-packages (from botocore<1.36.0,>=1.35.42->boto3==1.35.42) (1.25.10)
2025-12-04T12:25:11.8746471Z Requirement already satisfied: six>=1.5 in /usr/lib/python3.9/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.36.0,>=1.35.42->boto3==1.35.42) (1.15.0)
2025-12-04T12:25:11.9714826Z Installing collected packages: botocore, s3transfer, boto3
2025-12-04T12:25:12.5525174Z Successfully installed boto3-1.35.42 botocore-1.35.99 s3transfer-0.10.4
2025-12-04T12:25:12.7182934Z Command completed after 1 attempt(s).
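The nick-fields/retry step above wraps the pinned boto3 install with max_attempts: 5 and retry_wait_seconds: 30, and the install succeeded on the first attempt. A comparable retry loop, sketched in Python rather than the action's own JavaScript, might look like the following; the command string and timings simply mirror the step's inputs.

```python
# Sketch of a retry wrapper comparable to the nick-fields/retry step above:
# up to 5 attempts with a 30-second pause between failures. Illustrative only,
# not the action's implementation.
import subprocess
import time

def run_with_retries(cmd: str, max_attempts: int = 5, retry_wait_seconds: int = 30) -> None:
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(cmd, shell=True)
        if result.returncode == 0:
            print(f"Command completed after {attempt} attempt(s).")
            return
        if attempt < max_attempts:
            time.sleep(retry_wait_seconds)
    raise RuntimeError(f"Command failed after {max_attempts} attempts")

# Hypothetical usage matching the logged step:
# run_with_retries("python3 -m pip install boto3==1.35.42")
```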
2025-12-04T12:25:12.7239372Z ##[group]Run python3 .github/scripts/pytest_cache.py \
2025-12-04T12:25:12.7239843Z python3 .github/scripts/pytest_cache.py \
2025-12-04T12:25:12.7240204Z   --upload \
2025-12-04T12:25:12.7240531Z   --cache_dir "$GITHUB_WORKSPACE/$CACHE_DIR" \
2025-12-04T12:25:12.7240947Z   --pr_identifier "$GITHUB_REF" \
2025-12-04T12:25:12.7241309Z   --job_identifier "$JOB_IDENTIFIER" \
2025-12-04T12:25:12.7241656Z   --sha "$SHA" \
2025-12-04T12:25:12.7242031Z   --test_config "$TEST_CONFIG" \
2025-12-04T12:25:12.7242366Z   --shard "$SHARD" \
2025-12-04T12:25:12.7242645Z   --repo "$REPO" \
2025-12-04T12:25:12.7243083Z   --temp_dir "$RUNNER_TEMP" \
2025-12-04T12:25:12.7252501Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T12:25:12.7252884Z env:
2025-12-04T12:25:12.7253108Z   GIT_DEFAULT_BRANCH: main
2025-12-04T12:25:12.7253388Z   HAS_NVIDIA_GPU: true
2025-12-04T12:25:12.7253702Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T12:25:12.7254284Z   DOCKER_CONTAINER_ID: 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15
2025-12-04T12:25:12.7254819Z   CACHE_DIR: .pytest_cache
2025-12-04T12:25:12.7255183Z   JOB_IDENTIFIER: trunk_linux-jammy-cuda12.8-py3.10-gcc11
2025-12-04T12:25:12.7255597Z   SHA: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T12:25:12.7255950Z   TEST_CONFIG: distributed
2025-12-04T12:25:12.7256218Z   SHARD: 3
2025-12-04T12:25:12.7256567Z   REPO: pytorch/pytorch
2025-12-04T12:25:12.7257007Z ##[endgroup]
2025-12-04T12:25:13.0829464Z PR identifier for `refs/heads/main` is `96e092540d6b3c4076e3d2bc6f1f9013`
2025-12-04T12:25:13.0831950Z Uploading cache with args Namespace(upload=True, download=False, cache_dir='/home/ec2-user/actions-runner/_work/pytorch/pytorch/.pytest_cache', pr_identifier='refs/heads/main', job_identifier='trunk_linux-jammy-cuda12.8-py3.10-gcc11', sha='ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32', test_config='distributed', shard='3', repo='pytorch/pytorch', temp_dir='/home/ec2-user/actions-runner/_work/_temp', bucket=None)
2025-12-04T12:25:13.0834206Z Zipping /home/ec2-user/actions-runner/_work/pytorch/pytorch/.pytest_cache
2025-12-04T12:25:13.0835657Z to /home/ec2-user/actions-runner/_work/_temp/zip-upload/pytest_cache/pytorch/pytorch/96e092540d6b3c4076e3d2bc6f1f9013/trunk_linux-jammy-cuda12_8-py3_10-gcc11/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32/distributed/3
2025-12-04T12:25:13.0837990Z Uploading /home/ec2-user/actions-runner/_work/_temp/zip-upload/pytest_cache/pytorch/pytorch/96e092540d6b3c4076e3d2bc6f1f9013/trunk_linux-jammy-cuda12_8-py3_10-gcc11/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32/distributed/3.zip
2025-12-04T12:25:13.0840039Z to s3://gha-artifacts/pytest_cache/pytorch/pytorch/96e092540d6b3c4076e3d2bc6f1f9013/trunk_linux-jammy-cuda12_8-py3_10-gcc11/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32/distributed/3.zip
2025-12-04T12:25:13.1315360Z ##[group]Run cat test/**/*_toprint.log || true
2025-12-04T12:25:13.1315774Z cat test/**/*_toprint.log || true
2025-12-04T12:25:13.1322286Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T12:25:13.1322715Z env:
2025-12-04T12:25:13.1322965Z   GIT_DEFAULT_BRANCH: main
2025-12-04T12:25:13.1323394Z   HAS_NVIDIA_GPU: true
2025-12-04T12:25:13.1323765Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T12:25:13.1324422Z   DOCKER_CONTAINER_ID: 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15
2025-12-04T12:25:13.1325015Z ##[endgroup]
2025-12-04T12:25:13.1419016Z cat: 'test/**/*_toprint.log': No such file or directory
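
For orientation, the step above zips the .pytest_cache directory and pushes the archive to the gha-artifacts bucket under a key built from the PR identifier, job identifier, SHA, test config, and shard. A simplified, illustrative sketch of that flow using the boto3 just installed (the helper names and temp path below are assumptions; .github/scripts/pytest_cache.py is the authoritative implementation):

# Illustrative sketch of the zip-and-upload flow logged above; not the real script.
import zipfile
from pathlib import Path

import boto3

def zip_directory(src_dir: Path, dest_zip: Path) -> Path:
    # Recursively add every file under src_dir to a deflated zip archive.
    dest_zip.parent.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(dest_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in src_dir.rglob("*"):
            if path.is_file():
                zf.write(path, path.relative_to(src_dir))
    return dest_zip

def upload_to_s3(zip_path: Path, bucket: str, key: str) -> None:
    boto3.client("s3").upload_file(str(zip_path), bucket, key)

# Example values taken from the log lines above.
cache_dir = Path("/home/ec2-user/actions-runner/_work/pytorch/pytorch/.pytest_cache")
key = ("pytest_cache/pytorch/pytorch/96e092540d6b3c4076e3d2bc6f1f9013/"
       "trunk_linux-jammy-cuda12_8-py3_10-gcc11/"
       "ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32/distributed/3.zip")
zip_path = zip_directory(cache_dir, Path("/tmp") / Path(key).name)
upload_to_s3(zip_path, "gha-artifacts", key)
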
2025-12-04T12:25:13.1447643Z ##[group]Run kill "$MONITOR_SCRIPT_PID"
2025-12-04T12:25:13.1448034Z kill "$MONITOR_SCRIPT_PID"
2025-12-04T12:25:13.1453718Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T12:25:13.1454093Z env:
2025-12-04T12:25:13.1454316Z   GIT_DEFAULT_BRANCH: main
2025-12-04T12:25:13.1454592Z   HAS_NVIDIA_GPU: true
2025-12-04T12:25:13.1454903Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T12:25:13.1455477Z   DOCKER_CONTAINER_ID: 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15
2025-12-04T12:25:13.1456009Z   MONITOR_SCRIPT_PID: 62844
2025-12-04T12:25:13.1456299Z ##[endgroup]
2025-12-04T12:25:13.1480541Z /home/ec2-user/actions-runner/_work/_temp/15c09898-b395-4ad6-b513-93226678e011.sh: line 1: kill: (62844) - No such process
2025-12-04T12:25:13.1483199Z ##[error]Process completed with exit code 1.
2025-12-04T12:25:13.1621408Z Prepare all required actions
2025-12-04T12:25:13.1621898Z Getting action download info
2025-12-04T12:25:13.3454635Z Download action repository 'seemethere/upload-artifact-s3@v5' (SHA:baba72d0712b404f646cebe0730933554ebce96a)
2025-12-04T12:25:13.5737919Z Download action repository 'actions/upload-artifact@v4' (SHA:ea165f8d65b6e75b540449e92b4886f43607fa02)
2025-12-04T12:25:14.0760139Z ##[group]Run ./.github/actions/upload-test-artifacts
2025-12-04T12:25:14.0760517Z with:
2025-12-04T12:25:14.0760933Z   file-suffix: test-distributed-3-3-lf.linux.g4dn.12xlarge.nvidia.gpu_57116084904
2025-12-04T12:25:14.0761443Z   s3-bucket: gha-artifacts
2025-12-04T12:25:14.0761709Z env:
2025-12-04T12:25:14.0761930Z   GIT_DEFAULT_BRANCH: main
2025-12-04T12:25:14.0762202Z   HAS_NVIDIA_GPU: true
2025-12-04T12:25:14.0762531Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T12:25:14.0763111Z   DOCKER_CONTAINER_ID: 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15
2025-12-04T12:25:14.0763665Z ##[endgroup]
2025-12-04T12:25:14.0789934Z ##[group]Run # Remove any previous test jsons if they exist
2025-12-04T12:25:14.0790448Z # Remove any previous test jsons if they exist
2025-12-04T12:25:14.0790849Z rm -f test-jsons-*.zip
2025-12-04T12:25:14.0791328Z zip -r "test-jsons-${FILE_SUFFIX}.zip" test/test-reports -i '*.json'
2025-12-04T12:25:14.0797244Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T12:25:14.0797631Z env:
2025-12-04T12:25:14.0797844Z   GIT_DEFAULT_BRANCH: main
2025-12-04T12:25:14.0798121Z   HAS_NVIDIA_GPU: true
2025-12-04T12:25:14.0798448Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T12:25:14.0799015Z   DOCKER_CONTAINER_ID: 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15
2025-12-04T12:25:14.0799740Z   FILE_SUFFIX: test-distributed-3-3-lf.linux.g4dn.12xlarge.nvidia.gpu_57116084904
2025-12-04T12:25:14.0800241Z ##[endgroup]
2025-12-04T12:25:14.1038537Z adding: test/test-reports/td_exclusions-2f1c2264a3249442bd0a.json (deflated 86%)
2025-12-04T12:25:14.1039727Z adding: test/test-reports/python-pytest/distributed.test_c10d_functional_native/distributed.test_c10d_functional_native-369cc3de9e188dd1.json (deflated 92%)
2025-12-04T12:25:14.1041220Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-39c8c10a0ef1a34e.json (deflated 79%)
2025-12-04T12:25:14.1042646Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-bb36a88bac557029.json (deflated 79%)
2025-12-04T12:25:14.1044081Z adding:
test/test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-9b6f6e417d9b4600.json (deflated 79%) 2025-12-04T12:25:14.1045651Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-83c25fe932c36613.json (stored 0%) 2025-12-04T12:25:14.1047079Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-e1278d34de852f2a.json (deflated 79%) 2025-12-04T12:25:14.1048524Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-efcb608498b7750d.json (deflated 79%) 2025-12-04T12:25:14.1049962Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-9a300aee582fd0b6.json (deflated 79%) 2025-12-04T12:25:14.1051389Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-433868368b6a29b3.json (deflated 79%) 2025-12-04T12:25:14.1052819Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-cb48c540b8fb2acf.json (deflated 87%) 2025-12-04T12:25:14.1054254Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-f306b72badd85355.json (deflated 79%) 2025-12-04T12:25:14.1055884Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-456a3faf0e1ca4c4.json (stored 0%) 2025-12-04T12:25:14.1057682Z adding: test/test-reports/python-pytest/distributed.tensor.debug.test_debug_mode/distributed.tensor.debug.test_debug_mode-21dd2989918f2f32.json (deflated 90%) 2025-12-04T12:25:14.1059242Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-93c7f0a0a61745d5.json (deflated 79%) 2025-12-04T12:25:14.1060757Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-50fd36707db41f77.json (deflated 79%) 2025-12-04T12:25:14.1062255Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-434f2a168fab2502.json (deflated 79%) 2025-12-04T12:25:14.1063757Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-810575b51f00acc3.json (deflated 79%) 2025-12-04T12:25:14.1065257Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-acd65444fa26961a.json (deflated 79%) 2025-12-04T12:25:14.1066762Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-d7f6d912312cc834.json (deflated 79%) 2025-12-04T12:25:14.1068287Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-d3fa58c4cf34965f.json (deflated 80%) 2025-12-04T12:25:14.1069864Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-d5b8ecd9108f02ac.json (deflated 88%) 2025-12-04T12:25:14.1071329Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-578e4c4077b7a803.json (deflated 80%) 2025-12-04T12:25:14.1072787Z adding: 
test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-14d4a314808f55fe.json (deflated 80%) 2025-12-04T12:25:14.1074248Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-72b90a4f7545df10.json (deflated 80%) 2025-12-04T12:25:14.1075704Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-cc094df1219cfd82.json (deflated 91%) 2025-12-04T12:25:14.1077151Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-94627d53ab92538d.json (deflated 80%) 2025-12-04T12:25:14.1078620Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-f49c40cee39994b2.json (deflated 80%) 2025-12-04T12:25:14.1080133Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-a8869f6ed51873ac.json (deflated 80%) 2025-12-04T12:25:14.1081597Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-90a4ba7c1fd04d10.json (deflated 80%) 2025-12-04T12:25:14.1083072Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-ccaa5b3b6bf09af7.json (deflated 80%) 2025-12-04T12:25:14.1084526Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-ca39f8152ef39349.json (deflated 91%) 2025-12-04T12:25:14.1085989Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-7178045a44a28781.json (deflated 79%) 2025-12-04T12:25:14.1087452Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-cdb7b80b8b392fad.json (deflated 79%) 2025-12-04T12:25:14.1088993Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-9595731043617943.json (deflated 87%) 2025-12-04T12:25:14.1090476Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-f8bd87b046fcc0d3.json (deflated 79%) 2025-12-04T12:25:14.1091938Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-68dc7893385d1617.json (deflated 79%) 2025-12-04T12:25:14.1093402Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-14f8a536ecccf07e.json (deflated 79%) 2025-12-04T12:25:14.1094845Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-77e61ff77a3b19cd.json (stored 0%) 2025-12-04T12:25:14.1096461Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-a78dec0d79621f36.json (deflated 80%) 2025-12-04T12:25:14.1098279Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-9a14ac4718e66e44.json (deflated 80%) 2025-12-04T12:25:14.1099938Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-7d115d367e840460.json (deflated 80%) 2025-12-04T12:25:14.1101611Z adding: 
test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-724e16d7d24ec18b.json (deflated 80%) 2025-12-04T12:25:14.1103270Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-1c81c8f34feb9c16.json (deflated 80%) 2025-12-04T12:25:14.1104926Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-a326f09bb7c5e616.json (deflated 80%) 2025-12-04T12:25:14.1106594Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-d7096ae518bc839e.json (deflated 80%) 2025-12-04T12:25:14.1108266Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-dbe06a751e4355d9.json (deflated 80%) 2025-12-04T12:25:14.1109997Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-d7f21dedd43754e1.json (deflated 80%) 2025-12-04T12:25:14.1111631Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-7dbc99509eb0f4ce.json (deflated 80%) 2025-12-04T12:25:14.1113287Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-5b4af92028672eb6.json (deflated 80%) 2025-12-04T12:25:14.1114885Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-c67b11ef8bde4252.json (deflated 80%) 2025-12-04T12:25:14.1116508Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-c057f5798619892b.json (deflated 80%) 2025-12-04T12:25:14.1118120Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-aae1a2ba6806c0ef.json (deflated 80%) 2025-12-04T12:25:14.1119732Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-c34ce2d8050066e8.json (deflated 80%) 2025-12-04T12:25:14.1121716Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-fde5b3ce12e5a98a.json (deflated 88%) 2025-12-04T12:25:14.1123492Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-b1cbedcab1229122.json (deflated 80%) 2025-12-04T12:25:14.1125221Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-6d24496891daae4f.json (deflated 80%) 2025-12-04T12:25:14.1126880Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-e815db3b6b0b67f1.json (deflated 87%) 2025-12-04T12:25:14.1128533Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-788cdb9001b436df.json (deflated 79%) 2025-12-04T12:25:14.1130179Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-9601a812ff315158.json (deflated 79%) 
2025-12-04T12:25:14.1131849Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-c4b6ce2b260b8d4b.json (deflated 79%) 2025-12-04T12:25:14.1133631Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-490a12d48ec816b9.json (deflated 79%) 2025-12-04T12:25:14.1135241Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-e2f9fc6fa3a79028.json (deflated 79%) 2025-12-04T12:25:14.1137083Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-384ab9a5685ff7be.json (stored 0%) 2025-12-04T12:25:14.1138683Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-a06a4188d644524d.json (deflated 87%) 2025-12-04T12:25:14.1140268Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-03186403898f3bbb.json (deflated 87%) 2025-12-04T12:25:14.1141838Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-a3dc994784795bc1.json (deflated 79%) 2025-12-04T12:25:14.1143407Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-b1d6139c1033a518.json (deflated 79%) 2025-12-04T12:25:14.1144980Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-ebdc3db326996caa.json (deflated 79%) 2025-12-04T12:25:14.1146537Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-c42bc725a7562377.json (deflated 79%) 2025-12-04T12:25:14.1148158Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-4818210284e31d5e.json (deflated 79%) 2025-12-04T12:25:14.1149813Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-1b5186457c75b3fb.json (deflated 87%) 2025-12-04T12:25:14.1151335Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-74e02afb5846363a.json (deflated 79%) 2025-12-04T12:25:14.1152846Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-39202840e4782b07.json (deflated 90%) 2025-12-04T12:25:14.1154365Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-067163aa862fde85.json (deflated 91%) 2025-12-04T12:25:14.1155899Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-adf2403f35f3c235.json (deflated 90%) 2025-12-04T12:25:14.1157466Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-36b91fd354097cab.json (stored 0%) 2025-12-04T12:25:14.1158928Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-90a070d9a0caeaa7.json (deflated 79%) 2025-12-04T12:25:14.1160279Z adding: 
test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3b56b818e7dab969.json (deflated 87%) 2025-12-04T12:25:14.1161639Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2da5f79ab7711605.json (deflated 87%) 2025-12-04T12:25:14.1162997Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a202ac92fafcf85d.json (deflated 79%) 2025-12-04T12:25:14.1164367Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bacdfd4e137b31c0.json (deflated 87%) 2025-12-04T12:25:14.1165720Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2f84fddbafa0e0f3.json (deflated 79%) 2025-12-04T12:25:14.1167079Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8511307d41418b77.json (deflated 79%) 2025-12-04T12:25:14.1168440Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3768a5b2a44119fc.json (deflated 79%) 2025-12-04T12:25:14.1169818Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-31ee953fde08a139.json (deflated 80%) 2025-12-04T12:25:14.1171177Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cf0a0887fe85c292.json (deflated 79%) 2025-12-04T12:25:14.1172536Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-07c27c95d6f3d3d6.json (deflated 79%) 2025-12-04T12:25:14.1173902Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6ec3b2535e8e2ad7.json (deflated 79%) 2025-12-04T12:25:14.1175261Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2c7bc1bec56d6360.json (deflated 87%) 2025-12-04T12:25:14.1176688Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1003ee713f2c1e3e.json (deflated 79%) 2025-12-04T12:25:14.1178244Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-86ef8482fc5a0e9d.json (deflated 79%) 2025-12-04T12:25:14.1179650Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6e9238188d8477a2.json (deflated 87%) 2025-12-04T12:25:14.1181077Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9476e56094f0b738.json (deflated 79%) 2025-12-04T12:25:14.1182474Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-207ff9590d724b3a.json (deflated 79%) 2025-12-04T12:25:14.1183872Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-f664e87214ff2805.json (deflated 79%) 2025-12-04T12:25:14.1185273Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-def950b7d24ceea9.json (deflated 79%) 2025-12-04T12:25:14.1186683Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-89dfbd7b5cd71317.json (deflated 79%) 2025-12-04T12:25:14.1188088Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bdae057bafb686b9.json (deflated 87%) 
2025-12-04T12:25:14.1189644Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-eb4953947b5f3ef2.json (deflated 79%) 2025-12-04T12:25:14.1191021Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-532f83d54e2054ff.json (deflated 91%) 2025-12-04T12:25:14.1192380Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3483d762b5b4fca1.json (deflated 79%) 2025-12-04T12:25:14.1193739Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-c6b2032ef8ff1e94.json (deflated 87%) 2025-12-04T12:25:14.1195100Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5647de3303d26f02.json (deflated 79%) 2025-12-04T12:25:14.1196439Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cff7e7504b276d84.json (deflated 87%) 2025-12-04T12:25:14.1197807Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0d2fb83ab3ccdeb6.json (deflated 79%) 2025-12-04T12:25:14.1199179Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bd911142cc34300e.json (deflated 91%) 2025-12-04T12:25:14.1200530Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-d8e84025a0dc7a16.json (deflated 91%) 2025-12-04T12:25:14.1201886Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-392d2e7951c1c5f3.json (deflated 79%) 2025-12-04T12:25:14.1203230Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-477ee10c9167da98.json (deflated 91%) 2025-12-04T12:25:14.1204588Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-96eeb012f5f596ba.json (deflated 79%) 2025-12-04T12:25:14.1205958Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cc37fd9d84da442a.json (deflated 79%) 2025-12-04T12:25:14.1207325Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cbd7e5f481e859be.json (deflated 79%) 2025-12-04T12:25:14.1208674Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6ede249f1a681285.json (deflated 79%) 2025-12-04T12:25:14.1210036Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-11be05c94e086d26.json (deflated 79%) 2025-12-04T12:25:14.1211396Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-16966e8ed8e62900.json (deflated 79%) 2025-12-04T12:25:14.1212755Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-90420efea6f00dc5.json (deflated 79%) 2025-12-04T12:25:14.1214477Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6c9f36ab2b8b15ae.json (deflated 79%) 2025-12-04T12:25:14.1215847Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0d4c1fd96adc2be7.json (deflated 87%) 2025-12-04T12:25:14.1217459Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-500277f28031837e.json 
(deflated 79%) 2025-12-04T12:25:14.1218856Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-942d56c07e16c88d.json (deflated 79%) 2025-12-04T12:25:14.1220261Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-55fdf9ad8e0a27f0.json (deflated 79%) 2025-12-04T12:25:14.1221819Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6e1cdaa245647d1a.json (deflated 79%) 2025-12-04T12:25:14.1223342Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a996648fbbff19f5.json (deflated 79%) 2025-12-04T12:25:14.1224799Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cc1573489c80017b.json (deflated 79%) 2025-12-04T12:25:14.1226193Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4d2b72d464b1c339.json (deflated 80%) 2025-12-04T12:25:14.1227587Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-65dbafa4918c0ef1.json (deflated 80%) 2025-12-04T12:25:14.1228990Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5b8e1f7dea233320.json (deflated 91%) 2025-12-04T12:25:14.1230386Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9d13641fc6f0b57c.json (deflated 80%) 2025-12-04T12:25:14.1231789Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-29e66d82c97dbaa5.json (deflated 80%) 2025-12-04T12:25:14.1233283Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a798bbedf3e7b999.json (deflated 91%) 2025-12-04T12:25:14.1234636Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e0d5d8a174cb3c98.json (deflated 88%) 2025-12-04T12:25:14.1235991Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-931d013fb4c2579a.json (deflated 91%) 2025-12-04T12:25:14.1237341Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-92646f491493cae0.json (deflated 80%) 2025-12-04T12:25:14.1238684Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8232c23afc6466e0.json (deflated 79%) 2025-12-04T12:25:14.1240032Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-983af60bcd722f1d.json (deflated 79%) 2025-12-04T12:25:14.1241395Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-84ede3fbd174dfda.json (deflated 79%) 2025-12-04T12:25:14.1242757Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9538bfd24f807d16.json (deflated 87%) 2025-12-04T12:25:14.1244121Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6e7d2c56cd2be4bb.json (deflated 88%) 2025-12-04T12:25:14.1245471Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1378f62336ac1630.json (deflated 79%) 2025-12-04T12:25:14.1246874Z adding: 
test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8e092965a6aa7362.json (deflated 87%) 2025-12-04T12:25:14.1248232Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-19aef0a0802c58a7.json (deflated 79%) 2025-12-04T12:25:14.1249591Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5e8c70689f4db333.json (deflated 87%) 2025-12-04T12:25:14.1250935Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-389219a70e101b44.json (deflated 79%) 2025-12-04T12:25:14.1252290Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-22aad73f608511a0.json (deflated 87%) 2025-12-04T12:25:14.1253643Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-22bb81621d944803.json (deflated 79%) 2025-12-04T12:25:14.1254994Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e70588b2995dc7c5.json (deflated 79%) 2025-12-04T12:25:14.1256502Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-b456a18c8ca9135a.json (deflated 79%) 2025-12-04T12:25:14.1258218Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-aedba904eee3ba73.json (deflated 79%) 2025-12-04T12:25:14.1259617Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2d3d36f137cb39b5.json (deflated 79%) 2025-12-04T12:25:14.1261005Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-973a0dc84b27de93.json (deflated 79%) 2025-12-04T12:25:14.1262409Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1e9342b39aaf3792.json (deflated 79%) 2025-12-04T12:25:14.1263813Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-15b775a41cf5a439.json (deflated 79%) 2025-12-04T12:25:14.1265227Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-56374ffd8bd068de.json (deflated 79%) 2025-12-04T12:25:14.1266612Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6288913bb010f746.json (deflated 79%) 2025-12-04T12:25:14.1268008Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9d2350a2a3a63f23.json (deflated 79%) 2025-12-04T12:25:14.1269501Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-ee9779088060e0f5.json (deflated 87%) 2025-12-04T12:25:14.1270856Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-7a7aa8c4ec058e09.json (deflated 79%) 2025-12-04T12:25:14.1272208Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4f45a35aeec028b0.json (stored 0%) 2025-12-04T12:25:14.1273580Z adding: test/test-reports/python-pytest/distributed.algorithms.test_join/distributed.algorithms.test_join-346fdf8ca2d8d04c.json (deflated 89%) 2025-12-04T12:25:14.1275162Z adding: test/test-reports/python-pytest/distributed.pipelining.test_schedule_multiproc/distributed.pipelining.test_schedule_multiproc-4c892aab54fe07b4.json 
(deflated 94%) 2025-12-04T12:25:14.1276802Z adding: test/test-reports/python-pytest/distributed.test_compute_comm_reordering/distributed.test_compute_comm_reordering-5eeb11f30d43fbd8.json (deflated 87%) 2025-12-04T12:25:14.1278253Z adding: test/test-reports/python-pytest/distributed.test_cupy_as_tensor/distributed.test_cupy_as_tensor-9bf0be6a7af397ad.json (deflated 47%) 2025-12-04T12:25:14.1279580Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fx/distributed.fsdp.test_fsdp_fx-d8b89ec57f22953e.json (deflated 35%) 2025-12-04T12:25:14.1280948Z adding: test/test-reports/python-pytest/distributed._tools.test_sac_ilp/distributed._tools.test_sac_ilp-80280b96b0e30cba.json (deflated 76%) 2025-12-04T12:25:14.1282387Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_hf_storage/distributed.checkpoint.test_hf_storage-5c05eca826b12737.json (deflated 83%) 2025-12-04T12:25:14.1283933Z adding: test/test-reports/python-pytest/distributed.pipelining.test_microbatch/distributed.pipelining.test_microbatch-db2f7f262044cd4d.json (deflated 68%) 2025-12-04T12:25:14.1285476Z adding: test/test-reports/python-pytest/distributed.tensor.test_placement_types/distributed.tensor.test_placement_types-aa6a82bf337fac31.json (deflated 82%) 2025-12-04T12:25:14.1287116Z adding: test/test-reports/python-pytest/distributed.tensor.test_dtensor_dispatch_overhead/distributed.tensor.test_dtensor_dispatch_overhead-1be227e0f3a4b8ca.json (deflated 42%) 2025-12-04T12:25:14.1288985Z adding: test/test-reports/python-pytest/distributed.checkpoint._experimental.test_checkpoint_reader/distributed.checkpoint._experimental.test_checkpoint_reader-e75c494c472cf9a1.json (deflated 78%) 2025-12-04T12:25:14.1290835Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_format_utils/distributed.checkpoint.test_format_utils-ff4efe8ffc0a39b9.json (deflated 74%) 2025-12-04T12:25:14.1292476Z adding: test/test-reports/python-pytest/distributed.test_aten_comm_compute_reordering/distributed.test_aten_comm_compute_reordering-8ab49fa352932ba1.json (deflated 91%) 2025-12-04T12:25:14.1294032Z adding: test/test-reports/python-pytest/distributed.tensor.test_redistribute/distributed.tensor.test_redistribute-02b614c0805e2900.json (deflated 92%) 2025-12-04T12:25:14.1295569Z adding: test/test-reports/python-pytest/distributed.tensor.parallel.test_tp_style/distributed.tensor.parallel.test_tp_style-3daa17d4beb2059f.json (deflated 90%) 2025-12-04T12:25:14.1297255Z adding: test/test-reports/python-pytest/distributed.tensor.test_api/distributed.tensor.test_api-143a55cc9757e18a.json (deflated 90%) 2025-12-04T12:25:14.1298696Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_fsspec/distributed.checkpoint.test_fsspec-2295d11b632387c0.json (deflated 69%) 2025-12-04T12:25:14.1300380Z adding: test/test-reports/python-pytest/distributed.tensor.experimental.test_tp_transform/distributed.tensor.experimental.test_tp_transform-af912528cabb656d.json (deflated 76%) 2025-12-04T12:25:14.1302070Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_traverse/distributed.checkpoint.test_traverse-f038bc92a00bd1c7.json (deflated 87%) 2025-12-04T12:25:14.1303583Z adding: test/test-reports/python-pytest/distributed.tensor.test_random_ops/distributed.tensor.test_random_ops-a8f6b522aa6434af.json (deflated 92%) 2025-12-04T12:25:14.1305224Z adding: test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_logging/distributed._composable.fsdp.test_fully_shard_logging-7e09cae3d59aa65e.json 
(stored 0%) 2025-12-04T12:25:14.1306839Z adding: test/test-reports/python-pytest/distributed.launcher.test_api/distributed.launcher.test_api-15b87ceaa10651c5.json (deflated 63%) 2025-12-04T12:25:14.1308420Z adding: test/test-reports/python-pytest/distributed.elastic.multiprocessing.test_api/distributed.elastic.multiprocessing.test_api-12b95803d8942f3a.json (deflated 86%) 2025-12-04T12:25:14.1310110Z adding: test/test-reports/python-pytest/distributed.fsdp.test_shard_utils/distributed.fsdp.test_shard_utils-76ee73cffd398e77.json (deflated 64%) 2025-12-04T12:25:14.1311635Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_fsdp_optim_state/distributed.checkpoint.test_fsdp_optim_state-f29e492ac7e0fdff.json (deflated 66%) 2025-12-04T12:25:14.1313310Z adding: test/test-reports/python-pytest/distributed.checkpoint.e2e.test_e2e_save_and_load/distributed.checkpoint.e2e.test_e2e_save_and_load-ea436a2b3918b4b7.json (deflated 92%) 2025-12-04T12:25:14.1315018Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_dtensor_resharding/distributed.checkpoint.test_dtensor_resharding-850e82d898db0167.json (deflated 89%) 2025-12-04T12:25:14.1316601Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_memory/distributed.fsdp.test_fsdp_memory-bd1d93d0f6b45624.json (deflated 66%) 2025-12-04T12:25:14.1318047Z adding: test/test-reports/python-pytest/distributed.tensor.test_pointwise_ops/distributed.tensor.test_pointwise_ops-8ffd5e5eb5f5ad7d.json (deflated 91%) 2025-12-04T12:25:14.1319608Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_compatibility/distributed.checkpoint.test_compatibility-759684b03ee5bd2d.json (deflated 80%) 2025-12-04T12:25:14.1321446Z adding: test/test-reports/python-pytest/distributed._tools.test_mem_tracker/distributed._tools.test_mem_tracker-e6bb23aea30c734a.json (deflated 73%) 2025-12-04T12:25:14.1322948Z adding: test/test-reports/python-pytest/distributed.elastic.test_control_plane/distributed.elastic.test_control_plane-8adada293373a225.json (deflated 85%) 2025-12-04T12:25:14.1324359Z adding: test/test-reports/python-pytest/distributed.test_fake_pg/distributed.test_fake_pg-79e3fe3f86c7485d.json (deflated 91%) 2025-12-04T12:25:14.1325944Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_fsdp_model_state/distributed.checkpoint.test_fsdp_model_state-d2d7dab49696755b.json (deflated 67%) 2025-12-04T12:25:14.1327541Z adding: test/test-reports/python-pytest/distributed.test_functional_api/distributed.test_functional_api-d3092064f68d2f41.json (deflated 88%) 2025-12-04T12:25:14.1329237Z adding: test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_clip_grad_norm_/distributed._composable.fsdp.test_fully_shard_clip_grad_norm_-2322cac9c0cc490f.json (deflated 67%) 2025-12-04T12:25:14.1331004Z adding: test/test-reports/python-pytest/distributed.tensor.debug.test_comm_mode/distributed.tensor.debug.test_comm_mode-8cc829f047ed6143.json (deflated 73%) 2025-12-04T12:25:14.1332419Z adding: test/test-reports/python-pytest/distributed.test_dist2/distributed.test_dist2-7a48db8512284abb.json (deflated 93%) 2025-12-04T12:25:14.1334097Z adding: test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_grad_scaler/distributed._composable.fsdp.test_fully_shard_grad_scaler-5e3c33eaf29838b0.json (deflated 41%) 2025-12-04T12:25:14.1335708Z adding: test/test-reports/python-pytest/distributed.launcher.test_run/distributed.launcher.test_run-eeaaeb50473e3b00.json (deflated 92%) 
2025-12-04T12:25:14.1337444Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_backward_prefetch/distributed.fsdp.test_fsdp_backward_prefetch-9d6c65a3bd838e6b.json (deflated 42%) 2025-12-04T12:25:14.1339081Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_checkpoint/distributed.checkpoint.test_checkpoint-698955a0be6378e2.json (deflated 87%) 2025-12-04T12:25:14.1340608Z adding: test/test-reports/python-pytest/distributed._pycute.test_coalesce/distributed._pycute.test_coalesce-d2727b6d77166552.json (deflated 38%) 2025-12-04T12:25:14.1342080Z adding: test/test-reports/python-pytest/distributed._pycute.test_complement/distributed._pycute.test_complement-323506218bd25d4f.json (deflated 40%) 2025-12-04T12:25:14.1343596Z adding: test/test-reports/python-pytest/distributed._pycute.test_composition/distributed._pycute.test_composition-91e42d2ac7610498.json (deflated 40%) 2025-12-04T12:25:14.1345062Z adding: test/test-reports/python-pytest/distributed._pycute.test_int_tuple/distributed._pycute.test_int_tuple-1604350619512e65.json (deflated 91%) 2025-12-04T12:25:14.1346538Z adding: test/test-reports/python-pytest/distributed._pycute.test_left_inverse/distributed._pycute.test_left_inverse-7b550f03a54828f5.json (deflated 39%) 2025-12-04T12:25:14.1348053Z adding: test/test-reports/python-pytest/distributed._pycute.test_right_inverse/distributed._pycute.test_right_inverse-5437f0847845b913.json (deflated 40%) 2025-12-04T12:25:14.1349703Z adding: test/test-reports/python-pytest/distributed._composable.test_replicate/distributed._composable.test_replicate-5594e5fd77ce79b5.json (deflated 92%) 2025-12-04T12:25:14.1351333Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_hsdp_checkpoint/distributed.checkpoint.test_hsdp_checkpoint-293bcc74b378a9a0.json (deflated 81%) 2025-12-04T12:25:14.1353041Z adding: test/test-reports/python-pytest/distributed.tensor.parallel.test_parallelize_api/distributed.tensor.parallel.test_parallelize_api-e24bc2790e3eed77.json (deflated 94%) 2025-12-04T12:25:14.1354650Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_state_dict/distributed.fsdp.test_fsdp_state_dict-3c13b82ce7076bc1.json (deflated 97%) 2025-12-04T12:25:14.1356058Z adding: test/test-reports/python-pytest/distributed._pycute.test_typing/distributed._pycute.test_typing-1c9aabc95fed14a1.json (deflated 38%) 2025-12-04T12:25:14.1357442Z adding: test/test-reports/python-pytest/distributed.test_serialization/distributed.test_serialization-5c3790edbaae9c6a.json (deflated 82%) 2025-12-04T12:25:14.1358904Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_ignored_modules/distributed.fsdp.test_fsdp_ignored_modules-c4ab0979e06883a2.json (deflated 88%) 2025-12-04T12:25:14.1360604Z adding: test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_comm/distributed._composable.fsdp.test_fully_shard_comm-b03b971b17f9f8be.json (deflated 90%) 2025-12-04T12:25:14.1362307Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_sharded_grad_scaler/distributed.fsdp.test_fsdp_sharded_grad_scaler-830facc45336217a.json (deflated 94%) 2025-12-04T12:25:14.1363979Z adding: test/test-reports/python-pytest/distributed._shard.sharding_plan.test_sharding_plan/distributed._shard.sharding_plan.test_sharding_plan-86fe0d16a378ac71.json (deflated 76%) 2025-12-04T12:25:14.1365716Z adding: 
test/test-reports/python-pytest/distributed._shard.sharded_optim.test_sharded_optim/distributed._shard.sharded_optim.test_sharded_optim-a8d576a6cb5a21e5.json (deflated 67%) 2025-12-04T12:25:14.1367481Z adding: test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_state_dict/distributed._composable.fsdp.test_fully_shard_state_dict-7cd1746803ec2a8b.json (deflated 87%) 2025-12-04T12:25:14.1369064Z adding: test/test-reports/python-pytest/distributed.tensor.test_utils/distributed.tensor.test_utils-ce4dc3e67348c080.json (deflated 90%) 2025-12-04T12:25:14.1370621Z adding: test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_memory/distributed._composable.fsdp.test_fully_shard_memory-bd84ca434b9abee9.json (deflated 67%) 2025-12-04T12:25:14.1372273Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_state_dict/distributed.checkpoint.test_state_dict-82ab38e24fe889c8.json (deflated 92%) 2025-12-04T12:25:14.1373839Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_state_dict_utils/distributed.checkpoint.test_state_dict_utils-a19642af8d31d778.json (deflated 86%) 2025-12-04T12:25:14.1375541Z adding: test/test-reports/python-pytest/distributed._shard.sharded_tensor.ops.test_embedding/distributed._shard.sharded_tensor.ops.test_embedding-fd33e5d9c41f35fb.json (deflated 68%) 2025-12-04T12:25:14.1377646Z adding: test/test-reports/python-pytest/distributed._shard.sharded_tensor.test_sharded_tensor_reshard/distributed._shard.sharded_tensor.test_sharded_tensor_reshard-e6bc79067fb0604d.json (deflated 71%) 2025-12-04T12:25:14.1379343Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-2ef4942791579d03.json (deflated 36%) 2025-12-04T12:25:14.1380752Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-d882aa7ed351d2b7.json (deflated 36%) 2025-12-04T12:25:14.1382142Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-e41d47243c13be74.json (deflated 36%) 2025-12-04T12:25:14.1383550Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-2ed2ccb680132309.json (deflated 36%) 2025-12-04T12:25:14.1384994Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-a86d7398eb9ff93b.json (deflated 37%) 2025-12-04T12:25:14.1386397Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-50f191d4627fdfd2.json (deflated 36%) 2025-12-04T12:25:14.1387790Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-8cb70355957e1b4b.json (deflated 36%) 2025-12-04T12:25:14.1389293Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-bbde3500be39702b.json (deflated 36%) 2025-12-04T12:25:14.1390653Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-1805de606cf78685.json (deflated 37%) 2025-12-04T12:25:14.1392008Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-8a898c87fa4f8fd3.json (deflated 37%) 2025-12-04T12:25:14.1393444Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_ucc/distributed.test_c10d_spawn_ucc-41764b12ccdf212e.json (deflated 46%) 2025-12-04T12:25:14.1394807Z adding: 
test/test-reports/python-pytest/distributed.test_c10d_spawn_ucc/distributed.test_c10d_spawn_ucc-aee5aa2ded024d85.json (deflated 46%) 2025-12-04T12:25:14.1396148Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_ucc/distributed.test_c10d_spawn_ucc-8800a2e7b955ab16.json (deflated 46%) 2025-12-04T12:25:14.1397483Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_ucc/distributed.test_c10d_spawn_ucc-3a092f5472894a7f.json (deflated 46%) 2025-12-04T12:25:14.1398824Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_ucc/distributed.test_c10d_spawn_ucc-f628509e7e3f2a1f.json (deflated 46%) 2025-12-04T12:25:14.1400162Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_ucc/distributed.test_c10d_spawn_ucc-c1a78b733abc6caa.json (deflated 46%) 2025-12-04T12:25:14.1401465Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0991bf72558fb22b.json (deflated 33%) 2025-12-04T12:25:14.1402725Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-aa6ce215ba96a24c.json (deflated 36%) 2025-12-04T12:25:14.1403977Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-16fe1d620732710b.json (deflated 33%) 2025-12-04T12:25:14.1405213Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-3fe1795a5d3e5b88.json (deflated 33%) 2025-12-04T12:25:14.1406470Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-6c7276bb9fa9eee2.json (deflated 34%) 2025-12-04T12:25:14.1407716Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-cd50578f9742b761.json (deflated 33%) 2025-12-04T12:25:14.1408974Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-5e60172a210dc8b6.json (deflated 34%) 2025-12-04T12:25:14.1410209Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-873ae68d43267ac9.json (deflated 33%) 2025-12-04T12:25:14.1411458Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-34c50e4612c9fea4.json (deflated 33%) 2025-12-04T12:25:14.1412713Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d54fb6be7a931b62.json (deflated 33%) 2025-12-04T12:25:14.1413963Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2259b8bd184524fc.json (deflated 34%) 2025-12-04T12:25:14.1415210Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-8f01caa16144b040.json (deflated 33%) 2025-12-04T12:25:14.1416542Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-31de274c3cb59c01.json (deflated 33%) 2025-12-04T12:25:14.1418000Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-db19637423ab0dbc.json (deflated 34%) 2025-12-04T12:25:14.1419292Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-b23ea90304491b65.json (deflated 34%) 2025-12-04T12:25:14.1420580Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-eaee01f734bb6504.json (deflated 33%) 2025-12-04T12:25:14.1422010Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0fa860b184f8ddb6.json (deflated 33%) 
2025-12-04T12:25:14.1423307Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-33cbbe588c8f840c.json (deflated 34%) 2025-12-04T12:25:14.1424597Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-de8dc85b62067611.json (deflated 34%) 2025-12-04T12:25:14.1425993Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0f2cd4f378b677f0.json (deflated 34%) 2025-12-04T12:25:14.1427307Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e35b0454119a9f51.json (deflated 34%) 2025-12-04T12:25:14.1428598Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d98cd20152af5d53.json (deflated 34%) 2025-12-04T12:25:14.1429889Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-3982ee850d6ce795.json (deflated 33%) 2025-12-04T12:25:14.1431179Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-08455987c8f710af.json (deflated 34%) 2025-12-04T12:25:14.1432554Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e90446a7a06b5b78.json (deflated 34%) 2025-12-04T12:25:14.1433809Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-3abd929020861bdc.json (deflated 34%) 2025-12-04T12:25:14.1435066Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d79cb42da7e54a79.json (deflated 33%) 2025-12-04T12:25:14.1436319Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-1a14244d1e7f6bb2.json (deflated 35%) 2025-12-04T12:25:14.1437556Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-a80b6bac28c5c972.json (deflated 33%) 2025-12-04T12:25:14.1438810Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-bf45f3c093461361.json (deflated 34%) 2025-12-04T12:25:14.1440057Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-81160b788c5abcc2.json (deflated 34%) 2025-12-04T12:25:14.1441305Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2242d642afc7f886.json (deflated 33%) 2025-12-04T12:25:14.1442539Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-327f840cbb3f5094.json (deflated 36%) 2025-12-04T12:25:14.1464318Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-724f786ab432a45b.json (deflated 35%) 2025-12-04T12:25:14.1465763Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-aae15a76989ce46a.json (deflated 35%) 2025-12-04T12:25:14.1467057Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-4ee273f849859fe9.json (deflated 35%) 2025-12-04T12:25:14.1468352Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-93baf128de560649.json (deflated 35%) 2025-12-04T12:25:14.1469856Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-1f85ec05eddb726d.json (deflated 34%) 2025-12-04T12:25:14.1471118Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-c9eb752317a73e18.json (deflated 34%) 2025-12-04T12:25:14.1472358Z 
adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-cedb520e520b4782.json (deflated 35%) 2025-12-04T12:25:14.1473615Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e69dd1a2e9fba2dc.json (deflated 35%) 2025-12-04T12:25:14.1474857Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-47c9021380160661.json (deflated 36%) 2025-12-04T12:25:14.1476101Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-681adc1d59f04282.json (deflated 35%) 2025-12-04T12:25:14.1477327Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-1755a27e81246495.json (deflated 35%) 2025-12-04T12:25:14.1478655Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-b2036226275eb311.json (deflated 35%) 2025-12-04T12:25:14.1479948Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-3f50e0fff8c24c86.json (deflated 36%) 2025-12-04T12:25:14.1481203Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d908f57090f2acd6.json (deflated 35%) 2025-12-04T12:25:14.1482450Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-ac7a92e764fd2c8b.json (deflated 35%) 2025-12-04T12:25:14.1483694Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2f80e6d84c47c0a7.json (deflated 36%) 2025-12-04T12:25:14.1484945Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2042e0d50243da8a.json (deflated 36%) 2025-12-04T12:25:14.1486199Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-bb9adcd8663666ac.json (deflated 35%) 2025-12-04T12:25:14.1487454Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-246370ceca8d8d8b.json (deflated 35%) 2025-12-04T12:25:14.1488695Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f75c8f9699a93e6a.json (deflated 36%) 2025-12-04T12:25:14.1489936Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-830d90348309a50c.json (deflated 35%) 2025-12-04T12:25:14.1491179Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-257d76299fdbf250.json (deflated 35%) 2025-12-04T12:25:14.1492418Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-fa0b0b810d894be9.json (deflated 34%) 2025-12-04T12:25:14.1493648Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-b713da153aca8219.json (deflated 35%) 2025-12-04T12:25:14.1494895Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-812da336a80f282a.json (deflated 32%) 2025-12-04T12:25:14.1496140Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2be07987a59e5da5.json (deflated 32%) 2025-12-04T12:25:14.1497684Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0d952f420fed2de5.json (deflated 33%) 2025-12-04T12:25:14.1498959Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d29bf39728651f67.json (deflated 33%) 2025-12-04T12:25:14.1500235Z adding: 
test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-01e88d26c5e6aa85.json (deflated 33%) 2025-12-04T12:25:14.1501520Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-25efe3194372b4e6.json (deflated 32%) 2025-12-04T12:25:14.1502843Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-4ccf063a53847c36.json (deflated 33%) 2025-12-04T12:25:14.1504121Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-72be92db0e827d7f.json (deflated 32%) 2025-12-04T12:25:14.1505403Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-84f86de4e3aa962a.json (deflated 33%) 2025-12-04T12:25:14.1506702Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e5c4d09fb827cb7f.json (deflated 32%) 2025-12-04T12:25:14.1507983Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-165d83ae78886ff8.json (deflated 33%) 2025-12-04T12:25:14.1509442Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-76f6fcd9346eff0a.json (deflated 33%) 2025-12-04T12:25:14.1510663Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e84bdf3d05666f91.json (deflated 32%) 2025-12-04T12:25:14.1511927Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-a357bf2b1c694c62.json (deflated 33%) 2025-12-04T12:25:14.1513169Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-b1b5f73bcb8b828f.json (deflated 33%) 2025-12-04T12:25:14.1514369Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e742397162ed9e3d.json (deflated 33%) 2025-12-04T12:25:14.1515577Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f3a1c05a7b5c0fa8.json (deflated 33%) 2025-12-04T12:25:14.1516789Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-fcd37833b58d4bea.json (deflated 33%) 2025-12-04T12:25:14.1518007Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e22bb2e46b3ab636.json (deflated 33%) 2025-12-04T12:25:14.1519221Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d319014b034c95bf.json (deflated 32%) 2025-12-04T12:25:14.1520418Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-393bf6208ab91711.json (deflated 32%) 2025-12-04T12:25:14.1521994Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-bb9e40b9771000a0.json (deflated 32%) 2025-12-04T12:25:14.1523287Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d597ca27d8328fc4.json (deflated 33%) 2025-12-04T12:25:14.1524576Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-ff18cf4d50e44f39.json (deflated 33%) 2025-12-04T12:25:14.1525858Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0be906a8969ec101.json (deflated 33%) 2025-12-04T12:25:14.1527149Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-158f1ad05ae2a64b.json (deflated 34%) 2025-12-04T12:25:14.1528434Z adding: 
test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-87453a67a1ebaea6.json (deflated 33%) 2025-12-04T12:25:14.1529716Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-94f3fac53aec8990.json (deflated 33%) 2025-12-04T12:25:14.1530983Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-93576123b2405b32.json (deflated 33%) 2025-12-04T12:25:14.1532272Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f6666d1683ab3f1d.json (deflated 33%) 2025-12-04T12:25:14.1533663Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-54b039aca43fe5b7.json (deflated 33%) 2025-12-04T12:25:14.1534945Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-8eea24e340cd482b.json (deflated 32%) 2025-12-04T12:25:14.1536156Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-abf845b544fb7d20.json (deflated 33%) 2025-12-04T12:25:14.1537674Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f27d8d563aeff333.json (deflated 33%) 2025-12-04T12:25:14.1538964Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-b98a8d5dfa728efd.json (deflated 33%) 2025-12-04T12:25:14.1540256Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f9a146a8fac2af4d.json (deflated 33%) 2025-12-04T12:25:14.1541543Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d8bb6ca9e3ae378b.json (deflated 33%) 2025-12-04T12:25:14.1542834Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-604db34ae5cbb6b2.json (deflated 34%) 2025-12-04T12:25:14.1544224Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-6d6d34df2e34630b.json (deflated 33%) 2025-12-04T12:25:14.1545562Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-520dfe050df69b4b.json (deflated 33%) 2025-12-04T12:25:14.1546837Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2074cd035f8dc8fc.json (deflated 34%) 2025-12-04T12:25:14.1548117Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-468dffdf4603fb37.json (deflated 33%) 2025-12-04T12:25:14.1549584Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-fb8500504162f453.json (deflated 33%) 2025-12-04T12:25:14.1550786Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-56d2f4c749889dbc.json (deflated 33%) 2025-12-04T12:25:14.1552004Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-8cef0d6061a45be8.json (deflated 34%) 2025-12-04T12:25:14.1553209Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-93d1d438aff7bb95.json (deflated 33%) 2025-12-04T12:25:14.1554407Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-5c11159a66fb94a9.json (deflated 33%) 2025-12-04T12:25:14.1555621Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-c1ea079cea0d8e56.json (deflated 33%) 2025-12-04T12:25:14.1556997Z adding: 
test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f25b64af298ca601.json (deflated 33%) 2025-12-04T12:25:14.1558192Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-87383ac3904bfe89.json (deflated 33%) 2025-12-04T12:25:14.1559398Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d793a1fedd0d4f15.json (deflated 33%) 2025-12-04T12:25:14.1560611Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-b67795a049190b1d.json (deflated 33%) 2025-12-04T12:25:14.1561821Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-bde1923c97f63381.json (deflated 34%) 2025-12-04T12:25:14.1563015Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2540c713fc68453d.json (deflated 33%) 2025-12-04T12:25:14.1564220Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-8d1d058689da62ff.json (deflated 46%) 2025-12-04T12:25:14.1565415Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0c93a8978347968a.json (deflated 33%) 2025-12-04T12:25:14.1566646Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-18641772917d69fc.json (deflated 33%) 2025-12-04T12:25:14.1567839Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-6a77c9a2c337df36.json (deflated 34%) 2025-12-04T12:25:14.1569039Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-25efbb19e469ebb7.json (deflated 33%) 2025-12-04T12:25:14.1570248Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-eac363af2c24f931.json (deflated 33%) 2025-12-04T12:25:14.1571447Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-33bf8b4540a40636.json (deflated 33%) 2025-12-04T12:25:14.1572746Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-45778cf420dbd19f.json (deflated 35%) 2025-12-04T12:25:14.1573892Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-7dfffc535a3e90f1.json (deflated 35%) 2025-12-04T12:25:14.1575092Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-4b2795b0e7efac26.json (deflated 34%) 2025-12-04T12:25:14.1576278Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2b369bec34855654.json (deflated 35%) 2025-12-04T12:25:14.1577725Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d6b15d261538e27e.json (deflated 35%) 2025-12-04T12:25:14.1579012Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-4ef76d7bc1711751.json (deflated 34%) 2025-12-04T12:25:14.1580292Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0343427a5558824f.json (deflated 31%) 2025-12-04T12:25:14.1581574Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-3f70a63e56a4848b.json (deflated 33%) 2025-12-04T12:25:14.1582863Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-821ac567b5ed63bc.json (deflated 33%) 2025-12-04T12:25:14.1584412Z adding: 
test/test-reports/python-pytest/distributed._shard.sharded_tensor.test_sharded_tensor/distributed._shard.sharded_tensor.test_sharded_tensor-ae33be926ad38292.json (deflated 95%) 2025-12-04T12:25:14.1585976Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-4e483f68cef17162.json (deflated 33%) 2025-12-04T12:25:14.1587259Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-05f5b130753b2983.json (deflated 33%) 2025-12-04T12:25:14.1588539Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-7e16e53ef8db6995.json (deflated 33%) 2025-12-04T12:25:14.1589872Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-1e281dcef1930575.json (deflated 33%) 2025-12-04T12:25:14.1591088Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-2b466e71a200bcdc.json (deflated 33%) 2025-12-04T12:25:14.1592297Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-325c8a002e1c83a2.json (deflated 49%) 2025-12-04T12:25:14.1593510Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-3c0b6a576b76efd0.json (deflated 32%) 2025-12-04T12:25:14.1594913Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-e47f2e15272edbaf.json (deflated 33%) 2025-12-04T12:25:14.1596156Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-a9e19469eb1a06d4.json (deflated 34%) 2025-12-04T12:25:14.1597498Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-df7444533096a1d8.json (deflated 33%) 2025-12-04T12:25:14.1598747Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-d87d87bc823f3dba.json (deflated 33%) 2025-12-04T12:25:14.1599950Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-4a50a5ac8cd03017.json (deflated 34%) 2025-12-04T12:25:14.1601164Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-0ae50f0e1c874ad8.json (deflated 33%) 2025-12-04T12:25:14.1602383Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-7dbf8411ea4b6ce3.json (deflated 33%) 2025-12-04T12:25:14.1603592Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-2a6114c53cde50d7.json (deflated 33%) 2025-12-04T12:25:14.1604794Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-d109d91d9cd820a7.json (deflated 33%) 2025-12-04T12:25:14.1606105Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-7e589af2daee12d3.json (deflated 33%) 2025-12-04T12:25:14.1607339Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-ff536a30913e6717.json (deflated 35%) 2025-12-04T12:25:14.1608517Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-16e8bb0ec51136f2.json (deflated 36%) 2025-12-04T12:25:14.1609661Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-688fcf4f5f0deff2.json (deflated 35%) 2025-12-04T12:25:14.1610791Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-c2f4984a060c2ce4.json (deflated 36%) 
2025-12-04T12:25:14.1611931Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-4874c9e324e6599b.json (deflated 35%) 2025-12-04T12:25:14.1613065Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-81b232fd98a6eda2.json (deflated 35%) 2025-12-04T12:25:14.1614214Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-dbedd4dfa730b471.json (deflated 35%) 2025-12-04T12:25:14.1615353Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-e94fe5aed063a3e7.json (deflated 35%) 2025-12-04T12:25:14.1616555Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-191142456fb777f7.json (deflated 35%) 2025-12-04T12:25:14.1617983Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-d909bdccb7ddf2c0.json (deflated 35%) 2025-12-04T12:25:14.1619268Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-2e3a4388e42e1415.json (deflated 34%) 2025-12-04T12:25:14.1620535Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-3c5f42a263385a17.json (deflated 36%) 2025-12-04T12:25:14.1621994Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-a6537375079d62ca.json (deflated 35%) 2025-12-04T12:25:14.1623286Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-515a3b961a30c93e.json (deflated 35%) 2025-12-04T12:25:14.1624569Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-247b406154c62e2b.json (deflated 35%) 2025-12-04T12:25:14.1625837Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-54fc92777b10ce8b.json (deflated 34%) 2025-12-04T12:25:14.1627134Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-07a5e82fccbcefb0.json (deflated 35%) 2025-12-04T12:25:14.1628420Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-98372eb164ddb8a6.json (deflated 36%) 2025-12-04T12:25:14.1629781Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-9a91f2cdfa9f567b.json (deflated 36%) 2025-12-04T12:25:14.1631057Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-578f1554447ed157.json (deflated 35%) 2025-12-04T12:25:14.1632342Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-cba9e46262707896.json (deflated 36%) 2025-12-04T12:25:14.1633643Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-b5cc6836ef1a3879.json (deflated 34%) 2025-12-04T12:25:14.1634788Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-1a086feba79f79de.json (deflated 35%) 2025-12-04T12:25:14.1635921Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-fd712f2413b91025.json (deflated 33%) 2025-12-04T12:25:14.1637052Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-2e275020a83607d9.json (deflated 45%) 2025-12-04T12:25:14.1638270Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-32cb996256d67719.json (deflated 49%) 2025-12-04T12:25:14.1639451Z 
adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-281110f64c593b33.json (deflated 34%) 2025-12-04T12:25:14.1640584Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-ab551cc6e4b8fc0e.json (deflated 35%) 2025-12-04T12:25:14.1641732Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-bb4b38110c51be7b.json (deflated 35%) 2025-12-04T12:25:14.1642880Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-d76cceb106b5a87a.json (deflated 34%) 2025-12-04T12:25:14.1644034Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-f5087c7fb2c85ea4.json (deflated 34%) 2025-12-04T12:25:14.1645180Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-5bf92e22e16000ae.json (deflated 35%) 2025-12-04T12:25:14.1646314Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-a2df2e6eff7daa02.json (deflated 32%) 2025-12-04T12:25:14.1647456Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-62cf8d48558e6611.json (deflated 48%) 2025-12-04T12:25:14.1648599Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-008b4e727f5be082.json (deflated 31%) 2025-12-04T12:25:14.1649743Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-0b38d08cedf93968.json (deflated 32%) 2025-12-04T12:25:14.1650867Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-0615767c47cb824b.json (deflated 35%) 2025-12-04T12:25:14.1652009Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-3a85b82e41e52e7b.json (deflated 34%) 2025-12-04T12:25:14.1653155Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-670c4eb9ad8ac35a.json (deflated 34%) 2025-12-04T12:25:14.1654295Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-1ae993f40739468a.json (deflated 34%) 2025-12-04T12:25:14.1655412Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-1379655e313056b3.json (deflated 34%) 2025-12-04T12:25:14.1656612Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-17d32ccc8ec15e49.json (deflated 34%) 2025-12-04T12:25:14.1658055Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-3c5afe3c6d472874.json (deflated 34%) 2025-12-04T12:25:14.1659353Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-71d8c77dbd2b6cd3.json (deflated 33%) 2025-12-04T12:25:14.1660677Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-9e93da4b49ea34dc.json (deflated 33%) 2025-12-04T12:25:14.1661971Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-09fe633d76933c88.json (deflated 34%) 2025-12-04T12:25:14.1663261Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-4db84368319deb77.json (deflated 33%) 2025-12-04T12:25:14.1664543Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-867c58ec01067ba4.json (deflated 34%) 2025-12-04T12:25:14.1665818Z adding: 
test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-f4ea20dbc7c23240.json (deflated 34%) 2025-12-04T12:25:14.1667104Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-197b01c054eb8425.json (deflated 33%) 2025-12-04T12:25:14.1668383Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-5f78ef08e5f67618.json (deflated 33%) 2025-12-04T12:25:14.1669754Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-5dd09e666c5e73ac.json (deflated 34%) 2025-12-04T12:25:14.1670910Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-8d5b24102af3938b.json (deflated 34%) 2025-12-04T12:25:14.1672049Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-7ed88178415e82af.json (deflated 33%) 2025-12-04T12:25:14.1673197Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-17ddadec6a584fc8.json (deflated 34%) 2025-12-04T12:25:14.1674432Z adding: test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-db161ee1d414a014.json (stored 0%) 2025-12-04T12:25:14.1675739Z adding: test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-aee66205f8817bd7.json (stored 0%) 2025-12-04T12:25:14.1677052Z adding: test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f4fea7b2e6cf3a65.json (stored 0%) 2025-12-04T12:25:14.1678371Z adding: test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-422b22169e3a08f1.json (stored 0%) 2025-12-04T12:25:14.1679672Z adding: test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8ec15082b412f697.json (stored 0%) 2025-12-04T12:25:14.1680971Z adding: test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a2eda26248d83b8e.json (stored 0%) 2025-12-04T12:25:14.1682275Z adding: test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e12df5e946a2399b.json (stored 0%) 2025-12-04T12:25:14.1683585Z adding: test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4ab25792bd6780ce.json (stored 0%) 2025-12-04T12:25:14.1684893Z adding: test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ee61fca4ae363844.json (stored 0%) 2025-12-04T12:25:14.1686440Z adding: test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e43b258f943c7149.json (stored 0%) 2025-12-04T12:25:14.1687820Z adding: test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ed8ce545db3785b0.json (stored 0%) 2025-12-04T12:25:14.1689204Z adding: test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-51bd71d27c2db4f0.json (stored 0%) 2025-12-04T12:25:14.1690598Z adding: test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-72f602b330e606cb.json (stored 0%) 2025-12-04T12:25:14.1692016Z adding: test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-94537227bc12f698.json 
(stored 0%) 2025-12-04T12:25:14.1693381Z adding: test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f7368dd24235350f.json (stored 0%) 2025-12-04T12:25:14.1694764Z adding: test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-12e19ecac0707a9f.json (stored 0%) 2025-12-04T12:25:14.1696406Z adding: test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-49aeb17bc0069227.json (stored 0%) 2025-12-04T12:25:14.1698033Z adding: test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-82678a9127d50625.json (stored 0%) 2025-12-04T12:25:14.1699528Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-eeb723e5683986dd.json (deflated 37%) 2025-12-04T12:25:14.1701104Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7dd0923a385a5b44.json (deflated 44%) 2025-12-04T12:25:14.1702636Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-875b3394fe6124ff.json (deflated 37%) 2025-12-04T12:25:14.1704139Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a01719010801f0eb.json (deflated 37%) 2025-12-04T12:25:14.1705653Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-abb38b8b64296782.json (deflated 37%) 2025-12-04T12:25:14.1707153Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-35d5d4bfe910714e.json (deflated 37%) 2025-12-04T12:25:14.1708671Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-fcdbe5c8d6246957.json (deflated 44%) 2025-12-04T12:25:14.1710266Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4f2d32d76cd9ea4c.json (deflated 45%) 2025-12-04T12:25:14.1711610Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8d01dd7848e58726.json (deflated 43%) 2025-12-04T12:25:14.1712937Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b37ec36150974cdc.json (deflated 43%) 2025-12-04T12:25:14.1714272Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a5c97ba7476f9699.json (deflated 43%) 2025-12-04T12:25:14.1715806Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9f7bc9881e047dd1.json (deflated 43%) 2025-12-04T12:25:14.1717226Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0d8492641a4c3af3.json (deflated 43%) 2025-12-04T12:25:14.1718644Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1a118777d82e8d7e.json (deflated 37%) 2025-12-04T12:25:14.1720056Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6f1779e409eaf9fb.json (deflated 44%) 2025-12-04T12:25:14.1721816Z adding: 
test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5a2c564c0db133fb.json (deflated 37%) 2025-12-04T12:25:14.1723337Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c4e9ae811cf30c32.json (deflated 44%) 2025-12-04T12:25:14.1724920Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1a0ffda73db67d0e.json (deflated 44%) 2025-12-04T12:25:14.1726427Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b10091684b37c862.json (deflated 42%) 2025-12-04T12:25:14.1727944Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-362536b218c78604.json (deflated 37%) 2025-12-04T12:25:14.1729452Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e2a2b6d5dc912ba1.json (deflated 37%) 2025-12-04T12:25:14.1730969Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2bfa612f1908806e.json (deflated 43%) 2025-12-04T12:25:14.1732475Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7c241632c1bd2254.json (deflated 37%) 2025-12-04T12:25:14.1734222Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-300d15ebe169a67d.json (deflated 57%) 2025-12-04T12:25:14.1735671Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2664154f3bddb6ff.json (deflated 44%) 2025-12-04T12:25:14.1737345Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b262143f686a88dd.json (deflated 43%) 2025-12-04T12:25:14.1738857Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c004db07f7b0860b.json (deflated 44%) 2025-12-04T12:25:14.1740364Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bc18c93bde07fa33.json (deflated 44%) 2025-12-04T12:25:14.1741886Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d33e44b619f43cc1.json (deflated 57%) 2025-12-04T12:25:14.1743400Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c44272ce3d4ac199.json (deflated 38%) 2025-12-04T12:25:14.1744918Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ea07358affb5e144.json (deflated 37%) 2025-12-04T12:25:14.1746426Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2c57c7620876639a.json (deflated 43%) 2025-12-04T12:25:14.1747920Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-eede0e2726c06cab.json (deflated 37%) 2025-12-04T12:25:14.1749589Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a276c210ef7f6689.json (deflated 43%) 2025-12-04T12:25:14.1750935Z adding: 
test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bd59825a029f8f8b.json (deflated 37%) 2025-12-04T12:25:14.1752274Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1f5a9742e1242440.json (deflated 38%) 2025-12-04T12:25:14.1753600Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6b0873e59b83bf9a.json (deflated 37%) 2025-12-04T12:25:14.1754935Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-64bbf1c836e72a15.json (deflated 36%) 2025-12-04T12:25:14.1756272Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f83300f2b97b0a07.json (deflated 37%) 2025-12-04T12:25:14.1757613Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-46e1a3ccabb4ea53.json (deflated 37%) 2025-12-04T12:25:14.1758988Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-52cd579e7fe5892c.json (deflated 44%) 2025-12-04T12:25:14.1760318Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cb876d9d148638c4.json (deflated 44%) 2025-12-04T12:25:14.1761644Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-419043608d870248.json (deflated 45%) 2025-12-04T12:25:14.1762978Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-03caaef3ff0396d9.json (deflated 45%) 2025-12-04T12:25:14.1764322Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a49158b49188737a.json (deflated 43%) 2025-12-04T12:25:14.1765649Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9371e4128a3ac8fe.json (deflated 43%) 2025-12-04T12:25:14.1767036Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bf7e7c630fc800f5.json (deflated 43%) 2025-12-04T12:25:14.1768407Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f263367a9b8ff205.json (deflated 44%) 2025-12-04T12:25:14.1769756Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9da5cc1abf82fc88.json (deflated 44%) 2025-12-04T12:25:14.1771107Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-17270d7c5dcce82d.json (deflated 43%) 2025-12-04T12:25:14.1772437Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-52a8a0406f3c10fb.json (deflated 37%) 2025-12-04T12:25:14.1773774Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8955835fa53fe405.json (deflated 44%) 2025-12-04T12:25:14.1775114Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-41e8000da4470974.json (deflated 37%) 2025-12-04T12:25:14.1776513Z adding: 
test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-17b82ffe3c62718d.json (deflated 36%) 2025-12-04T12:25:14.1778148Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-550a077945687423.json (deflated 42%) 2025-12-04T12:25:14.1779650Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-97658b25492d180c.json (deflated 37%) 2025-12-04T12:25:14.1781157Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5ba6b434230b8a31.json (deflated 43%) 2025-12-04T12:25:14.1782670Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8ab85cfcce385bb9.json (deflated 37%) 2025-12-04T12:25:14.1784169Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-205c67b3e9ea2006.json (deflated 37%) 2025-12-04T12:25:14.1785678Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a7727ff60499e455.json (deflated 37%) 2025-12-04T12:25:14.1787190Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5545774781103441.json (deflated 37%) 2025-12-04T12:25:14.1788804Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-69b99129eec5d274.json (deflated 37%) 2025-12-04T12:25:14.1790323Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-71229775f4c708c6.json (deflated 45%) 2025-12-04T12:25:14.1791647Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ef94932e8a93743e.json (deflated 43%) 2025-12-04T12:25:14.1792986Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-830e1894dcf5c994.json (deflated 43%) 2025-12-04T12:25:14.1794321Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d6ec9fe8576de151.json (deflated 37%) 2025-12-04T12:25:14.1795665Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8ac8ca9bd1994ece.json (deflated 38%) 2025-12-04T12:25:14.1796997Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3403d5bb8935cb4e.json (deflated 37%) 2025-12-04T12:25:14.1798392Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b0c166deb400ad9d.json (deflated 38%) 2025-12-04T12:25:14.1799768Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-60e4e17b51df739f.json (deflated 36%) 2025-12-04T12:25:14.1801108Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-22eb7410be2437d9.json (deflated 37%) 2025-12-04T12:25:14.1802451Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9ee70791b9debd6c.json (deflated 45%) 2025-12-04T12:25:14.1803786Z adding: 
test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-81abecf194df2c45.json (deflated 45%) 2025-12-04T12:25:14.1805118Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1136154023961765.json (deflated 43%) 2025-12-04T12:25:14.1806458Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cfef205e8493de16.json (deflated 37%) 2025-12-04T12:25:14.1807809Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bd599f355b8caaeb.json (deflated 37%) 2025-12-04T12:25:14.1809146Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-62ca7bd8b65dea10.json (deflated 44%) 2025-12-04T12:25:14.1810504Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b3d3e55cfe315fc5.json (deflated 37%) 2025-12-04T12:25:14.1811853Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3a45eb631d6c35ef.json (deflated 44%) 2025-12-04T12:25:14.1813202Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-aae6fb78854ea6ff.json (deflated 38%) 2025-12-04T12:25:14.1814548Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9eef2c9b45729eeb.json (deflated 47%) 2025-12-04T12:25:14.1815881Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d106ae3bbe7d9e5c.json (deflated 37%) 2025-12-04T12:25:14.1817532Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8ff643138d43dd85.json (deflated 56%) 2025-12-04T12:25:14.1819037Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5c72d0c28afc7b8b.json (deflated 37%) 2025-12-04T12:25:14.1820586Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8cb6ed13882ace9d.json (deflated 37%) 2025-12-04T12:25:14.1822279Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-51d5ea88c29b6ed7.json (deflated 43%) 2025-12-04T12:25:14.1823795Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0e2af92baadfb43c.json (deflated 37%) 2025-12-04T12:25:14.1825304Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0ee64e4888310471.json (deflated 37%) 2025-12-04T12:25:14.1826813Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2124f6a7f1f8a6ad.json (deflated 37%) 2025-12-04T12:25:14.1828323Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3a72595ddb271e95.json (deflated 43%) 2025-12-04T12:25:14.1829949Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f5a0fd7e9efb76d5.json (deflated 44%) 2025-12-04T12:25:14.1831508Z adding: 
test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f05ec777ac110fb6.json (deflated 38%) 2025-12-04T12:25:14.1833214Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4c4dbe227aaf8cd2.json (deflated 43%) 2025-12-04T12:25:14.1834636Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6d8d80edc2b8c69e.json (deflated 37%) 2025-12-04T12:25:14.1836038Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-50add8f3174dd7ac.json (deflated 37%) 2025-12-04T12:25:14.1837466Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-851cdc069dcc69f7.json (deflated 37%) 2025-12-04T12:25:14.1839085Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1acd79e907003b41.json (deflated 46%) 2025-12-04T12:25:14.1840543Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a0ff1f71f9283f58.json (deflated 45%) 2025-12-04T12:25:14.1842012Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-65237f33092a4b4f.json (deflated 37%) 2025-12-04T12:25:14.1843481Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5046dc8bfb623fa3.json (deflated 37%) 2025-12-04T12:25:14.1844950Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4878dd0838c676b7.json (deflated 44%) 2025-12-04T12:25:14.1846421Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-66566e960af2b7cd.json (deflated 37%) 2025-12-04T12:25:14.1847895Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9252bf6025e90d42.json (deflated 37%) 2025-12-04T12:25:14.1849356Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5b920f5d1c4972a5.json (deflated 37%) 2025-12-04T12:25:14.1850964Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-41378464ce08003d.json (deflated 37%) 2025-12-04T12:25:14.1852314Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ee4c603fd47011fa.json (deflated 44%) 2025-12-04T12:25:14.1853660Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9973927e7b530617.json (deflated 45%) 2025-12-04T12:25:14.1855051Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-faddb0db331380df.json (deflated 43%) 2025-12-04T12:25:14.1856461Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-babf9f26b0f01a05.json (deflated 43%) 2025-12-04T12:25:14.1858119Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-682bb4a108ba0cff.json (deflated 43%) 2025-12-04T12:25:14.1859646Z adding: 
test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d0185f9ec4d4c49f.json (deflated 43%) 2025-12-04T12:25:14.1861171Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-011699f09fdd352f.json (deflated 43%) 2025-12-04T12:25:14.1862685Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7c6b066059948ead.json (deflated 37%) 2025-12-04T12:25:14.1864271Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-22fab5f0e190ff66.json (deflated 44%) 2025-12-04T12:25:14.1865822Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-55702aa5023cfcc5.json (deflated 37%) 2025-12-04T12:25:14.1867342Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ccae7814a1c4777f.json (deflated 44%) 2025-12-04T12:25:14.1868952Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5bd848f11487517d.json (deflated 44%) 2025-12-04T12:25:14.1870422Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-27d68b49187eba1f.json (deflated 42%) 2025-12-04T12:25:14.1871786Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cf1bc9411dde71e0.json (deflated 37%) 2025-12-04T12:25:14.1873138Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-445a5d7115d23df5.json (deflated 37%) 2025-12-04T12:25:14.1874495Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-44a168cde9f7a829.json (deflated 43%) 2025-12-04T12:25:14.1875841Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1ba388d3de704172.json (deflated 37%) 2025-12-04T12:25:14.1877197Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bd986c0befb813c2.json (deflated 57%) 2025-12-04T12:25:14.1878547Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4610efe5376dfca1.json (deflated 44%) 2025-12-04T12:25:14.1879892Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8b4358fed50c59f1.json (deflated 43%) 2025-12-04T12:25:14.1881232Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-526a02721a1ba5da.json (deflated 44%) 2025-12-04T12:25:14.1882580Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3c0978e54cc6fc10.json (deflated 44%) 2025-12-04T12:25:14.1883930Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bf5a35496e65d5e4.json (deflated 57%) 2025-12-04T12:25:14.1885283Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ee9c4c3ca48fe737.json (deflated 37%) 2025-12-04T12:25:14.1886670Z adding: 
test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d5ca791415d7ead2.json (deflated 37%) 2025-12-04T12:25:14.1888010Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4b280a14c5b58c7c.json (deflated 43%) 2025-12-04T12:25:14.1889353Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9f1e7a55058f0a18.json (deflated 37%) 2025-12-04T12:25:14.1890702Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c9d23e4c6bbfd6d1.json (deflated 43%) 2025-12-04T12:25:14.1892048Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d04adc5353a474ef.json (deflated 37%) 2025-12-04T12:25:14.1893386Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-dd5c3fba431f03e3.json (deflated 38%) 2025-12-04T12:25:14.1894782Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-23246ae737e62ded.json (deflated 37%) 2025-12-04T12:25:14.1896156Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8aa7ae0f58f2813b.json (deflated 36%) 2025-12-04T12:25:14.1897850Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cd7e251b7cd67b87.json (deflated 37%) 2025-12-04T12:25:14.1899370Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3ffef4b2a54e0ec6.json (deflated 37%) 2025-12-04T12:25:14.1900888Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f47719c8fab0f3fd.json (deflated 44%) 2025-12-04T12:25:14.1902425Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7f97df23e3af62b7.json (deflated 44%) 2025-12-04T12:25:14.1903949Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7d9b569377c5e6b5.json (deflated 45%) 2025-12-04T12:25:14.1905466Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e79d7fc843c87404.json (deflated 45%) 2025-12-04T12:25:14.1906973Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0b4908c887012bf3.json (deflated 43%) 2025-12-04T12:25:14.1908502Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-15d9380e1c9a62c7.json (deflated 43%) 2025-12-04T12:25:14.1910114Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-89d48b8548171ec2.json (deflated 43%) 2025-12-04T12:25:14.1911475Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e87d273ae3e5c7f4.json (deflated 44%) 2025-12-04T12:25:14.1912839Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5becb9fcc2b2a740.json (deflated 44%) 2025-12-04T12:25:14.1914176Z adding: 
test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e50500c3a0076f9a.json (deflated 43%) 2025-12-04T12:25:14.1915527Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c28f45efdfac39c4.json (deflated 37%) 2025-12-04T12:25:14.1916885Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d9fcea5b98362b6a.json (deflated 44%) 2025-12-04T12:25:14.1918269Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-23763de39322c899.json (deflated 37%) 2025-12-04T12:25:14.1919613Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f7a5837d4cf564eb.json (deflated 36%) 2025-12-04T12:25:14.1921101Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f6098aefa2030078.json (deflated 42%) 2025-12-04T12:25:14.1922802Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9d3b389690949ffc.json (deflated 37%) 2025-12-04T12:25:14.1924327Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-00c0b12dc56300ed.json (deflated 43%) 2025-12-04T12:25:14.1925854Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-875462dd555a5412.json (deflated 37%) 2025-12-04T12:25:14.1927462Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5da26e78fc052180.json (deflated 37%) 2025-12-04T12:25:14.1929024Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-705b7a3606470644.json (deflated 37%) 2025-12-04T12:25:14.1930537Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3996750239d4977f.json (deflated 37%) 2025-12-04T12:25:14.1932054Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b1bfbeb9b34c8574.json (deflated 37%) 2025-12-04T12:25:14.1933670Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6c5cc720d34bebc6.json (deflated 45%) 2025-12-04T12:25:14.1935009Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b5eb76bc9735e309.json (deflated 43%) 2025-12-04T12:25:14.1936413Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1a28d2b8c4bb8b97.json (deflated 43%) 2025-12-04T12:25:14.1938064Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f2fa0ff1a8410ed4.json (deflated 37%) 2025-12-04T12:25:14.1939571Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-42750e8459e7d15b.json (deflated 39%) 2025-12-04T12:25:14.1941080Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d44ddde7846d301e.json (deflated 37%) 2025-12-04T12:25:14.1942602Z adding: 
test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d84034c24f131de9.json (deflated 38%) 2025-12-04T12:25:14.1944127Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b21382e4a0d075d7.json (deflated 36%) 2025-12-04T12:25:14.1945650Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f01856e9a2028bff.json (deflated 37%) 2025-12-04T12:25:14.1947156Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d271f82508cdd35e.json (deflated 45%) 2025-12-04T12:25:14.1948766Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-602ab3c67d585e00.json (deflated 45%) 2025-12-04T12:25:14.1950250Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6c4b4f500cbe46b2.json (deflated 43%) 2025-12-04T12:25:14.1951653Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-060bfe393d18a7b7.json (deflated 37%) 2025-12-04T12:25:14.1953006Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-08a6cb454dfb3288.json (deflated 37%) 2025-12-04T12:25:14.1954345Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-14f8591ab0b18d47.json (deflated 44%) 2025-12-04T12:25:14.1955697Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-faf65bc8adad7023.json (deflated 37%) 2025-12-04T12:25:14.1957051Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7ab921a38daba1bb.json (deflated 44%) 2025-12-04T12:25:14.1958399Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-205a17c445d16b08.json (deflated 38%) 2025-12-04T12:25:14.1959781Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-14314f5e6064defd.json (deflated 47%) 2025-12-04T12:25:14.1961154Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9a98077fc0a28449.json (deflated 37%) 2025-12-04T12:25:14.1962511Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3e2de3e4d8afa5ff.json (deflated 56%) 2025-12-04T12:25:14.1963860Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-512586046bd1af6f.json (deflated 37%) 2025-12-04T12:25:14.1965208Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1fa69b7512f74eae.json (deflated 37%) 2025-12-04T12:25:14.1966555Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-70138f82b180a3f5.json (deflated 43%) 2025-12-04T12:25:14.1967894Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b7ed61d0627f9533.json (deflated 37%) 2025-12-04T12:25:14.1969235Z adding: 
test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-493e10e45797f8fa.json (deflated 37%) 2025-12-04T12:25:14.1970582Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-87c65811f60e5e0f.json (deflated 37%) 2025-12-04T12:25:14.1971931Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-635f35dfbbc33c85.json (deflated 43%) 2025-12-04T12:25:14.1973266Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-355930f4da4ab18f.json (deflated 44%) 2025-12-04T12:25:14.1974613Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f6333fa7d0fe5c91.json (deflated 38%) 2025-12-04T12:25:14.1975965Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3076e5b00c0eef07.json (deflated 43%) 2025-12-04T12:25:14.1977636Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9141798051401a79.json (deflated 37%) 2025-12-04T12:25:14.1979151Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d96c5808f2f4d423.json (deflated 37%) 2025-12-04T12:25:14.1980660Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-59eca95b80bf15e4.json (deflated 37%) 2025-12-04T12:25:14.1982174Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7eeb7f329dcb1625.json (deflated 46%) 2025-12-04T12:25:14.1983729Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c438893677b09839.json (deflated 45%) 2025-12-04T12:25:14.1985247Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d707ddf229008c6a.json (deflated 37%) 2025-12-04T12:25:14.1986755Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2c31ce4d4db4e93a.json (deflated 37%) 2025-12-04T12:25:14.1988262Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-714862760bd05954.json (deflated 38%) 2025-12-04T12:25:14.1989779Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-16429bc307938d70.json (deflated 37%) 2025-12-04T12:25:14.1991115Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-92f77f3d8cd66053.json (deflated 37%) 2025-12-04T12:25:14.1992499Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-deed4e34c84ee498.json (deflated 45%) 2025-12-04T12:25:14.1993846Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-425b9693fd331423.json (deflated 36%) 2025-12-04T12:25:14.1995176Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9149f9baa8d84141.json (deflated 43%) 2025-12-04T12:25:14.1996505Z adding: 
test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-78d5cc488c73d225.json (deflated 43%) 2025-12-04T12:25:14.1997840Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-017a63f22f7a2e26.json (deflated 36%) 2025-12-04T12:25:14.1999161Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3e6391f21f8fa7c0.json (deflated 36%) 2025-12-04T12:25:14.2000502Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9e8b675076ef3915.json (deflated 37%) 2025-12-04T12:25:14.2001843Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b8d64d4666fb6c9d.json (deflated 37%) 2025-12-04T12:25:14.2003188Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0dee982caae0bf52.json (deflated 36%) 2025-12-04T12:25:14.2004508Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0df7122c519ced4f.json (deflated 37%) 2025-12-04T12:25:14.2005847Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2827e400085e914f.json (deflated 44%) 2025-12-04T12:25:14.2007179Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7d39e0b557433741.json (deflated 45%) 2025-12-04T12:25:14.2008513Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e6c5067f69c5dc42.json (deflated 44%) 2025-12-04T12:25:14.2010067Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d40c5c296523fcf4.json (deflated 44%) 2025-12-04T12:25:14.2011467Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e19c088745912810.json (deflated 37%) 2025-12-04T12:25:14.2012872Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-21b633b88362af20.json (deflated 37%) 2025-12-04T12:25:14.2014316Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f1d69885e8023d73.json (deflated 37%) 2025-12-04T12:25:14.2015726Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-76455ff9fe96f12c.json (deflated 37%) 2025-12-04T12:25:14.2017388Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9224f6b7ff8b973c.json (deflated 37%) 2025-12-04T12:25:14.2018891Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-64019cd840b5ae37.json (deflated 44%) 2025-12-04T12:25:14.2020392Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c52c688cda6423d1.json (deflated 44%) 2025-12-04T12:25:14.2022062Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-56aae62a7e88ec0a.json (deflated 37%) 2025-12-04T12:25:14.2023674Z adding: 
test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-126517b1e280f193.json (deflated 37%) 2025-12-04T12:25:14.2025203Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2d346d213506e58a.json (deflated 37%) 2025-12-04T12:25:14.2026707Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-093f4d1e23acb10f.json (deflated 57%) 2025-12-04T12:25:14.2028208Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-810e1605bd5350e8.json (deflated 38%) 2025-12-04T12:25:14.2029701Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-43db9cfa18063736.json (deflated 37%) 2025-12-04T12:25:14.2031196Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3d256d1cc46d8d8d.json (deflated 37%) 2025-12-04T12:25:14.2032706Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a0174602e3f0dc49.json (deflated 42%) 2025-12-04T12:25:14.2034161Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9d15167d0a9773e6.json (deflated 37%) 2025-12-04T12:25:14.2035491Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2a355bd7e8aa2084.json (deflated 37%) 2025-12-04T12:25:14.2036818Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a694586bb28814d4.json (deflated 38%) 2025-12-04T12:25:14.2038133Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-91f11f0cc30a0889.json (deflated 37%) 2025-12-04T12:25:14.2039473Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cc882534d0c7ac9e.json (deflated 36%) 2025-12-04T12:25:14.2040995Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3576431fa0a79154.json (deflated 37%) 2025-12-04T12:25:14.2042485Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-85e1893ad67dccf3.json (deflated 36%) 2025-12-04T12:25:14.2043946Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-148510b891c749c6.json (deflated 37%) 2025-12-04T12:25:14.2045371Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e6549972a7efaf11.json (deflated 37%) 2025-12-04T12:25:14.2047026Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0ea6ea860d10e295.json (deflated 37%) 2025-12-04T12:25:14.2048486Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-83ab4f7124e50996.json (deflated 37%) 2025-12-04T12:25:14.2049930Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a6c1a924e8712f89.json (deflated 44%) 2025-12-04T12:25:14.2051393Z adding: 
test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0bec6d0d6dd273b2.json (deflated 37%) 2025-12-04T12:25:14.2052952Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ce5c2131a079a118.json (deflated 37%) 2025-12-04T12:25:14.2054366Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9aa0d7a04a1b05f2.json (deflated 44%) 2025-12-04T12:25:14.2055827Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-85e0e890e418ce3a.json (deflated 45%) 2025-12-04T12:25:14.2058209Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4cffe073269e4f0a.json (deflated 43%) 2025-12-04T12:25:14.2059721Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-fb78beccd38dd26e.json (deflated 42%) 2025-12-04T12:25:14.2061224Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c24763a200436369.json (deflated 37%) 2025-12-04T12:25:14.2062717Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-95f84fd6ea33eee0.json (deflated 48%) 2025-12-04T12:25:14.2064217Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-88fe6d3cec93de32.json (deflated 36%) 2025-12-04T12:25:14.2065725Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0260bf01f397061e.json (deflated 37%) 2025-12-04T12:25:14.2067227Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bc07ca8676eed412.json (deflated 37%) 2025-12-04T12:25:14.2068836Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c73c9ddbbd799146.json (deflated 43%) 2025-12-04T12:25:14.2070289Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d73e4a124891508d.json (deflated 37%) 2025-12-04T12:25:14.2071615Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e44eef95a4d81dc3.json (deflated 37%) 2025-12-04T12:25:14.2072957Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-78d0f5373874b1c4.json (deflated 37%) 2025-12-04T12:25:14.2074285Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4c88483e90b04648.json (deflated 37%) 2025-12-04T12:25:14.2075619Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ccf199cbc8b611ab.json (deflated 37%) 2025-12-04T12:25:14.2076951Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6a4daccc9da30cdb.json (deflated 37%) 2025-12-04T12:25:14.2078300Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d983aecef8c58dfb.json (deflated 37%) 2025-12-04T12:25:14.2079643Z adding: 
test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-746325984b31e17e.json (deflated 44%) 2025-12-04T12:25:14.2081014Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0b8591cc84ef2a6a.json (deflated 43%) 2025-12-04T12:25:14.2082338Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c4d97d092b2123a2.json (deflated 38%) 2025-12-04T12:25:14.2083666Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1574030634816010.json (deflated 37%) 2025-12-04T12:25:14.2084991Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5fa3a6eb60f4eca4.json (deflated 38%) 2025-12-04T12:25:14.2086327Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4e754e92f5037c52.json (deflated 36%) 2025-12-04T12:25:14.2087658Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-020049def8c5b0a9.json (deflated 43%) 2025-12-04T12:25:14.2089049Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d4dd04eda8983093.json (deflated 36%) 2025-12-04T12:25:14.2090405Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5a612b5b9d29cdf4.json (deflated 37%) 2025-12-04T12:25:14.2091740Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f0f750f594e5734b.json (deflated 43%) 2025-12-04T12:25:14.2093079Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7cb1e30e8a2e57ea.json (deflated 43%) 2025-12-04T12:25:14.2094403Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bc8052641a24d5dc.json (deflated 44%) 2025-12-04T12:25:14.2095751Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d8cbbb1187ec0f64.json (deflated 37%) 2025-12-04T12:25:14.2097376Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f83af7e95786df72.json (deflated 37%) 2025-12-04T12:25:14.2098882Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a731f1e0a2629b95.json (deflated 44%) 2025-12-04T12:25:14.2100388Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3ae47b09c2c50f23.json (deflated 42%) 2025-12-04T12:25:14.2101896Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ec880e83b34c8e36.json (deflated 47%) 2025-12-04T12:25:14.2103403Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c3833fdae73dbf3c.json (deflated 48%) 2025-12-04T12:25:14.2104928Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-86aa7d82374c9e5b.json (deflated 56%) 2025-12-04T12:25:14.2106439Z adding: 
test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a10e426b5fcbde30.json (deflated 37%) 2025-12-04T12:25:14.2107936Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ff35c7e5488dd9ac.json (deflated 37%) 2025-12-04T12:25:14.2109492Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-924d345c27601ea8.json (deflated 44%) 2025-12-04T12:25:14.2110825Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1681683ab3d327ac.json (deflated 37%) 2025-12-04T12:25:14.2112201Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-22e9fd6e5aba0f0d.json (deflated 37%) 2025-12-04T12:25:14.2113535Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d9dffcfba1bc1e60.json (deflated 37%) 2025-12-04T12:25:14.2114878Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1b652ce23cebda63.json (deflated 37%) 2025-12-04T12:25:14.2116221Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b5b9a6fa991ecf1c.json (deflated 44%) 2025-12-04T12:25:14.2117553Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1f3a9e9304d25446.json (deflated 45%) 2025-12-04T12:25:14.2118896Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0390eeced956f562.json (deflated 37%) 2025-12-04T12:25:14.2120273Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-439532956daa54d1.json (deflated 43%) 2025-12-04T12:25:14.2122015Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0f977aa3cd3cecaf.json (deflated 42%) 2025-12-04T12:25:14.2123514Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-24127363c11860de.json (deflated 42%) 2025-12-04T12:25:14.2125011Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0cd422e8a222e606.json (deflated 37%) 2025-12-04T12:25:14.2126503Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-27b9de38969ee6f6.json (deflated 37%) 2025-12-04T12:25:14.2128015Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-62abfea4d6932c1e.json (deflated 37%) 2025-12-04T12:25:14.2129534Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d86e179dbef96adf.json (deflated 37%) 2025-12-04T12:25:14.2131060Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a6abc3b994eecaab.json (deflated 38%) 2025-12-04T12:25:14.2132572Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f8fe4b288348a5e8.json (deflated 37%) 2025-12-04T12:25:14.2134146Z adding: 
test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e1865fe4cd352327.json (deflated 37%) 2025-12-04T12:25:14.2135574Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2d135dba3284d9dd.json (deflated 45%) 2025-12-04T12:25:14.2137239Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8ce519dd6997621a.json (deflated 37%) 2025-12-04T12:25:14.2138761Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6d25b88aa16186c5.json (deflated 43%) 2025-12-04T12:25:14.2140261Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2b545a8cfb56682b.json (deflated 43%) 2025-12-04T12:25:14.2141765Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-96320154d0a3f580.json (deflated 36%) 2025-12-04T12:25:14.2143281Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d58d0eb09203fc2c.json (deflated 36%) 2025-12-04T12:25:14.2144798Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-76e7132ba7ac5de0.json (deflated 37%) 2025-12-04T12:25:14.2146390Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a537f0ef8ed460d9.json (deflated 36%) 2025-12-04T12:25:14.2147893Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3c40fad651035635.json (deflated 36%) 2025-12-04T12:25:14.2149685Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-68c5b031d9a5ae9e.json (deflated 36%) 2025-12-04T12:25:14.2151029Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-712b0b28be8414a0.json (deflated 44%) 2025-12-04T12:25:14.2152367Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7eca96992921c511.json (deflated 45%) 2025-12-04T12:25:14.2153697Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7834531011d91518.json (deflated 44%) 2025-12-04T12:25:14.2155142Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-68f03a926c8d2bd9.json (deflated 44%) 2025-12-04T12:25:14.2156492Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e49faae68d1ac0d9.json (deflated 37%) 2025-12-04T12:25:14.2157836Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cc4d026c52898da8.json (deflated 37%) 2025-12-04T12:25:14.2159170Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-03eaa4726076d233.json (deflated 37%) 2025-12-04T12:25:14.2160505Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6d471afa2e27428d.json (deflated 37%) 2025-12-04T12:25:14.2161855Z adding: 
test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-065a466bb3b41d27.json (deflated 37%) 2025-12-04T12:25:14.2163200Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f328e482896672aa.json (deflated 44%) 2025-12-04T12:25:14.2164550Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ee7ee7e277bba08f.json (deflated 44%) 2025-12-04T12:25:14.2165890Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e55ae93852ba5a41.json (deflated 37%) 2025-12-04T12:25:14.2167238Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6750ff7d9a08403d.json (deflated 37%) 2025-12-04T12:25:14.2168577Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d85fe03caf11b880.json (deflated 37%) 2025-12-04T12:25:14.2169924Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f90e1eb29ec7a7eb.json (deflated 57%) 2025-12-04T12:25:14.2171272Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5c515ad73db9ec0f.json (deflated 38%) 2025-12-04T12:25:14.2172604Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-be5d3342961d1397.json (deflated 37%) 2025-12-04T12:25:14.2173942Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-81a8ca35b73b2608.json (deflated 37%) 2025-12-04T12:25:14.2175286Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6eb3b25e1011068f.json (deflated 42%) 2025-12-04T12:25:14.2176900Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-16ab3c0f531a2710.json (deflated 37%) 2025-12-04T12:25:14.2178414Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4e912af285a88a53.json (deflated 37%) 2025-12-04T12:25:14.2179932Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-043dda7312ce02a9.json (deflated 38%) 2025-12-04T12:25:14.2181441Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3cf2335721c75edb.json (deflated 37%) 2025-12-04T12:25:14.2182965Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ed68ee99b507df29.json (deflated 36%) 2025-12-04T12:25:14.2184490Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-afe3aa9ea643db5b.json (deflated 37%) 2025-12-04T12:25:14.2186069Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-706ef1f553cb8cca.json (deflated 37%) 2025-12-04T12:25:14.2187617Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a98124b8f8d7b3ef.json (deflated 37%) 2025-12-04T12:25:14.2189225Z adding: 
test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ee37bb64a8e84ec5.json (deflated 37%) 2025-12-04T12:25:14.2190570Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e2af230e2fec6d35.json (deflated 37%) 2025-12-04T12:25:14.2191900Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3008545966a2ad5b.json (deflated 37%) 2025-12-04T12:25:14.2193246Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-53870facd803211b.json (deflated 44%) 2025-12-04T12:25:14.2194590Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4eca7697caf90c2a.json (deflated 37%) 2025-12-04T12:25:14.2195936Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2c4554d604268fb5.json (deflated 37%) 2025-12-04T12:25:14.2197685Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c6b52be0b4531e90.json (deflated 44%) 2025-12-04T12:25:14.2199143Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c63a3f0987273dba.json (deflated 45%) 2025-12-04T12:25:14.2200648Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b58af3771e34dd96.json (deflated 43%) 2025-12-04T12:25:14.2202136Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-587b09149e6cc83f.json (deflated 42%) 2025-12-04T12:25:14.2203612Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e3786dc33e6abd50.json (deflated 37%) 2025-12-04T12:25:14.2205069Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-dfce7e92d72e48a2.json (deflated 48%) 2025-12-04T12:25:14.2206544Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-627617d506ff1d2f.json (deflated 36%) 2025-12-04T12:25:14.2208015Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-64530dfd24199eb7.json (deflated 37%) 2025-12-04T12:25:14.2209518Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0ddc33c5ddc10dde.json (deflated 37%) 2025-12-04T12:25:14.2210990Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d0632db0896072cf.json (deflated 43%) 2025-12-04T12:25:14.2212450Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-edeb0bbc0394ec67.json (deflated 37%) 2025-12-04T12:25:14.2213919Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e515d47fe2e6fb9c.json (deflated 37%) 2025-12-04T12:25:14.2215388Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7c4f0278f004bb5c.json (deflated 37%) 2025-12-04T12:25:14.2217117Z adding: 
test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c0d3bae257da8444.json (deflated 37%) 2025-12-04T12:25:14.2218697Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7025af433f00efbb.json (deflated 37%) 2025-12-04T12:25:14.2220257Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-49fd198402d5c655.json (deflated 37%) 2025-12-04T12:25:14.2221968Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5277c0b0a803851c.json (deflated 37%) 2025-12-04T12:25:14.2223494Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3d4c61b2ce73c677.json (deflated 44%) 2025-12-04T12:25:14.2225022Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cb0710cc3c031aa2.json (deflated 43%) 2025-12-04T12:25:14.2226549Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e4cf4d2497acecc4.json (deflated 38%) 2025-12-04T12:25:14.2228087Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b0b71a9d976366a8.json (deflated 37%) 2025-12-04T12:25:14.2229611Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8c2b944477a517c5.json (deflated 38%) 2025-12-04T12:25:14.2231130Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2c7a620380978373.json (deflated 36%) 2025-12-04T12:25:14.2232741Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8aaa461eddd2a0f5.json (deflated 43%) 2025-12-04T12:25:14.2234746Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d5c5af8107d86770.json (deflated 36%) 2025-12-04T12:25:14.2236190Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-629d0d3ddf4c3e06.json (deflated 37%) 2025-12-04T12:25:14.2237637Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7350065f0535f01a.json (deflated 43%) 2025-12-04T12:25:14.2239073Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-877f842d3f2815af.json (deflated 43%) 2025-12-04T12:25:14.2240500Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c391387e4c62daf7.json (deflated 44%) 2025-12-04T12:25:14.2241936Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cea6ac435fa81670.json (deflated 37%) 2025-12-04T12:25:14.2243466Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-69f0ceb782ba322d.json (deflated 37%) 2025-12-04T12:25:14.2244913Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-354a8796ee4ffd32.json (deflated 44%) 2025-12-04T12:25:14.2246335Z adding: 
test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-52a60b9c4e3ec8c5.json (deflated 42%) 2025-12-04T12:25:14.2247765Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-576d152cd04ca1c5.json (deflated 47%) 2025-12-04T12:25:14.2249193Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5733f17598591d18.json (deflated 48%) 2025-12-04T12:25:14.2250630Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8d06b92a9ae7d27c.json (deflated 56%) 2025-12-04T12:25:14.2252132Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ebef8e69977ebea2.json (deflated 37%) 2025-12-04T12:25:14.2253598Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ea6c158c65373811.json (deflated 37%) 2025-12-04T12:25:14.2255028Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f2ff679811871b4a.json (deflated 44%) 2025-12-04T12:25:14.2256533Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cc9e37194800f0d1.json (deflated 37%) 2025-12-04T12:25:14.2258206Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5145615a66bd578b.json (deflated 37%) 2025-12-04T12:25:14.2259720Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-33b7f705a30ded9f.json (deflated 37%) 2025-12-04T12:25:14.2261250Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ca496a8780de69f3.json (deflated 37%) 2025-12-04T12:25:14.2262777Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8bec3baffba656ff.json (deflated 44%) 2025-12-04T12:25:14.2264301Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c836ef383c971ad8.json (deflated 45%) 2025-12-04T12:25:14.2265824Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-deb32df1c36c795c.json (deflated 37%) 2025-12-04T12:25:14.2267343Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6dabff71918e7b99.json (deflated 43%) 2025-12-04T12:25:14.2268872Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ca39e437f793eab2.json (deflated 42%) 2025-12-04T12:25:14.2270353Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6d93f79d5e733c01.json (deflated 42%) 2025-12-04T12:25:14.2271704Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2079ea64f821f40e.json (deflated 37%) 2025-12-04T12:25:14.2273048Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-eb15a6e33c260556.json (deflated 37%) 2025-12-04T12:25:14.2274402Z adding: 
test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ae1eb5639088ccd8.json (deflated 37%)
2025-12-04T12:25:14.2295538Z ##[group]Run # Remove any previous test reports if they exist
2025-12-04T12:25:14.2296095Z # Remove any previous test reports if they exist
2025-12-04T12:25:14.2296614Z rm -f test-reports-*.zip
2025-12-04T12:25:14.2297315Z zip -r "test-reports-${FILE_SUFFIX}.zip" test/test-reports -i '*.xml' -i '*.csv'
2025-12-04T12:25:14.2304114Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T12:25:14.2304557Z env:
2025-12-04T12:25:14.2304812Z GIT_DEFAULT_BRANCH: main
2025-12-04T12:25:14.2305108Z HAS_NVIDIA_GPU: true
2025-12-04T12:25:14.2305473Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T12:25:14.2306121Z DOCKER_CONTAINER_ID: 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15
2025-12-04T12:25:14.2306919Z FILE_SUFFIX: test-distributed-3-3-lf.linux.g4dn.12xlarge.nvidia.gpu_57116084904
2025-12-04T12:25:14.2307478Z ##[endgroup]
2025-12-04T12:25:14.2475667Z adding: test/test-reports/python-pytest/distributed.test_c10d_functional_native/distributed.test_c10d_functional_native-369cc3de9e188dd1.xml (deflated 89%)
2025-12-04T12:25:14.2477377Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-39c8c10a0ef1a34e.xml (deflated 77%)
2025-12-04T12:25:14.2478848Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-bb36a88bac557029.xml (deflated 77%)
2025-12-04T12:25:14.2480266Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-9b6f6e417d9b4600.xml (deflated 77%)
2025-12-04T12:25:14.2481674Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_overlap/distributed.fsdp.test_fsdp_overlap-83c25fe932c36613.xml (deflated 28%)
2025-12-04T12:25:14.2483092Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-e1278d34de852f2a.xml (deflated 77%)
2025-12-04T12:25:14.2484512Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-efcb608498b7750d.xml (deflated 77%)
2025-12-04T12:25:14.2485944Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-9a300aee582fd0b6.xml (deflated 77%)
2025-12-04T12:25:14.2487373Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-433868368b6a29b3.xml (deflated 77%)
2025-12-04T12:25:14.2488797Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-cb48c540b8fb2acf.xml (deflated 86%)
2025-12-04T12:25:14.2490219Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-f306b72badd85355.xml (deflated 77%)
2025-12-04T12:25:14.2491646Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_pure_fp16/distributed.fsdp.test_fsdp_pure_fp16-456a3faf0e1ca4c4.xml (deflated 28%)
2025-12-04T12:25:14.2493124Z adding: test/test-reports/python-pytest/distributed.tensor.debug.test_debug_mode/distributed.tensor.debug.test_debug_mode-21dd2989918f2f32.xml (deflated 82%)
2025-12-04T12:25:14.2494624Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-93c7f0a0a61745d5.xml (deflated
77%) 2025-12-04T12:25:14.2496084Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-50fd36707db41f77.xml (deflated 77%) 2025-12-04T12:25:14.2497862Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-434f2a168fab2502.xml (deflated 77%) 2025-12-04T12:25:14.2499341Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-810575b51f00acc3.xml (deflated 77%) 2025-12-04T12:25:14.2500835Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-acd65444fa26961a.xml (deflated 77%) 2025-12-04T12:25:14.2502428Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-d7f6d912312cc834.xml (deflated 77%) 2025-12-04T12:25:14.2503924Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-d3fa58c4cf34965f.xml (deflated 78%) 2025-12-04T12:25:14.2505422Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-d5b8ecd9108f02ac.xml (deflated 86%) 2025-12-04T12:25:14.2506903Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-578e4c4077b7a803.xml (deflated 78%) 2025-12-04T12:25:14.2508395Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-14d4a314808f55fe.xml (deflated 78%) 2025-12-04T12:25:14.2509960Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-72b90a4f7545df10.xml (deflated 78%) 2025-12-04T12:25:14.2511478Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-cc094df1219cfd82.xml (deflated 90%) 2025-12-04T12:25:14.2512944Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-94627d53ab92538d.xml (deflated 78%) 2025-12-04T12:25:14.2514394Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-f49c40cee39994b2.xml (deflated 78%) 2025-12-04T12:25:14.2515847Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-a8869f6ed51873ac.xml (deflated 78%) 2025-12-04T12:25:14.2517300Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-90a4ba7c1fd04d10.xml (deflated 78%) 2025-12-04T12:25:14.2518754Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-ccaa5b3b6bf09af7.xml (deflated 78%) 2025-12-04T12:25:14.2520207Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-ca39f8152ef39349.xml (deflated 90%) 2025-12-04T12:25:14.2522010Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-7178045a44a28781.xml (deflated 77%) 2025-12-04T12:25:14.2523501Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-cdb7b80b8b392fad.xml (deflated 77%) 2025-12-04T12:25:14.2524995Z adding: 
test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-9595731043617943.xml (deflated 86%) 2025-12-04T12:25:14.2526475Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-f8bd87b046fcc0d3.xml (deflated 77%) 2025-12-04T12:25:14.2527976Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-68dc7893385d1617.xml (deflated 77%) 2025-12-04T12:25:14.2529470Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-14f8a536ecccf07e.xml (deflated 77%) 2025-12-04T12:25:14.2530966Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_exec_order/distributed.fsdp.test_fsdp_exec_order-77e61ff77a3b19cd.xml (deflated 28%) 2025-12-04T12:25:14.2532546Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-a78dec0d79621f36.xml (deflated 78%) 2025-12-04T12:25:14.2534281Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-9a14ac4718e66e44.xml (deflated 78%) 2025-12-04T12:25:14.2535985Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-7d115d367e840460.xml (deflated 78%) 2025-12-04T12:25:14.2537866Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-724e16d7d24ec18b.xml (deflated 78%) 2025-12-04T12:25:14.2539516Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-1c81c8f34feb9c16.xml (deflated 78%) 2025-12-04T12:25:14.2541170Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-a326f09bb7c5e616.xml (deflated 78%) 2025-12-04T12:25:14.2542810Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-d7096ae518bc839e.xml (deflated 78%) 2025-12-04T12:25:14.2544456Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-dbe06a751e4355d9.xml (deflated 78%) 2025-12-04T12:25:14.2546196Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-d7f21dedd43754e1.xml (deflated 78%) 2025-12-04T12:25:14.2547898Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-7dbc99509eb0f4ce.xml (deflated 78%) 2025-12-04T12:25:14.2549639Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-5b4af92028672eb6.xml (deflated 78%) 2025-12-04T12:25:14.2551223Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-c67b11ef8bde4252.xml (deflated 78%) 2025-12-04T12:25:14.2552825Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-c057f5798619892b.xml (deflated 78%) 2025-12-04T12:25:14.2554430Z adding: 
test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-aae1a2ba6806c0ef.xml (deflated 78%) 2025-12-04T12:25:14.2556039Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-c34ce2d8050066e8.xml (deflated 78%) 2025-12-04T12:25:14.2557630Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-fde5b3ce12e5a98a.xml (deflated 86%) 2025-12-04T12:25:14.2559226Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-b1cbedcab1229122.xml (deflated 78%) 2025-12-04T12:25:14.2560833Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-6d24496891daae4f.xml (deflated 78%) 2025-12-04T12:25:14.2562445Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-e815db3b6b0b67f1.xml (deflated 86%) 2025-12-04T12:25:14.2564050Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-788cdb9001b436df.xml (deflated 77%) 2025-12-04T12:25:14.2565634Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-9601a812ff315158.xml (deflated 77%) 2025-12-04T12:25:14.2567225Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-c4b6ce2b260b8d4b.xml (deflated 77%) 2025-12-04T12:25:14.2568831Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-490a12d48ec816b9.xml (deflated 77%) 2025-12-04T12:25:14.2570470Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-e2f9fc6fa3a79028.xml (deflated 77%) 2025-12-04T12:25:14.2572083Z adding: test/test-reports/python-pytest/distributed.fsdp.test_hsdp_dtensor_state_dict/distributed.fsdp.test_hsdp_dtensor_state_dict-384ab9a5685ff7be.xml (deflated 28%) 2025-12-04T12:25:14.2573679Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-a06a4188d644524d.xml (deflated 86%) 2025-12-04T12:25:14.2575269Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-03186403898f3bbb.xml (deflated 86%) 2025-12-04T12:25:14.2576902Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-a3dc994784795bc1.xml (deflated 77%) 2025-12-04T12:25:14.2578705Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-b1d6139c1033a518.xml (deflated 77%) 2025-12-04T12:25:14.2580321Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-ebdc3db326996caa.xml (deflated 77%) 2025-12-04T12:25:14.2581877Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-c42bc725a7562377.xml (deflated 77%) 2025-12-04T12:25:14.2583433Z adding: 
test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-4818210284e31d5e.xml (deflated 77%) 2025-12-04T12:25:14.2584993Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-1b5186457c75b3fb.xml (deflated 86%) 2025-12-04T12:25:14.2586553Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-74e02afb5846363a.xml (deflated 77%) 2025-12-04T12:25:14.2588106Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-39202840e4782b07.xml (deflated 90%) 2025-12-04T12:25:14.2589755Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-067163aa862fde85.xml (deflated 90%) 2025-12-04T12:25:14.2591274Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-adf2403f35f3c235.xml (deflated 90%) 2025-12-04T12:25:14.2592780Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_clip_grad_norm/distributed.fsdp.test_fsdp_clip_grad_norm-36b91fd354097cab.xml (deflated 28%) 2025-12-04T12:25:14.2594210Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-90a070d9a0caeaa7.xml (deflated 77%) 2025-12-04T12:25:14.2595556Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3b56b818e7dab969.xml (deflated 86%) 2025-12-04T12:25:14.2596909Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2da5f79ab7711605.xml (deflated 86%) 2025-12-04T12:25:14.2598259Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a202ac92fafcf85d.xml (deflated 77%) 2025-12-04T12:25:14.2599593Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bacdfd4e137b31c0.xml (deflated 86%) 2025-12-04T12:25:14.2600947Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2f84fddbafa0e0f3.xml (deflated 77%) 2025-12-04T12:25:14.2602277Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8511307d41418b77.xml (deflated 78%) 2025-12-04T12:25:14.2603651Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3768a5b2a44119fc.xml (deflated 78%) 2025-12-04T12:25:14.2605000Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-31ee953fde08a139.xml (deflated 78%) 2025-12-04T12:25:14.2606341Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cf0a0887fe85c292.xml (deflated 77%) 2025-12-04T12:25:14.2607668Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-07c27c95d6f3d3d6.xml (deflated 77%) 2025-12-04T12:25:14.2609011Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6ec3b2535e8e2ad7.xml (deflated 77%) 2025-12-04T12:25:14.2610361Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2c7bc1bec56d6360.xml (deflated 86%) 2025-12-04T12:25:14.2611757Z adding: 
test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1003ee713f2c1e3e.xml (deflated 77%) 2025-12-04T12:25:14.2613116Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-86ef8482fc5a0e9d.xml (deflated 77%) 2025-12-04T12:25:14.2614453Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6e9238188d8477a2.xml (deflated 86%) 2025-12-04T12:25:14.2615787Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9476e56094f0b738.xml (deflated 77%) 2025-12-04T12:25:14.2617377Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-207ff9590d724b3a.xml (deflated 77%) 2025-12-04T12:25:14.2618752Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-f664e87214ff2805.xml (deflated 77%) 2025-12-04T12:25:14.2620132Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-def950b7d24ceea9.xml (deflated 77%) 2025-12-04T12:25:14.2621709Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-89dfbd7b5cd71317.xml (deflated 77%) 2025-12-04T12:25:14.2623099Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bdae057bafb686b9.xml (deflated 86%) 2025-12-04T12:25:14.2624480Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-eb4953947b5f3ef2.xml (deflated 77%) 2025-12-04T12:25:14.2625842Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-532f83d54e2054ff.xml (deflated 90%) 2025-12-04T12:25:14.2627222Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-3483d762b5b4fca1.xml (deflated 77%) 2025-12-04T12:25:14.2628617Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-c6b2032ef8ff1e94.xml (deflated 86%) 2025-12-04T12:25:14.2629997Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5647de3303d26f02.xml (deflated 77%) 2025-12-04T12:25:14.2631371Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cff7e7504b276d84.xml (deflated 86%) 2025-12-04T12:25:14.2632858Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0d2fb83ab3ccdeb6.xml (deflated 77%) 2025-12-04T12:25:14.2634199Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-bd911142cc34300e.xml (deflated 90%) 2025-12-04T12:25:14.2635532Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-d8e84025a0dc7a16.xml (deflated 90%) 2025-12-04T12:25:14.2636931Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-392d2e7951c1c5f3.xml (deflated 77%) 2025-12-04T12:25:14.2638282Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-477ee10c9167da98.xml (deflated 90%) 2025-12-04T12:25:14.2639623Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-96eeb012f5f596ba.xml (deflated 77%) 2025-12-04T12:25:14.2640968Z adding: 
test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cc37fd9d84da442a.xml (deflated 77%) 2025-12-04T12:25:14.2642343Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cbd7e5f481e859be.xml (deflated 77%) 2025-12-04T12:25:14.2643684Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6ede249f1a681285.xml (deflated 77%) 2025-12-04T12:25:14.2645083Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-11be05c94e086d26.xml (deflated 77%) 2025-12-04T12:25:14.2646474Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-16966e8ed8e62900.xml (deflated 77%) 2025-12-04T12:25:14.2647814Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-90420efea6f00dc5.xml (deflated 77%) 2025-12-04T12:25:14.2649165Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6c9f36ab2b8b15ae.xml (deflated 77%) 2025-12-04T12:25:14.2650501Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-0d4c1fd96adc2be7.xml (deflated 86%) 2025-12-04T12:25:14.2651849Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-500277f28031837e.xml (deflated 77%) 2025-12-04T12:25:14.2653192Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-942d56c07e16c88d.xml (deflated 77%) 2025-12-04T12:25:14.2654534Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-55fdf9ad8e0a27f0.xml (deflated 77%) 2025-12-04T12:25:14.2655881Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6e1cdaa245647d1a.xml (deflated 77%) 2025-12-04T12:25:14.2657467Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a996648fbbff19f5.xml (deflated 77%) 2025-12-04T12:25:14.2658851Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-cc1573489c80017b.xml (deflated 77%) 2025-12-04T12:25:14.2660240Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4d2b72d464b1c339.xml (deflated 78%) 2025-12-04T12:25:14.2661633Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-65dbafa4918c0ef1.xml (deflated 78%) 2025-12-04T12:25:14.2663009Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5b8e1f7dea233320.xml (deflated 90%) 2025-12-04T12:25:14.2664389Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9d13641fc6f0b57c.xml (deflated 78%) 2025-12-04T12:25:14.2665768Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-29e66d82c97dbaa5.xml (deflated 78%) 2025-12-04T12:25:14.2667162Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-a798bbedf3e7b999.xml (deflated 90%) 2025-12-04T12:25:14.2668549Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e0d5d8a174cb3c98.xml (deflated 86%) 2025-12-04T12:25:14.2670054Z adding: 
test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-931d013fb4c2579a.xml (deflated 90%) 2025-12-04T12:25:14.2671387Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-92646f491493cae0.xml (deflated 78%) 2025-12-04T12:25:14.2672721Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8232c23afc6466e0.xml (deflated 77%) 2025-12-04T12:25:14.2674056Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-983af60bcd722f1d.xml (deflated 77%) 2025-12-04T12:25:14.2675400Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-84ede3fbd174dfda.xml (deflated 77%) 2025-12-04T12:25:14.2676011Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9538bfd24f807d16.xml (deflated 86%) 2025-12-04T12:25:14.2676671Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6e7d2c56cd2be4bb.xml (deflated 86%) 2025-12-04T12:25:14.2677296Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1378f62336ac1630.xml (deflated 77%) 2025-12-04T12:25:14.2677910Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-8e092965a6aa7362.xml (deflated 86%) 2025-12-04T12:25:14.2678510Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-19aef0a0802c58a7.xml (deflated 77%) 2025-12-04T12:25:14.2679115Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-5e8c70689f4db333.xml (deflated 86%) 2025-12-04T12:25:14.2679704Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-389219a70e101b44.xml (deflated 77%) 2025-12-04T12:25:14.2680317Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-22aad73f608511a0.xml (deflated 86%) 2025-12-04T12:25:14.2680912Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-22bb81621d944803.xml (deflated 77%) 2025-12-04T12:25:14.2681509Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-e70588b2995dc7c5.xml (deflated 77%) 2025-12-04T12:25:14.2682123Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-b456a18c8ca9135a.xml (deflated 77%) 2025-12-04T12:25:14.2682731Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-aedba904eee3ba73.xml (deflated 77%) 2025-12-04T12:25:14.2683342Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-2d3d36f137cb39b5.xml (deflated 77%) 2025-12-04T12:25:14.2683951Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-973a0dc84b27de93.xml (deflated 77%) 2025-12-04T12:25:14.2684554Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-1e9342b39aaf3792.xml (deflated 77%) 2025-12-04T12:25:14.2685162Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-15b775a41cf5a439.xml (deflated 77%) 2025-12-04T12:25:14.2685782Z adding: 
test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-56374ffd8bd068de.xml (deflated 77%) 2025-12-04T12:25:14.2686392Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-6288913bb010f746.xml (deflated 77%) 2025-12-04T12:25:14.2687041Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-9d2350a2a3a63f23.xml (deflated 77%) 2025-12-04T12:25:14.2687689Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-ee9779088060e0f5.xml (deflated 86%) 2025-12-04T12:25:14.2688295Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-7a7aa8c4ec058e09.xml (deflated 77%) 2025-12-04T12:25:14.2688910Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_core/distributed.fsdp.test_fsdp_core-4f45a35aeec028b0.xml (deflated 28%) 2025-12-04T12:25:14.2704929Z adding: test/test-reports/python-pytest/distributed.algorithms.test_join/distributed.algorithms.test_join-346fdf8ca2d8d04c.xml (deflated 79%) 2025-12-04T12:25:14.2705844Z adding: test/test-reports/python-pytest/distributed.pipelining.test_schedule_multiproc/distributed.pipelining.test_schedule_multiproc-4c892aab54fe07b4.xml (deflated 88%) 2025-12-04T12:25:14.2706584Z adding: test/test-reports/python-pytest/distributed.test_compute_comm_reordering/distributed.test_compute_comm_reordering-5eeb11f30d43fbd8.xml (deflated 78%) 2025-12-04T12:25:14.2707325Z adding: test/test-reports/python-pytest/distributed.test_cupy_as_tensor/distributed.test_cupy_as_tensor-9bf0be6a7af397ad.xml (deflated 47%) 2025-12-04T12:25:14.2707984Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_fx/distributed.fsdp.test_fsdp_fx-d8b89ec57f22953e.xml (deflated 35%) 2025-12-04T12:25:14.2708593Z adding: test/test-reports/python-pytest/distributed._tools.test_sac_ilp/distributed._tools.test_sac_ilp-80280b96b0e30cba.xml (deflated 66%) 2025-12-04T12:25:14.2709412Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_hf_storage/distributed.checkpoint.test_hf_storage-5c05eca826b12737.xml (deflated 70%) 2025-12-04T12:25:14.2710118Z adding: test/test-reports/python-pytest/distributed.pipelining.test_microbatch/distributed.pipelining.test_microbatch-db2f7f262044cd4d.xml (deflated 58%) 2025-12-04T12:25:14.2710829Z adding: test/test-reports/python-pytest/distributed.tensor.test_placement_types/distributed.tensor.test_placement_types-aa6a82bf337fac31.xml (deflated 70%) 2025-12-04T12:25:14.2711624Z adding: test/test-reports/python-pytest/distributed.tensor.test_dtensor_dispatch_overhead/distributed.tensor.test_dtensor_dispatch_overhead-1be227e0f3a4b8ca.xml (deflated 41%) 2025-12-04T12:25:14.2712538Z adding: test/test-reports/python-pytest/distributed.checkpoint._experimental.test_checkpoint_reader/distributed.checkpoint._experimental.test_checkpoint_reader-e75c494c472cf9a1.xml (deflated 67%) 2025-12-04T12:25:14.2713260Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_format_utils/distributed.checkpoint.test_format_utils-ff4efe8ffc0a39b9.xml (deflated 59%) 2025-12-04T12:25:14.2713997Z adding: test/test-reports/python-pytest/distributed.test_aten_comm_compute_reordering/distributed.test_aten_comm_compute_reordering-8ab49fa352932ba1.xml (deflated 84%) 2025-12-04T12:25:14.2714678Z adding: 
test/test-reports/python-pytest/distributed.tensor.test_redistribute/distributed.tensor.test_redistribute-02b614c0805e2900.xml (deflated 86%) 2025-12-04T12:25:14.2715404Z adding: test/test-reports/python-pytest/distributed.tensor.parallel.test_tp_style/distributed.tensor.parallel.test_tp_style-3daa17d4beb2059f.xml (deflated 82%) 2025-12-04T12:25:14.2715986Z adding: test/test-reports/python-pytest/distributed.tensor.test_api/distributed.tensor.test_api-143a55cc9757e18a.xml (deflated 82%) 2025-12-04T12:25:14.2716642Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_fsspec/distributed.checkpoint.test_fsspec-2295d11b632387c0.xml (deflated 54%) 2025-12-04T12:25:14.2717457Z adding: test/test-reports/python-pytest/distributed.tensor.experimental.test_tp_transform/distributed.tensor.experimental.test_tp_transform-af912528cabb656d.xml (deflated 62%) 2025-12-04T12:25:14.2718145Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_traverse/distributed.checkpoint.test_traverse-f038bc92a00bd1c7.xml (deflated 75%) 2025-12-04T12:25:14.2718823Z adding: test/test-reports/python-pytest/distributed.tensor.test_random_ops/distributed.tensor.test_random_ops-a8f6b522aa6434af.xml (deflated 86%) 2025-12-04T12:25:14.2719650Z adding: test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_logging/distributed._composable.fsdp.test_fully_shard_logging-7e09cae3d59aa65e.xml (deflated 27%) 2025-12-04T12:25:14.2720249Z adding: test/test-reports/python-pytest/distributed.launcher.test_api/distributed.launcher.test_api-15b87ceaa10651c5.xml (deflated 51%) 2025-12-04T12:25:14.2721384Z adding: test/test-reports/python-pytest/distributed.elastic.multiprocessing.test_api/distributed.elastic.multiprocessing.test_api-12b95803d8942f3a.xml (deflated 75%) 2025-12-04T12:25:14.2722040Z adding: test/test-reports/python-pytest/distributed.fsdp.test_shard_utils/distributed.fsdp.test_shard_utils-76ee73cffd398e77.xml (deflated 52%) 2025-12-04T12:25:14.2722811Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_fsdp_optim_state/distributed.checkpoint.test_fsdp_optim_state-f29e492ac7e0fdff.xml (deflated 55%) 2025-12-04T12:25:14.2723726Z adding: test/test-reports/python-pytest/distributed.checkpoint.e2e.test_e2e_save_and_load/distributed.checkpoint.e2e.test_e2e_save_and_load-ea436a2b3918b4b7.xml (deflated 85%) 2025-12-04T12:25:14.2724565Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_dtensor_resharding/distributed.checkpoint.test_dtensor_resharding-850e82d898db0167.xml (deflated 80%) 2025-12-04T12:25:14.2725216Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_memory/distributed.fsdp.test_fsdp_memory-bd1d93d0f6b45624.xml (deflated 53%) 2025-12-04T12:25:14.2725912Z adding: test/test-reports/python-pytest/distributed.tensor.test_pointwise_ops/distributed.tensor.test_pointwise_ops-8ffd5e5eb5f5ad7d.xml (deflated 84%) 2025-12-04T12:25:14.2726668Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_compatibility/distributed.checkpoint.test_compatibility-759684b03ee5bd2d.xml (deflated 75%) 2025-12-04T12:25:14.2727342Z adding: test/test-reports/python-pytest/distributed._tools.test_mem_tracker/distributed._tools.test_mem_tracker-e6bb23aea30c734a.xml (deflated 58%) 2025-12-04T12:25:14.2728046Z adding: test/test-reports/python-pytest/distributed.elastic.test_control_plane/distributed.elastic.test_control_plane-8adada293373a225.xml (deflated 74%) 2025-12-04T12:25:14.2728601Z adding: 
test/test-reports/python-pytest/distributed.test_fake_pg/distributed.test_fake_pg-79e3fe3f86c7485d.xml (deflated 82%) 2025-12-04T12:25:14.2729370Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_fsdp_model_state/distributed.checkpoint.test_fsdp_model_state-d2d7dab49696755b.xml (deflated 55%) 2025-12-04T12:25:14.2730077Z adding: test/test-reports/python-pytest/distributed.test_functional_api/distributed.test_functional_api-d3092064f68d2f41.xml (deflated 78%) 2025-12-04T12:25:14.2730984Z adding: test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_clip_grad_norm_/distributed._composable.fsdp.test_fully_shard_clip_grad_norm_-2322cac9c0cc490f.xml (deflated 52%) 2025-12-04T12:25:14.2731706Z adding: test/test-reports/python-pytest/distributed.tensor.debug.test_comm_mode/distributed.tensor.debug.test_comm_mode-8cc829f047ed6143.xml (deflated 66%) 2025-12-04T12:25:14.2732241Z adding: test/test-reports/python-pytest/distributed.test_dist2/distributed.test_dist2-7a48db8512284abb.xml (deflated 89%) 2025-12-04T12:25:14.2733237Z adding: test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_grad_scaler/distributed._composable.fsdp.test_fully_shard_grad_scaler-5e3c33eaf29838b0.xml (deflated 37%) 2025-12-04T12:25:14.2733844Z adding: test/test-reports/python-pytest/distributed.launcher.test_run/distributed.launcher.test_run-eeaaeb50473e3b00.xml (deflated 84%) 2025-12-04T12:25:14.2734583Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_backward_prefetch/distributed.fsdp.test_fsdp_backward_prefetch-9d6c65a3bd838e6b.xml (deflated 39%) 2025-12-04T12:25:14.2735327Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_checkpoint/distributed.checkpoint.test_checkpoint-698955a0be6378e2.xml (deflated 77%) 2025-12-04T12:25:14.2735948Z adding: test/test-reports/python-pytest/distributed._pycute.test_coalesce/distributed._pycute.test_coalesce-d2727b6d77166552.xml (deflated 38%) 2025-12-04T12:25:14.2736677Z adding: test/test-reports/python-pytest/distributed._pycute.test_complement/distributed._pycute.test_complement-323506218bd25d4f.xml (deflated 39%) 2025-12-04T12:25:14.2737521Z adding: test/test-reports/python-pytest/distributed._pycute.test_composition/distributed._pycute.test_composition-91e42d2ac7610498.xml (deflated 40%) 2025-12-04T12:25:14.2738167Z adding: test/test-reports/python-pytest/distributed._pycute.test_int_tuple/distributed._pycute.test_int_tuple-1604350619512e65.xml (deflated 82%) 2025-12-04T12:25:14.2738838Z adding: test/test-reports/python-pytest/distributed._pycute.test_left_inverse/distributed._pycute.test_left_inverse-7b550f03a54828f5.xml (deflated 38%) 2025-12-04T12:25:14.2739606Z adding: test/test-reports/python-pytest/distributed._pycute.test_right_inverse/distributed._pycute.test_right_inverse-5437f0847845b913.xml (deflated 38%) 2025-12-04T12:25:14.2740377Z adding: test/test-reports/python-pytest/distributed._composable.test_replicate/distributed._composable.test_replicate-5594e5fd77ce79b5.xml (deflated 85%) 2025-12-04T12:25:14.2741147Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_hsdp_checkpoint/distributed.checkpoint.test_hsdp_checkpoint-293bcc74b378a9a0.xml (deflated 70%) 2025-12-04T12:25:14.2741977Z adding: test/test-reports/python-pytest/distributed.tensor.parallel.test_parallelize_api/distributed.tensor.parallel.test_parallelize_api-e24bc2790e3eed77.xml (deflated 89%) 2025-12-04T12:25:14.2742648Z adding: 
test/test-reports/python-pytest/distributed.fsdp.test_fsdp_state_dict/distributed.fsdp.test_fsdp_state_dict-3c13b82ce7076bc1.xml (deflated 95%) 2025-12-04T12:25:14.2743290Z adding: test/test-reports/python-pytest/distributed._pycute.test_typing/distributed._pycute.test_typing-1c9aabc95fed14a1.xml (deflated 39%) 2025-12-04T12:25:14.2743926Z adding: test/test-reports/python-pytest/distributed.test_serialization/distributed.test_serialization-5c3790edbaae9c6a.xml (deflated 70%) 2025-12-04T12:25:14.2744657Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_ignored_modules/distributed.fsdp.test_fsdp_ignored_modules-c4ab0979e06883a2.xml (deflated 78%) 2025-12-04T12:25:14.2745482Z adding: test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_comm/distributed._composable.fsdp.test_fully_shard_comm-b03b971b17f9f8be.xml (deflated 82%) 2025-12-04T12:25:14.2746249Z adding: test/test-reports/python-pytest/distributed.fsdp.test_fsdp_sharded_grad_scaler/distributed.fsdp.test_fsdp_sharded_grad_scaler-830facc45336217a.xml (deflated 90%) 2025-12-04T12:25:14.2747080Z adding: test/test-reports/python-pytest/distributed._shard.sharding_plan.test_sharding_plan/distributed._shard.sharding_plan.test_sharding_plan-86fe0d16a378ac71.xml (deflated 62%) 2025-12-04T12:25:14.2747904Z adding: test/test-reports/python-pytest/distributed._shard.sharded_optim.test_sharded_optim/distributed._shard.sharded_optim.test_sharded_optim-a8d576a6cb5a21e5.xml (deflated 54%) 2025-12-04T12:25:14.2748886Z adding: test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_state_dict/distributed._composable.fsdp.test_fully_shard_state_dict-7cd1746803ec2a8b.xml (deflated 77%) 2025-12-04T12:25:14.2749483Z adding: test/test-reports/python-pytest/distributed.tensor.test_utils/distributed.tensor.test_utils-ce4dc3e67348c080.xml (deflated 82%) 2025-12-04T12:25:14.2750286Z adding: test/test-reports/python-pytest/distributed._composable.fsdp.test_fully_shard_memory/distributed._composable.fsdp.test_fully_shard_memory-bd84ca434b9abee9.xml (deflated 54%) 2025-12-04T12:25:14.2751012Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_state_dict/distributed.checkpoint.test_state_dict-82ab38e24fe889c8.xml (deflated 84%) 2025-12-04T12:25:14.2751759Z adding: test/test-reports/python-pytest/distributed.checkpoint.test_state_dict_utils/distributed.checkpoint.test_state_dict_utils-a19642af8d31d778.xml (deflated 75%) 2025-12-04T12:25:14.2752567Z adding: test/test-reports/python-pytest/distributed._shard.sharded_tensor.ops.test_embedding/distributed._shard.sharded_tensor.ops.test_embedding-fd33e5d9c41f35fb.xml (deflated 55%) 2025-12-04T12:25:14.2753452Z adding: test/test-reports/python-pytest/distributed._shard.sharded_tensor.test_sharded_tensor_reshard/distributed._shard.sharded_tensor.test_sharded_tensor_reshard-e6bc79067fb0604d.xml (deflated 59%) 2025-12-04T12:25:14.2754068Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-2ef4942791579d03.xml (deflated 35%) 2025-12-04T12:25:14.2754671Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-d882aa7ed351d2b7.xml (deflated 35%) 2025-12-04T12:25:14.2755327Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-e41d47243c13be74.xml (deflated 35%) 2025-12-04T12:25:14.2755963Z adding: 
test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-2ed2ccb680132309.xml (deflated 36%) 2025-12-04T12:25:14.2756566Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-a86d7398eb9ff93b.xml (deflated 36%) 2025-12-04T12:25:14.2757178Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-50f191d4627fdfd2.xml (deflated 36%) 2025-12-04T12:25:14.2757775Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-8cb70355957e1b4b.xml (deflated 36%) 2025-12-04T12:25:14.2758385Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-bbde3500be39702b.xml (deflated 35%) 2025-12-04T12:25:14.2758993Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-1805de606cf78685.xml (deflated 35%) 2025-12-04T12:25:14.2759594Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_nccl/distributed.test_c10d_spawn_nccl-8a898c87fa4f8fd3.xml (deflated 35%) 2025-12-04T12:25:14.2760189Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_ucc/distributed.test_c10d_spawn_ucc-41764b12ccdf212e.xml (deflated 45%) 2025-12-04T12:25:14.2760776Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_ucc/distributed.test_c10d_spawn_ucc-aee5aa2ded024d85.xml (deflated 46%) 2025-12-04T12:25:14.2761359Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_ucc/distributed.test_c10d_spawn_ucc-8800a2e7b955ab16.xml (deflated 46%) 2025-12-04T12:25:14.2761956Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_ucc/distributed.test_c10d_spawn_ucc-3a092f5472894a7f.xml (deflated 45%) 2025-12-04T12:25:14.2762544Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_ucc/distributed.test_c10d_spawn_ucc-f628509e7e3f2a1f.xml (deflated 45%) 2025-12-04T12:25:14.2763141Z adding: test/test-reports/python-pytest/distributed.test_c10d_spawn_ucc/distributed.test_c10d_spawn_ucc-c1a78b733abc6caa.xml (deflated 45%) 2025-12-04T12:25:14.2763686Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0991bf72558fb22b.xml (deflated 33%) 2025-12-04T12:25:14.2764249Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-aa6ce215ba96a24c.xml (deflated 37%) 2025-12-04T12:25:14.2764790Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-16fe1d620732710b.xml (deflated 35%) 2025-12-04T12:25:14.2765365Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-3fe1795a5d3e5b88.xml (deflated 35%) 2025-12-04T12:25:14.2765923Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-6c7276bb9fa9eee2.xml (deflated 35%) 2025-12-04T12:25:14.2766466Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-cd50578f9742b761.xml (deflated 35%) 2025-12-04T12:25:14.2767019Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-5e60172a210dc8b6.xml (deflated 35%) 2025-12-04T12:25:14.2767561Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-873ae68d43267ac9.xml (deflated 35%) 2025-12-04T12:25:14.2768105Z adding: 
test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-34c50e4612c9fea4.xml (deflated 35%) 2025-12-04T12:25:14.2768659Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d54fb6be7a931b62.xml (deflated 35%) 2025-12-04T12:25:14.2769260Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2259b8bd184524fc.xml (deflated 35%) 2025-12-04T12:25:14.2769841Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-8f01caa16144b040.xml (deflated 35%) 2025-12-04T12:25:14.2770383Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-31de274c3cb59c01.xml (deflated 35%) 2025-12-04T12:25:14.2770920Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-db19637423ab0dbc.xml (deflated 36%) 2025-12-04T12:25:14.2771476Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-b23ea90304491b65.xml (deflated 35%) 2025-12-04T12:25:14.2772018Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-eaee01f734bb6504.xml (deflated 35%) 2025-12-04T12:25:14.2772577Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0fa860b184f8ddb6.xml (deflated 35%) 2025-12-04T12:25:14.2773124Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-33cbbe588c8f840c.xml (deflated 36%) 2025-12-04T12:25:14.2773666Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-de8dc85b62067611.xml (deflated 35%) 2025-12-04T12:25:14.2774217Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0f2cd4f378b677f0.xml (deflated 35%) 2025-12-04T12:25:14.2774753Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e35b0454119a9f51.xml (deflated 35%) 2025-12-04T12:25:14.2775299Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d98cd20152af5d53.xml (deflated 35%) 2025-12-04T12:25:14.2775841Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-3982ee850d6ce795.xml (deflated 35%) 2025-12-04T12:25:14.2776474Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-08455987c8f710af.xml (deflated 35%) 2025-12-04T12:25:14.2777205Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e90446a7a06b5b78.xml (deflated 36%) 2025-12-04T12:25:14.2777763Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-3abd929020861bdc.xml (deflated 36%) 2025-12-04T12:25:14.2778334Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d79cb42da7e54a79.xml (deflated 36%) 2025-12-04T12:25:14.2778893Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-1a14244d1e7f6bb2.xml (deflated 36%) 2025-12-04T12:25:14.2779464Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-a80b6bac28c5c972.xml (deflated 35%) 2025-12-04T12:25:14.2780067Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-bf45f3c093461361.xml (deflated 36%) 2025-12-04T12:25:14.2780630Z adding: 
test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-81160b788c5abcc2.xml (deflated 35%) 2025-12-04T12:25:14.2781190Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2242d642afc7f886.xml (deflated 35%) 2025-12-04T12:25:14.2781746Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-327f840cbb3f5094.xml (deflated 37%) 2025-12-04T12:25:14.2782313Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-724f786ab432a45b.xml (deflated 36%) 2025-12-04T12:25:14.2782872Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-aae15a76989ce46a.xml (deflated 36%) 2025-12-04T12:25:14.2783434Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-4ee273f849859fe9.xml (deflated 36%) 2025-12-04T12:25:14.2784053Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-93baf128de560649.xml (deflated 36%) 2025-12-04T12:25:14.2784641Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-1f85ec05eddb726d.xml (deflated 36%) 2025-12-04T12:25:14.2785205Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-c9eb752317a73e18.xml (deflated 36%) 2025-12-04T12:25:14.2785761Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-cedb520e520b4782.xml (deflated 36%) 2025-12-04T12:25:14.2786325Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e69dd1a2e9fba2dc.xml (deflated 36%) 2025-12-04T12:25:14.2786885Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-47c9021380160661.xml (deflated 36%) 2025-12-04T12:25:14.2787449Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-681adc1d59f04282.xml (deflated 36%) 2025-12-04T12:25:14.2788010Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-1755a27e81246495.xml (deflated 37%) 2025-12-04T12:25:14.2788563Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-b2036226275eb311.xml (deflated 36%) 2025-12-04T12:25:14.2789218Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-3f50e0fff8c24c86.xml (deflated 37%) 2025-12-04T12:25:14.2789767Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d908f57090f2acd6.xml (deflated 37%) 2025-12-04T12:25:14.2790313Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-ac7a92e764fd2c8b.xml (deflated 36%) 2025-12-04T12:25:14.2790874Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2f80e6d84c47c0a7.xml (deflated 36%) 2025-12-04T12:25:14.2791416Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2042e0d50243da8a.xml (deflated 36%) 2025-12-04T12:25:14.2791966Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-bb9adcd8663666ac.xml (deflated 36%) 2025-12-04T12:25:14.2792511Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-246370ceca8d8d8b.xml (deflated 37%) 2025-12-04T12:25:14.2793057Z adding: 
test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f75c8f9699a93e6a.xml (deflated 36%) 2025-12-04T12:25:14.2793600Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-830d90348309a50c.xml (deflated 36%) 2025-12-04T12:25:14.2794171Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-257d76299fdbf250.xml (deflated 36%) 2025-12-04T12:25:14.2794722Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-fa0b0b810d894be9.xml (deflated 36%) 2025-12-04T12:25:14.2795279Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-b713da153aca8219.xml (deflated 37%) 2025-12-04T12:25:14.2795825Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-812da336a80f282a.xml (deflated 33%) 2025-12-04T12:25:14.2796374Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2be07987a59e5da5.xml (deflated 34%) 2025-12-04T12:25:14.2796916Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0d952f420fed2de5.xml (deflated 33%) 2025-12-04T12:25:14.2797469Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d29bf39728651f67.xml (deflated 34%) 2025-12-04T12:25:14.2798079Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-01e88d26c5e6aa85.xml (deflated 34%) 2025-12-04T12:25:14.2798644Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-25efe3194372b4e6.xml (deflated 34%) 2025-12-04T12:25:14.2799194Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-4ccf063a53847c36.xml (deflated 34%) 2025-12-04T12:25:14.2799736Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-72be92db0e827d7f.xml (deflated 34%) 2025-12-04T12:25:14.2800288Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-84f86de4e3aa962a.xml (deflated 34%) 2025-12-04T12:25:14.2800831Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e5c4d09fb827cb7f.xml (deflated 34%) 2025-12-04T12:25:14.2801379Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-165d83ae78886ff8.xml (deflated 35%) 2025-12-04T12:25:14.2801937Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-76f6fcd9346eff0a.xml (deflated 34%) 2025-12-04T12:25:14.2802479Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e84bdf3d05666f91.xml (deflated 34%) 2025-12-04T12:25:14.2803033Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-a357bf2b1c694c62.xml (deflated 34%) 2025-12-04T12:25:14.2803577Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-b1b5f73bcb8b828f.xml (deflated 34%) 2025-12-04T12:25:14.2804119Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e742397162ed9e3d.xml (deflated 34%) 2025-12-04T12:25:14.2804679Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f3a1c05a7b5c0fa8.xml (deflated 34%) 2025-12-04T12:25:14.2805228Z adding: 
test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-fcd37833b58d4bea.xml (deflated 34%) 2025-12-04T12:25:14.2805786Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-e22bb2e46b3ab636.xml (deflated 34%) 2025-12-04T12:25:14.2806324Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d319014b034c95bf.xml (deflated 34%) 2025-12-04T12:25:14.2806871Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-393bf6208ab91711.xml (deflated 34%) 2025-12-04T12:25:14.2807420Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-bb9e40b9771000a0.xml (deflated 34%) 2025-12-04T12:25:14.2807963Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d597ca27d8328fc4.xml (deflated 34%) 2025-12-04T12:25:14.2808548Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-ff18cf4d50e44f39.xml (deflated 34%) 2025-12-04T12:25:14.2809092Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0be906a8969ec101.xml (deflated 34%) 2025-12-04T12:25:14.2809637Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-158f1ad05ae2a64b.xml (deflated 34%) 2025-12-04T12:25:14.2810195Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-87453a67a1ebaea6.xml (deflated 34%) 2025-12-04T12:25:14.2810736Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-94f3fac53aec8990.xml (deflated 34%) 2025-12-04T12:25:14.2811284Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-93576123b2405b32.xml (deflated 35%) 2025-12-04T12:25:14.2811823Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f6666d1683ab3f1d.xml (deflated 34%) 2025-12-04T12:25:14.2812418Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-54b039aca43fe5b7.xml (deflated 34%) 2025-12-04T12:25:14.2812999Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-8eea24e340cd482b.xml (deflated 34%) 2025-12-04T12:25:14.2813543Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-abf845b544fb7d20.xml (deflated 35%) 2025-12-04T12:25:14.2814095Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f27d8d563aeff333.xml (deflated 34%) 2025-12-04T12:25:14.2814638Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-b98a8d5dfa728efd.xml (deflated 35%) 2025-12-04T12:25:14.2815187Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f9a146a8fac2af4d.xml (deflated 35%) 2025-12-04T12:25:14.2815732Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d8bb6ca9e3ae378b.xml (deflated 34%) 2025-12-04T12:25:14.2816277Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-604db34ae5cbb6b2.xml (deflated 34%) 2025-12-04T12:25:14.2817070Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-6d6d34df2e34630b.xml (deflated 35%) 2025-12-04T12:25:14.2817666Z adding: 
test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-520dfe050df69b4b.xml (deflated 35%) 2025-12-04T12:25:14.2818234Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2074cd035f8dc8fc.xml (deflated 35%) 2025-12-04T12:25:14.2818791Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-468dffdf4603fb37.xml (deflated 35%) 2025-12-04T12:25:14.2819353Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-fb8500504162f453.xml (deflated 35%) 2025-12-04T12:25:14.2819921Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-56d2f4c749889dbc.xml (deflated 35%) 2025-12-04T12:25:14.2820484Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-8cef0d6061a45be8.xml (deflated 34%) 2025-12-04T12:25:14.2821233Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-93d1d438aff7bb95.xml (deflated 35%) 2025-12-04T12:25:14.2821793Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-5c11159a66fb94a9.xml (deflated 35%) 2025-12-04T12:25:14.2822359Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-c1ea079cea0d8e56.xml (deflated 35%) 2025-12-04T12:25:14.2823070Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-f25b64af298ca601.xml (deflated 35%) 2025-12-04T12:25:14.2823631Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-87383ac3904bfe89.xml (deflated 35%) 2025-12-04T12:25:14.2824201Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d793a1fedd0d4f15.xml (deflated 35%) 2025-12-04T12:25:14.2824754Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-b67795a049190b1d.xml (deflated 34%) 2025-12-04T12:25:14.2825311Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-bde1923c97f63381.xml (deflated 35%) 2025-12-04T12:25:14.2825869Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2540c713fc68453d.xml (deflated 35%) 2025-12-04T12:25:14.2826440Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-8d1d058689da62ff.xml (deflated 47%) 2025-12-04T12:25:14.2827082Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0c93a8978347968a.xml (deflated 35%) 2025-12-04T12:25:14.2827678Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-18641772917d69fc.xml (deflated 34%) 2025-12-04T12:25:14.2828250Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-6a77c9a2c337df36.xml (deflated 35%) 2025-12-04T12:25:14.2828817Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-25efbb19e469ebb7.xml (deflated 34%) 2025-12-04T12:25:14.2829377Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-eac363af2c24f931.xml (deflated 35%) 2025-12-04T12:25:14.2829948Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-33bf8b4540a40636.xml (deflated 35%) 2025-12-04T12:25:14.2830514Z adding: 
test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-45778cf420dbd19f.xml (deflated 36%) 2025-12-04T12:25:14.2831091Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-7dfffc535a3e90f1.xml (deflated 36%) 2025-12-04T12:25:14.2831652Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-4b2795b0e7efac26.xml (deflated 36%) 2025-12-04T12:25:14.2832208Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-2b369bec34855654.xml (deflated 36%) 2025-12-04T12:25:14.2832887Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-d6b15d261538e27e.xml (deflated 35%) 2025-12-04T12:25:14.2833432Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-4ef76d7bc1711751.xml (deflated 35%) 2025-12-04T12:25:14.2833981Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-0343427a5558824f.xml (deflated 33%) 2025-12-04T12:25:14.2834535Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-3f70a63e56a4848b.xml (deflated 34%) 2025-12-04T12:25:14.2835081Z adding: test/test-reports/python-pytest/distributed.test_c10d_gloo/distributed.test_c10d_gloo-821ac567b5ed63bc.xml (deflated 34%) 2025-12-04T12:25:14.2835904Z adding: test/test-reports/python-pytest/distributed._shard.sharded_tensor.test_sharded_tensor/distributed._shard.sharded_tensor.test_sharded_tensor-ae33be926ad38292.xml (deflated 91%) 2025-12-04T12:25:14.2836449Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-4e483f68cef17162.xml (deflated 33%) 2025-12-04T12:25:14.2836999Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-05f5b130753b2983.xml (deflated 34%) 2025-12-04T12:25:14.2837576Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-7e16e53ef8db6995.xml (deflated 35%) 2025-12-04T12:25:14.2838122Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-1e281dcef1930575.xml (deflated 35%) 2025-12-04T12:25:14.2838677Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-2b466e71a200bcdc.xml (deflated 34%) 2025-12-04T12:25:14.2839218Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-325c8a002e1c83a2.xml (deflated 49%) 2025-12-04T12:25:14.2839772Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-3c0b6a576b76efd0.xml (deflated 34%) 2025-12-04T12:25:14.2840317Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-e47f2e15272edbaf.xml (deflated 34%) 2025-12-04T12:25:14.2840877Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-a9e19469eb1a06d4.xml (deflated 36%) 2025-12-04T12:25:14.2841473Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-df7444533096a1d8.xml (deflated 34%) 2025-12-04T12:25:14.2842055Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-d87d87bc823f3dba.xml (deflated 34%) 2025-12-04T12:25:14.2842611Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-4a50a5ac8cd03017.xml (deflated 36%) 2025-12-04T12:25:14.2843158Z adding: 
test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-0ae50f0e1c874ad8.xml (deflated 34%) 2025-12-04T12:25:14.2843718Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-7dbf8411ea4b6ce3.xml (deflated 35%) 2025-12-04T12:25:14.2844261Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-2a6114c53cde50d7.xml (deflated 34%) 2025-12-04T12:25:14.2844804Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-d109d91d9cd820a7.xml (deflated 34%) 2025-12-04T12:25:14.2845363Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-7e589af2daee12d3.xml (deflated 34%) 2025-12-04T12:25:14.2845905Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-ff536a30913e6717.xml (deflated 36%) 2025-12-04T12:25:14.2846459Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-16e8bb0ec51136f2.xml (deflated 36%) 2025-12-04T12:25:14.2847012Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-688fcf4f5f0deff2.xml (deflated 36%) 2025-12-04T12:25:14.2847559Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-c2f4984a060c2ce4.xml (deflated 37%) 2025-12-04T12:25:14.2848119Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-4874c9e324e6599b.xml (deflated 36%) 2025-12-04T12:25:14.2848665Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-81b232fd98a6eda2.xml (deflated 35%) 2025-12-04T12:25:14.2849232Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-dbedd4dfa730b471.xml (deflated 36%) 2025-12-04T12:25:14.2849776Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-e94fe5aed063a3e7.xml (deflated 35%) 2025-12-04T12:25:14.2850314Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-191142456fb777f7.xml (deflated 36%) 2025-12-04T12:25:14.2850870Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-d909bdccb7ddf2c0.xml (deflated 36%) 2025-12-04T12:25:14.2851408Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-2e3a4388e42e1415.xml (deflated 36%) 2025-12-04T12:25:14.2852013Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-3c5f42a263385a17.xml (deflated 38%) 2025-12-04T12:25:14.2852553Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-a6537375079d62ca.xml (deflated 36%) 2025-12-04T12:25:14.2853092Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-515a3b961a30c93e.xml (deflated 36%) 2025-12-04T12:25:14.2853639Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-247b406154c62e2b.xml (deflated 37%) 2025-12-04T12:25:14.2854178Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-54fc92777b10ce8b.xml (deflated 35%) 2025-12-04T12:25:14.2854735Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-07a5e82fccbcefb0.xml (deflated 36%) 2025-12-04T12:25:14.2855275Z adding: 
test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-98372eb164ddb8a6.xml (deflated 37%) 2025-12-04T12:25:14.2855874Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-9a91f2cdfa9f567b.xml (deflated 36%) 2025-12-04T12:25:14.2856527Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-578f1554447ed157.xml (deflated 36%) 2025-12-04T12:25:14.2857248Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-cba9e46262707896.xml (deflated 36%) 2025-12-04T12:25:14.2857826Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-b5cc6836ef1a3879.xml (deflated 35%) 2025-12-04T12:25:14.2858387Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-1a086feba79f79de.xml (deflated 37%) 2025-12-04T12:25:14.2858964Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-fd712f2413b91025.xml (deflated 35%) 2025-12-04T12:25:14.2859524Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-2e275020a83607d9.xml (deflated 45%) 2025-12-04T12:25:14.2860080Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-32cb996256d67719.xml (deflated 49%) 2025-12-04T12:25:14.2860652Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-281110f64c593b33.xml (deflated 35%) 2025-12-04T12:25:14.2861216Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-ab551cc6e4b8fc0e.xml (deflated 35%) 2025-12-04T12:25:14.2861789Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-bb4b38110c51be7b.xml (deflated 36%) 2025-12-04T12:25:14.2862352Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-d76cceb106b5a87a.xml (deflated 35%) 2025-12-04T12:25:14.2862920Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-f5087c7fb2c85ea4.xml (deflated 35%) 2025-12-04T12:25:14.2863499Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-5bf92e22e16000ae.xml (deflated 37%) 2025-12-04T12:25:14.2864064Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-a2df2e6eff7daa02.xml (deflated 33%) 2025-12-04T12:25:14.2864634Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-62cf8d48558e6611.xml (deflated 48%) 2025-12-04T12:25:14.2865192Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-008b4e727f5be082.xml (deflated 33%) 2025-12-04T12:25:14.2865753Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-0b38d08cedf93968.xml (deflated 34%) 2025-12-04T12:25:14.2866358Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-0615767c47cb824b.xml (deflated 35%) 2025-12-04T12:25:14.2867026Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-3a85b82e41e52e7b.xml (deflated 35%) 2025-12-04T12:25:14.2867595Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-670c4eb9ad8ac35a.xml (deflated 34%) 2025-12-04T12:25:14.2868156Z adding: 
test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-1ae993f40739468a.xml (deflated 34%) 2025-12-04T12:25:14.2868716Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-1379655e313056b3.xml (deflated 36%) 2025-12-04T12:25:14.2869383Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-17d32ccc8ec15e49.xml (deflated 35%) 2025-12-04T12:25:14.2869932Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-3c5afe3c6d472874.xml (deflated 34%) 2025-12-04T12:25:14.2870699Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-71d8c77dbd2b6cd3.xml (deflated 35%) 2025-12-04T12:25:14.2871299Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-9e93da4b49ea34dc.xml (deflated 34%) 2025-12-04T12:25:14.2871847Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-09fe633d76933c88.xml (deflated 34%) 2025-12-04T12:25:14.2872396Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-4db84368319deb77.xml (deflated 35%) 2025-12-04T12:25:14.2872974Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-867c58ec01067ba4.xml (deflated 35%) 2025-12-04T12:25:14.2873540Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-f4ea20dbc7c23240.xml (deflated 35%) 2025-12-04T12:25:14.2874092Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-197b01c054eb8425.xml (deflated 33%) 2025-12-04T12:25:14.2874652Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-5f78ef08e5f67618.xml (deflated 35%) 2025-12-04T12:25:14.2875198Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-5dd09e666c5e73ac.xml (deflated 35%) 2025-12-04T12:25:14.2875740Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-8d5b24102af3938b.xml (deflated 35%) 2025-12-04T12:25:14.2876312Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-7ed88178415e82af.xml (deflated 34%) 2025-12-04T12:25:14.2876861Z adding: test/test-reports/python-pytest/distributed.test_c10d_nccl/distributed.test_c10d_nccl-17ddadec6a584fc8.xml (deflated 34%) 2025-12-04T12:25:14.2877527Z adding: test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-db161ee1d414a014.xml (deflated 28%) 2025-12-04T12:25:14.2878184Z adding: test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-aee66205f8817bd7.xml (deflated 28%) 2025-12-04T12:25:14.2878848Z adding: test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f4fea7b2e6cf3a65.xml (deflated 28%) 2025-12-04T12:25:14.2879509Z adding: test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-422b22169e3a08f1.xml (deflated 28%) 2025-12-04T12:25:14.2880161Z adding: test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8ec15082b412f697.xml (deflated 27%) 2025-12-04T12:25:14.2880817Z adding: 
test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a2eda26248d83b8e.xml (deflated 28%) 2025-12-04T12:25:14.2881504Z adding: test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e12df5e946a2399b.xml (deflated 27%) 2025-12-04T12:25:14.2882157Z adding: test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4ab25792bd6780ce.xml (deflated 28%) 2025-12-04T12:25:14.2882811Z adding: test/test-reports/dist-mpi-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ee61fca4ae363844.xml (deflated 28%) 2025-12-04T12:25:14.2883465Z adding: test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e43b258f943c7149.xml (deflated 28%) 2025-12-04T12:25:14.2884125Z adding: test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ed8ce545db3785b0.xml (deflated 28%) 2025-12-04T12:25:14.2884781Z adding: test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-51bd71d27c2db4f0.xml (deflated 28%) 2025-12-04T12:25:14.2885494Z adding: test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-72f602b330e606cb.xml (deflated 28%) 2025-12-04T12:25:14.2886172Z adding: test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-94537227bc12f698.xml (deflated 28%) 2025-12-04T12:25:14.2886822Z adding: test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f7368dd24235350f.xml (deflated 28%) 2025-12-04T12:25:14.2887482Z adding: test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-12e19ecac0707a9f.xml (deflated 28%) 2025-12-04T12:25:14.2888134Z adding: test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-49aeb17bc0069227.xml (deflated 28%) 2025-12-04T12:25:14.2888800Z adding: test/test-reports/dist-mpi-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-82678a9127d50625.xml (deflated 28%) 2025-12-04T12:25:14.2889459Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-eeb723e5683986dd.xml (deflated 35%) 2025-12-04T12:25:14.2890115Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7dd0923a385a5b44.xml (deflated 44%) 2025-12-04T12:25:14.2890778Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-875b3394fe6124ff.xml (deflated 36%) 2025-12-04T12:25:14.2891427Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a01719010801f0eb.xml (deflated 36%) 2025-12-04T12:25:14.2892086Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-abb38b8b64296782.xml (deflated 36%) 2025-12-04T12:25:14.2892748Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-35d5d4bfe910714e.xml (deflated 35%) 2025-12-04T12:25:14.2893411Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-fcdbe5c8d6246957.xml (deflated 
44%) 2025-12-04T12:25:14.2894068Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4f2d32d76cd9ea4c.xml (deflated 44%) 2025-12-04T12:25:14.2894714Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8d01dd7848e58726.xml (deflated 43%) 2025-12-04T12:25:14.2895370Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b37ec36150974cdc.xml (deflated 43%) 2025-12-04T12:25:14.2896050Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a5c97ba7476f9699.xml (deflated 43%) 2025-12-04T12:25:14.2896965Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9f7bc9881e047dd1.xml (deflated 43%) 2025-12-04T12:25:14.2897637Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0d8492641a4c3af3.xml (deflated 43%) 2025-12-04T12:25:14.2898305Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1a118777d82e8d7e.xml (deflated 36%) 2025-12-04T12:25:14.2898990Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6f1779e409eaf9fb.xml (deflated 45%) 2025-12-04T12:25:14.2899666Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5a2c564c0db133fb.xml (deflated 36%) 2025-12-04T12:25:14.2900417Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c4e9ae811cf30c32.xml (deflated 44%) 2025-12-04T12:25:14.2901134Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1a0ffda73db67d0e.xml (deflated 44%) 2025-12-04T12:25:14.2901814Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b10091684b37c862.xml (deflated 41%) 2025-12-04T12:25:14.2902617Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-362536b218c78604.xml (deflated 35%) 2025-12-04T12:25:14.2903317Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e2a2b6d5dc912ba1.xml (deflated 35%) 2025-12-04T12:25:14.2903995Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2bfa612f1908806e.xml (deflated 43%) 2025-12-04T12:25:14.2904674Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7c241632c1bd2254.xml (deflated 36%) 2025-12-04T12:25:14.2905351Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-300d15ebe169a67d.xml (deflated 56%) 2025-12-04T12:25:14.2906026Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2664154f3bddb6ff.xml (deflated 44%) 2025-12-04T12:25:14.2906699Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b262143f686a88dd.xml (deflated 43%) 2025-12-04T12:25:14.2907378Z adding: 
test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c004db07f7b0860b.xml (deflated 43%) 2025-12-04T12:25:14.2908052Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bc18c93bde07fa33.xml (deflated 44%) 2025-12-04T12:25:14.2908853Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d33e44b619f43cc1.xml (deflated 57%) 2025-12-04T12:25:14.2909503Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c44272ce3d4ac199.xml (deflated 36%) 2025-12-04T12:25:14.2910168Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ea07358affb5e144.xml (deflated 36%) 2025-12-04T12:25:14.2910814Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2c57c7620876639a.xml (deflated 43%) 2025-12-04T12:25:14.2911495Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-eede0e2726c06cab.xml (deflated 36%) 2025-12-04T12:25:14.2912192Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a276c210ef7f6689.xml (deflated 43%) 2025-12-04T12:25:14.2912847Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bd59825a029f8f8b.xml (deflated 35%) 2025-12-04T12:25:14.2913504Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1f5a9742e1242440.xml (deflated 38%) 2025-12-04T12:25:14.2914152Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6b0873e59b83bf9a.xml (deflated 36%) 2025-12-04T12:25:14.2914804Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-64bbf1c836e72a15.xml (deflated 35%) 2025-12-04T12:25:14.2915468Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f83300f2b97b0a07.xml (deflated 36%) 2025-12-04T12:25:14.2916179Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-46e1a3ccabb4ea53.xml (deflated 35%) 2025-12-04T12:25:14.2916869Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-52cd579e7fe5892c.xml (deflated 44%) 2025-12-04T12:25:14.2917522Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cb876d9d148638c4.xml (deflated 44%) 2025-12-04T12:25:14.2918180Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-419043608d870248.xml (deflated 44%) 2025-12-04T12:25:14.2918831Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-03caaef3ff0396d9.xml (deflated 44%) 2025-12-04T12:25:14.2919481Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a49158b49188737a.xml (deflated 43%) 2025-12-04T12:25:14.2920147Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9371e4128a3ac8fe.xml 
(deflated 43%) 2025-12-04T12:25:14.2920941Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bf7e7c630fc800f5.xml (deflated 43%) 2025-12-04T12:25:14.2921779Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f263367a9b8ff205.xml (deflated 44%) 2025-12-04T12:25:14.2922452Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9da5cc1abf82fc88.xml (deflated 43%) 2025-12-04T12:25:14.2923128Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-17270d7c5dcce82d.xml (deflated 43%) 2025-12-04T12:25:14.2923819Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-52a8a0406f3c10fb.xml (deflated 36%) 2025-12-04T12:25:14.2924497Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8955835fa53fe405.xml (deflated 43%) 2025-12-04T12:25:14.2925179Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-41e8000da4470974.xml (deflated 36%) 2025-12-04T12:25:14.2925852Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-17b82ffe3c62718d.xml (deflated 36%) 2025-12-04T12:25:14.2926524Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-550a077945687423.xml (deflated 42%) 2025-12-04T12:25:14.2927262Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-97658b25492d180c.xml (deflated 36%) 2025-12-04T12:25:14.2927940Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5ba6b434230b8a31.xml (deflated 42%) 2025-12-04T12:25:14.2928630Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8ab85cfcce385bb9.xml (deflated 36%) 2025-12-04T12:25:14.2929295Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-205c67b3e9ea2006.xml (deflated 36%) 2025-12-04T12:25:14.2929983Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a7727ff60499e455.xml (deflated 36%) 2025-12-04T12:25:14.2930646Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5545774781103441.xml (deflated 35%) 2025-12-04T12:25:14.2931399Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-69b99129eec5d274.xml (deflated 37%) 2025-12-04T12:25:14.2932122Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-71229775f4c708c6.xml (deflated 44%) 2025-12-04T12:25:14.2932796Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ef94932e8a93743e.xml (deflated 43%) 2025-12-04T12:25:14.2933586Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-830e1894dcf5c994.xml (deflated 43%) 2025-12-04T12:25:14.2934337Z adding: 
test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d6ec9fe8576de151.xml (deflated 36%) 2025-12-04T12:25:14.2934980Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8ac8ca9bd1994ece.xml (deflated 37%) 2025-12-04T12:25:14.2935618Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3403d5bb8935cb4e.xml (deflated 36%) 2025-12-04T12:25:14.2936252Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b0c166deb400ad9d.xml (deflated 36%) 2025-12-04T12:25:14.2937133Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-60e4e17b51df739f.xml (deflated 35%) 2025-12-04T12:25:14.2937802Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-22eb7410be2437d9.xml (deflated 35%) 2025-12-04T12:25:14.2938486Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9ee70791b9debd6c.xml (deflated 44%) 2025-12-04T12:25:14.2939167Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-81abecf194df2c45.xml (deflated 44%) 2025-12-04T12:25:14.2939836Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1136154023961765.xml (deflated 43%) 2025-12-04T12:25:14.2940523Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cfef205e8493de16.xml (deflated 36%) 2025-12-04T12:25:14.2941196Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bd599f355b8caaeb.xml (deflated 36%) 2025-12-04T12:25:14.2941884Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-62ca7bd8b65dea10.xml (deflated 44%) 2025-12-04T12:25:14.2942559Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b3d3e55cfe315fc5.xml (deflated 36%) 2025-12-04T12:25:14.2943274Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3a45eb631d6c35ef.xml (deflated 44%) 2025-12-04T12:25:14.2943962Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-aae6fb78854ea6ff.xml (deflated 36%) 2025-12-04T12:25:14.2944636Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9eef2c9b45729eeb.xml (deflated 47%) 2025-12-04T12:25:14.2945325Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d106ae3bbe7d9e5c.xml (deflated 35%) 2025-12-04T12:25:14.2945996Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8ff643138d43dd85.xml (deflated 56%) 2025-12-04T12:25:14.2946681Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5c72d0c28afc7b8b.xml (deflated 35%) 2025-12-04T12:25:14.2947406Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8cb6ed13882ace9d.xml 
(deflated 35%) 2025-12-04T12:25:14.2948104Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-51d5ea88c29b6ed7.xml (deflated 43%) 2025-12-04T12:25:14.2948898Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0e2af92baadfb43c.xml (deflated 36%) 2025-12-04T12:25:14.2949613Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0ee64e4888310471.xml (deflated 35%) 2025-12-04T12:25:14.2950225Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2124f6a7f1f8a6ad.xml (deflated 35%) 2025-12-04T12:25:14.2950822Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3a72595ddb271e95.xml (deflated 43%) 2025-12-04T12:25:14.2951426Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f5a0fd7e9efb76d5.xml (deflated 44%) 2025-12-04T12:25:14.2952037Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f05ec777ac110fb6.xml (deflated 36%) 2025-12-04T12:25:14.2952635Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4c4dbe227aaf8cd2.xml (deflated 43%) 2025-12-04T12:25:14.2953242Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6d8d80edc2b8c69e.xml (deflated 36%) 2025-12-04T12:25:14.2953845Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-50add8f3174dd7ac.xml (deflated 35%) 2025-12-04T12:25:14.2954458Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-851cdc069dcc69f7.xml (deflated 36%) 2025-12-04T12:25:14.2955054Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1acd79e907003b41.xml (deflated 46%) 2025-12-04T12:25:14.2955653Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a0ff1f71f9283f58.xml (deflated 45%) 2025-12-04T12:25:14.2956256Z adding: test/test-reports/dist-nccl-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-65237f33092a4b4f.xml (deflated 36%) 2025-12-04T12:25:14.2956860Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5046dc8bfb623fa3.xml (deflated 35%) 2025-12-04T12:25:14.2957502Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4878dd0838c676b7.xml (deflated 44%) 2025-12-04T12:25:14.2958110Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-66566e960af2b7cd.xml (deflated 35%) 2025-12-04T12:25:14.2958715Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9252bf6025e90d42.xml (deflated 36%) 2025-12-04T12:25:14.2959330Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5b920f5d1c4972a5.xml (deflated 36%) 2025-12-04T12:25:14.2959929Z adding: 
test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-41378464ce08003d.xml (deflated 36%) 2025-12-04T12:25:14.2960540Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ee4c603fd47011fa.xml (deflated 44%) 2025-12-04T12:25:14.2961201Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9973927e7b530617.xml (deflated 44%) 2025-12-04T12:25:14.2961845Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-faddb0db331380df.xml (deflated 43%) 2025-12-04T12:25:14.2962453Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-babf9f26b0f01a05.xml (deflated 42%) 2025-12-04T12:25:14.2963053Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-682bb4a108ba0cff.xml (deflated 43%) 2025-12-04T12:25:14.2963664Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d0185f9ec4d4c49f.xml (deflated 43%) 2025-12-04T12:25:14.2964262Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-011699f09fdd352f.xml (deflated 43%) 2025-12-04T12:25:14.2964876Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7c6b066059948ead.xml (deflated 36%) 2025-12-04T12:25:14.2965483Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-22fab5f0e190ff66.xml (deflated 44%) 2025-12-04T12:25:14.2966084Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-55702aa5023cfcc5.xml (deflated 36%) 2025-12-04T12:25:14.2966695Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ccae7814a1c4777f.xml (deflated 44%) 2025-12-04T12:25:14.2967299Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5bd848f11487517d.xml (deflated 44%) 2025-12-04T12:25:14.2967911Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-27d68b49187eba1f.xml (deflated 41%) 2025-12-04T12:25:14.2968519Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cf1bc9411dde71e0.xml (deflated 35%) 2025-12-04T12:25:14.2969136Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-445a5d7115d23df5.xml (deflated 35%) 2025-12-04T12:25:14.2969739Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-44a168cde9f7a829.xml (deflated 43%) 2025-12-04T12:25:14.2970343Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1ba388d3de704172.xml (deflated 35%) 2025-12-04T12:25:14.2970956Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bd986c0befb813c2.xml (deflated 56%) 2025-12-04T12:25:14.2971595Z adding: 
test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4610efe5376dfca1.xml (deflated 44%) 2025-12-04T12:25:14.2972206Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8b4358fed50c59f1.xml (deflated 43%) 2025-12-04T12:25:14.2972811Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-526a02721a1ba5da.xml (deflated 43%) 2025-12-04T12:25:14.2973421Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3c0978e54cc6fc10.xml (deflated 44%) 2025-12-04T12:25:14.2974031Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bf5a35496e65d5e4.xml (deflated 57%) 2025-12-04T12:25:14.2974643Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ee9c4c3ca48fe737.xml (deflated 36%) 2025-12-04T12:25:14.2975311Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d5ca791415d7ead2.xml (deflated 36%) 2025-12-04T12:25:14.2975937Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4b280a14c5b58c7c.xml (deflated 43%) 2025-12-04T12:25:14.2976607Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9f1e7a55058f0a18.xml (deflated 36%) 2025-12-04T12:25:14.2977444Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c9d23e4c6bbfd6d1.xml (deflated 43%) 2025-12-04T12:25:14.2978128Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d04adc5353a474ef.xml (deflated 35%) 2025-12-04T12:25:14.2978824Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-dd5c3fba431f03e3.xml (deflated 37%) 2025-12-04T12:25:14.2979504Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-23246ae737e62ded.xml (deflated 36%) 2025-12-04T12:25:14.2980196Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8aa7ae0f58f2813b.xml (deflated 35%) 2025-12-04T12:25:14.2980874Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cd7e251b7cd67b87.xml (deflated 36%) 2025-12-04T12:25:14.2981571Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3ffef4b2a54e0ec6.xml (deflated 35%) 2025-12-04T12:25:14.2982249Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f47719c8fab0f3fd.xml (deflated 44%) 2025-12-04T12:25:14.2982934Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7f97df23e3af62b7.xml (deflated 44%) 2025-12-04T12:25:14.2983624Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7d9b569377c5e6b5.xml (deflated 44%) 2025-12-04T12:25:14.2984304Z adding: 
test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e79d7fc843c87404.xml (deflated 44%) 2025-12-04T12:25:14.2984990Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0b4908c887012bf3.xml (deflated 43%) 2025-12-04T12:25:14.2985669Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-15d9380e1c9a62c7.xml (deflated 43%) 2025-12-04T12:25:14.2986390Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-89d48b8548171ec2.xml (deflated 43%) 2025-12-04T12:25:14.2987084Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e87d273ae3e5c7f4.xml (deflated 43%) 2025-12-04T12:25:14.2987768Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5becb9fcc2b2a740.xml (deflated 43%) 2025-12-04T12:25:14.2988467Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e50500c3a0076f9a.xml (deflated 43%) 2025-12-04T12:25:14.2989234Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c28f45efdfac39c4.xml (deflated 36%) 2025-12-04T12:25:14.2989850Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d9fcea5b98362b6a.xml (deflated 43%) 2025-12-04T12:25:14.2990503Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-23763de39322c899.xml (deflated 35%) 2025-12-04T12:25:14.2991130Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f7a5837d4cf564eb.xml (deflated 35%) 2025-12-04T12:25:14.2991749Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f6098aefa2030078.xml (deflated 42%) 2025-12-04T12:25:14.2992349Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9d3b389690949ffc.xml (deflated 36%) 2025-12-04T12:25:14.2992961Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-00c0b12dc56300ed.xml (deflated 43%) 2025-12-04T12:25:14.2993563Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-875462dd555a5412.xml (deflated 35%) 2025-12-04T12:25:14.2994165Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5da26e78fc052180.xml (deflated 36%) 2025-12-04T12:25:14.2994783Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-705b7a3606470644.xml (deflated 36%) 2025-12-04T12:25:14.2995380Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3996750239d4977f.xml (deflated 35%) 2025-12-04T12:25:14.2995994Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b1bfbeb9b34c8574.xml (deflated 36%) 2025-12-04T12:25:14.2996600Z adding: 
test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6c5cc720d34bebc6.xml (deflated 44%) 2025-12-04T12:25:14.2997218Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b5eb76bc9735e309.xml (deflated 43%) 2025-12-04T12:25:14.2997824Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1a28d2b8c4bb8b97.xml (deflated 43%) 2025-12-04T12:25:14.2998440Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f2fa0ff1a8410ed4.xml (deflated 36%) 2025-12-04T12:25:14.2999052Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-42750e8459e7d15b.xml (deflated 37%) 2025-12-04T12:25:14.2999654Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d44ddde7846d301e.xml (deflated 36%) 2025-12-04T12:25:14.3000293Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d84034c24f131de9.xml (deflated 36%) 2025-12-04T12:25:14.3000900Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b21382e4a0d075d7.xml (deflated 36%) 2025-12-04T12:25:14.3001502Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f01856e9a2028bff.xml (deflated 35%) 2025-12-04T12:25:14.3002115Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d271f82508cdd35e.xml (deflated 44%) 2025-12-04T12:25:14.3002717Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-602ab3c67d585e00.xml (deflated 44%) 2025-12-04T12:25:14.3003333Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6c4b4f500cbe46b2.xml (deflated 43%) 2025-12-04T12:25:14.3003941Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-060bfe393d18a7b7.xml (deflated 36%) 2025-12-04T12:25:14.3004635Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-08a6cb454dfb3288.xml (deflated 36%) 2025-12-04T12:25:14.3005262Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-14f8591ab0b18d47.xml (deflated 44%) 2025-12-04T12:25:14.3005869Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-faf65bc8adad7023.xml (deflated 36%) 2025-12-04T12:25:14.3006488Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7ab921a38daba1bb.xml (deflated 45%) 2025-12-04T12:25:14.3007094Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-205a17c445d16b08.xml (deflated 36%) 2025-12-04T12:25:14.3007710Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-14314f5e6064defd.xml (deflated 47%) 2025-12-04T12:25:14.3008311Z adding: 
test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9a98077fc0a28449.xml (deflated 36%) 2025-12-04T12:25:14.3008923Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3e2de3e4d8afa5ff.xml (deflated 56%) 2025-12-04T12:25:14.3009539Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-512586046bd1af6f.xml (deflated 36%) 2025-12-04T12:25:14.3010143Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1fa69b7512f74eae.xml (deflated 36%) 2025-12-04T12:25:14.3010753Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-70138f82b180a3f5.xml (deflated 43%) 2025-12-04T12:25:14.3011357Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b7ed61d0627f9533.xml (deflated 36%) 2025-12-04T12:25:14.3011971Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-493e10e45797f8fa.xml (deflated 36%) 2025-12-04T12:25:14.3012572Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-87c65811f60e5e0f.xml (deflated 35%) 2025-12-04T12:25:14.3013177Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-635f35dfbbc33c85.xml (deflated 43%) 2025-12-04T12:25:14.3013791Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-355930f4da4ab18f.xml (deflated 45%) 2025-12-04T12:25:14.3014427Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f6333fa7d0fe5c91.xml (deflated 37%) 2025-12-04T12:25:14.3015050Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3076e5b00c0eef07.xml (deflated 43%) 2025-12-04T12:25:14.3015647Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9141798051401a79.xml (deflated 36%) 2025-12-04T12:25:14.3016246Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d96c5808f2f4d423.xml (deflated 35%) 2025-12-04T12:25:14.3017114Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-59eca95b80bf15e4.xml (deflated 36%) 2025-12-04T12:25:14.3017799Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7eeb7f329dcb1625.xml (deflated 46%) 2025-12-04T12:25:14.3018564Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c438893677b09839.xml (deflated 45%) 2025-12-04T12:25:14.3019272Z adding: test/test-reports/dist-nccl-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d707ddf229008c6a.xml (deflated 36%) 2025-12-04T12:25:14.3019959Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2c31ce4d4db4e93a.xml (deflated 35%) 2025-12-04T12:25:14.3020631Z adding: 
test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-714862760bd05954.xml (deflated 37%) 2025-12-04T12:25:14.3021487Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-16429bc307938d70.xml (deflated 35%) 2025-12-04T12:25:14.3022183Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-92f77f3d8cd66053.xml (deflated 36%) 2025-12-04T12:25:14.3022859Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-deed4e34c84ee498.xml (deflated 45%) 2025-12-04T12:25:14.3023537Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-425b9693fd331423.xml (deflated 35%) 2025-12-04T12:25:14.3024206Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9149f9baa8d84141.xml (deflated 43%) 2025-12-04T12:25:14.3024885Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-78d5cc488c73d225.xml (deflated 43%) 2025-12-04T12:25:14.3025566Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-017a63f22f7a2e26.xml (deflated 36%) 2025-12-04T12:25:14.3026247Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3e6391f21f8fa7c0.xml (deflated 35%) 2025-12-04T12:25:14.3026927Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9e8b675076ef3915.xml (deflated 36%) 2025-12-04T12:25:14.3027600Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b8d64d4666fb6c9d.xml (deflated 36%) 2025-12-04T12:25:14.3028280Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0dee982caae0bf52.xml (deflated 35%) 2025-12-04T12:25:14.3028954Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0df7122c519ced4f.xml (deflated 36%) 2025-12-04T12:25:14.3029693Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2827e400085e914f.xml (deflated 45%) 2025-12-04T12:25:14.3030365Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7d39e0b557433741.xml (deflated 44%) 2025-12-04T12:25:14.3031040Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e6c5067f69c5dc42.xml (deflated 44%) 2025-12-04T12:25:14.3031707Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d40c5c296523fcf4.xml (deflated 44%) 2025-12-04T12:25:14.3032372Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e19c088745912810.xml (deflated 35%) 2025-12-04T12:25:14.3033253Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-21b633b88362af20.xml (deflated 35%) 2025-12-04T12:25:14.3033917Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f1d69885e8023d73.xml 
(deflated 35%) 2025-12-04T12:25:14.3034550Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-76455ff9fe96f12c.xml (deflated 35%) 2025-12-04T12:25:14.3035150Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9224f6b7ff8b973c.xml (deflated 36%) 2025-12-04T12:25:14.3035743Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-64019cd840b5ae37.xml (deflated 43%) 2025-12-04T12:25:14.3036346Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c52c688cda6423d1.xml (deflated 44%) 2025-12-04T12:25:14.3036944Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-56aae62a7e88ec0a.xml (deflated 35%) 2025-12-04T12:25:14.3037552Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-126517b1e280f193.xml (deflated 36%) 2025-12-04T12:25:14.3038145Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2d346d213506e58a.xml (deflated 36%) 2025-12-04T12:25:14.3038751Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-093f4d1e23acb10f.xml (deflated 57%) 2025-12-04T12:25:14.3039343Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-810e1605bd5350e8.xml (deflated 36%) 2025-12-04T12:25:14.3039937Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-43db9cfa18063736.xml (deflated 36%) 2025-12-04T12:25:14.3040543Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3d256d1cc46d8d8d.xml (deflated 36%) 2025-12-04T12:25:14.3041147Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a0174602e3f0dc49.xml (deflated 42%) 2025-12-04T12:25:14.3041748Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9d15167d0a9773e6.xml (deflated 35%) 2025-12-04T12:25:14.3042340Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2a355bd7e8aa2084.xml (deflated 35%) 2025-12-04T12:25:14.3042935Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a694586bb28814d4.xml (deflated 37%) 2025-12-04T12:25:14.3043540Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-91f11f0cc30a0889.xml (deflated 35%) 2025-12-04T12:25:14.3044161Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cc882534d0c7ac9e.xml (deflated 35%) 2025-12-04T12:25:14.3044765Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3576431fa0a79154.xml (deflated 36%) 2025-12-04T12:25:14.3045361Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-85e1893ad67dccf3.xml (deflated 35%) 2025-12-04T12:25:14.3045965Z adding: 
test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-148510b891c749c6.xml (deflated 35%) 2025-12-04T12:25:14.3046561Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e6549972a7efaf11.xml (deflated 35%) 2025-12-04T12:25:14.3047160Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0ea6ea860d10e295.xml (deflated 36%) 2025-12-04T12:25:14.3048012Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-83ab4f7124e50996.xml (deflated 36%) 2025-12-04T12:25:14.3048742Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a6c1a924e8712f89.xml (deflated 43%) 2025-12-04T12:25:14.3049570Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0bec6d0d6dd273b2.xml (deflated 36%) 2025-12-04T12:25:14.3050262Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ce5c2131a079a118.xml (deflated 36%) 2025-12-04T12:25:14.3050933Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-9aa0d7a04a1b05f2.xml (deflated 44%) 2025-12-04T12:25:14.3051608Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-85e0e890e418ce3a.xml (deflated 44%) 2025-12-04T12:25:14.3052281Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4cffe073269e4f0a.xml (deflated 43%) 2025-12-04T12:25:14.3052939Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-fb78beccd38dd26e.xml (deflated 42%) 2025-12-04T12:25:14.3053590Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c24763a200436369.xml (deflated 36%) 2025-12-04T12:25:14.3054244Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-95f84fd6ea33eee0.xml (deflated 47%) 2025-12-04T12:25:14.3054893Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-88fe6d3cec93de32.xml (deflated 35%) 2025-12-04T12:25:14.3055548Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0260bf01f397061e.xml (deflated 35%) 2025-12-04T12:25:14.3056205Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bc07ca8676eed412.xml (deflated 36%) 2025-12-04T12:25:14.3057115Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c73c9ddbbd799146.xml (deflated 43%) 2025-12-04T12:25:14.3057783Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d73e4a124891508d.xml (deflated 35%) 2025-12-04T12:25:14.3058457Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e44eef95a4d81dc3.xml (deflated 36%) 2025-12-04T12:25:14.3059176Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-78d0f5373874b1c4.xml 
(deflated 36%) 2025-12-04T12:25:14.3059850Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4c88483e90b04648.xml (deflated 35%) 2025-12-04T12:25:14.3060540Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ccf199cbc8b611ab.xml (deflated 37%) 2025-12-04T12:25:14.3061217Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6a4daccc9da30cdb.xml (deflated 36%) 2025-12-04T12:25:14.3061899Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d983aecef8c58dfb.xml (deflated 36%) 2025-12-04T12:25:14.3062571Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-746325984b31e17e.xml (deflated 43%) 2025-12-04T12:25:14.3063303Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0b8591cc84ef2a6a.xml (deflated 43%) 2025-12-04T12:25:14.3064011Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c4d97d092b2123a2.xml (deflated 37%) 2025-12-04T12:25:14.3064678Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1574030634816010.xml (deflated 36%) 2025-12-04T12:25:14.3065360Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5fa3a6eb60f4eca4.xml (deflated 36%) 2025-12-04T12:25:14.3066034Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4e754e92f5037c52.xml (deflated 35%) 2025-12-04T12:25:14.3066702Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-020049def8c5b0a9.xml (deflated 43%) 2025-12-04T12:25:14.3067388Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d4dd04eda8983093.xml (deflated 36%) 2025-12-04T12:25:14.3068057Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5a612b5b9d29cdf4.xml (deflated 36%) 2025-12-04T12:25:14.3068937Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f0f750f594e5734b.xml (deflated 43%) 2025-12-04T12:25:14.3069534Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7cb1e30e8a2e57ea.xml (deflated 43%) 2025-12-04T12:25:14.3070127Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-bc8052641a24d5dc.xml (deflated 44%) 2025-12-04T12:25:14.3070735Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d8cbbb1187ec0f64.xml (deflated 36%) 2025-12-04T12:25:14.3071329Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f83af7e95786df72.xml (deflated 35%) 2025-12-04T12:25:14.3071931Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a731f1e0a2629b95.xml (deflated 44%) 2025-12-04T12:25:14.3072527Z adding: 
test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3ae47b09c2c50f23.xml (deflated 42%) 2025-12-04T12:25:14.3073129Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ec880e83b34c8e36.xml (deflated 47%) 2025-12-04T12:25:14.3073724Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c3833fdae73dbf3c.xml (deflated 47%) 2025-12-04T12:25:14.3074359Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-86aa7d82374c9e5b.xml (deflated 56%) 2025-12-04T12:25:14.3074968Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a10e426b5fcbde30.xml (deflated 35%) 2025-12-04T12:25:14.3075572Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ff35c7e5488dd9ac.xml (deflated 35%) 2025-12-04T12:25:14.3076176Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-924d345c27601ea8.xml (deflated 44%) 2025-12-04T12:25:14.3076776Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1681683ab3d327ac.xml (deflated 36%) 2025-12-04T12:25:14.3077373Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-22e9fd6e5aba0f0d.xml (deflated 36%) 2025-12-04T12:25:14.3078032Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d9dffcfba1bc1e60.xml (deflated 35%) 2025-12-04T12:25:14.3078663Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1b652ce23cebda63.xml (deflated 36%) 2025-12-04T12:25:14.3079269Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b5b9a6fa991ecf1c.xml (deflated 44%) 2025-12-04T12:25:14.3079868Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-1f3a9e9304d25446.xml (deflated 45%) 2025-12-04T12:25:14.3080468Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0390eeced956f562.xml (deflated 36%) 2025-12-04T12:25:14.3081058Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-439532956daa54d1.xml (deflated 43%) 2025-12-04T12:25:14.3081660Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0f977aa3cd3cecaf.xml (deflated 41%) 2025-12-04T12:25:14.3082260Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-24127363c11860de.xml (deflated 41%) 2025-12-04T12:25:14.3082853Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0cd422e8a222e606.xml (deflated 36%) 2025-12-04T12:25:14.3083454Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-27b9de38969ee6f6.xml (deflated 35%) 2025-12-04T12:25:14.3084053Z adding: test/test-reports/dist-gloo-init-env/distributed.test_distributed_spawn/distributed.test_distributed_spawn-62abfea4d6932c1e.xml 
(deflated 36%) 2025-12-04T12:25:14.3084663Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d86e179dbef96adf.xml (deflated 35%) 2025-12-04T12:25:14.3085279Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a6abc3b994eecaab.xml (deflated 37%) 2025-12-04T12:25:14.3085880Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f8fe4b288348a5e8.xml (deflated 35%) 2025-12-04T12:25:14.3086486Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e1865fe4cd352327.xml (deflated 36%) 2025-12-04T12:25:14.3087085Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2d135dba3284d9dd.xml (deflated 45%) 2025-12-04T12:25:14.3087720Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8ce519dd6997621a.xml (deflated 35%) 2025-12-04T12:25:14.3088320Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6d25b88aa16186c5.xml (deflated 43%) 2025-12-04T12:25:14.3088921Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2b545a8cfb56682b.xml (deflated 43%) 2025-12-04T12:25:14.3089529Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-96320154d0a3f580.xml (deflated 36%) 2025-12-04T12:25:14.3090131Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d58d0eb09203fc2c.xml (deflated 35%) 2025-12-04T12:25:14.3090739Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-76e7132ba7ac5de0.xml (deflated 36%) 2025-12-04T12:25:14.3091387Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a537f0ef8ed460d9.xml (deflated 36%) 2025-12-04T12:25:14.3092010Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3c40fad651035635.xml (deflated 35%) 2025-12-04T12:25:14.3092618Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-68c5b031d9a5ae9e.xml (deflated 36%) 2025-12-04T12:25:14.3093215Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-712b0b28be8414a0.xml (deflated 44%) 2025-12-04T12:25:14.3093820Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7eca96992921c511.xml (deflated 44%) 2025-12-04T12:25:14.3094417Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7834531011d91518.xml (deflated 44%) 2025-12-04T12:25:14.3095029Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-68f03a926c8d2bd9.xml (deflated 44%) 2025-12-04T12:25:14.3095634Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e49faae68d1ac0d9.xml (deflated 36%) 2025-12-04T12:25:14.3096232Z adding: 
test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cc4d026c52898da8.xml (deflated 35%) 2025-12-04T12:25:14.3097098Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-03eaa4726076d233.xml (deflated 35%) 2025-12-04T12:25:14.3097774Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6d471afa2e27428d.xml (deflated 35%) 2025-12-04T12:25:14.3098460Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-065a466bb3b41d27.xml (deflated 36%) 2025-12-04T12:25:14.3099145Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f328e482896672aa.xml (deflated 43%) 2025-12-04T12:25:14.3099827Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ee7ee7e277bba08f.xml (deflated 44%) 2025-12-04T12:25:14.3100540Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e55ae93852ba5a41.xml (deflated 36%) 2025-12-04T12:25:14.3101224Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6750ff7d9a08403d.xml (deflated 36%) 2025-12-04T12:25:14.3101901Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d85fe03caf11b880.xml (deflated 35%) 2025-12-04T12:25:14.3102624Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f90e1eb29ec7a7eb.xml (deflated 57%) 2025-12-04T12:25:14.3103318Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5c515ad73db9ec0f.xml (deflated 36%) 2025-12-04T12:25:14.3103997Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-be5d3342961d1397.xml (deflated 36%) 2025-12-04T12:25:14.3104682Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-81a8ca35b73b2608.xml (deflated 36%) 2025-12-04T12:25:14.3105363Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6eb3b25e1011068f.xml (deflated 41%) 2025-12-04T12:25:14.3106055Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-16ab3c0f531a2710.xml (deflated 35%) 2025-12-04T12:25:14.3106786Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4e912af285a88a53.xml (deflated 35%) 2025-12-04T12:25:14.3107493Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-043dda7312ce02a9.xml (deflated 37%) 2025-12-04T12:25:14.3108182Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3cf2335721c75edb.xml (deflated 36%) 2025-12-04T12:25:14.3108857Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ed68ee99b507df29.xml (deflated 35%) 2025-12-04T12:25:14.3109639Z adding: 
test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-afe3aa9ea643db5b.xml (deflated 36%) 2025-12-04T12:25:14.3110283Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-706ef1f553cb8cca.xml (deflated 35%) 2025-12-04T12:25:14.3110923Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-a98124b8f8d7b3ef.xml (deflated 35%) 2025-12-04T12:25:14.3111566Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ee37bb64a8e84ec5.xml (deflated 36%) 2025-12-04T12:25:14.3112208Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e2af230e2fec6d35.xml (deflated 36%) 2025-12-04T12:25:14.3112854Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3008545966a2ad5b.xml (deflated 36%) 2025-12-04T12:25:14.3113489Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-53870facd803211b.xml (deflated 43%) 2025-12-04T12:25:14.3114146Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-4eca7697caf90c2a.xml (deflated 36%) 2025-12-04T12:25:14.3114784Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2c4554d604268fb5.xml (deflated 36%) 2025-12-04T12:25:14.3115419Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c6b52be0b4531e90.xml (deflated 43%) 2025-12-04T12:25:14.3116059Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c63a3f0987273dba.xml (deflated 44%) 2025-12-04T12:25:14.3116695Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b58af3771e34dd96.xml (deflated 43%) 2025-12-04T12:25:14.3117367Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-587b09149e6cc83f.xml (deflated 42%) 2025-12-04T12:25:14.3118005Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e3786dc33e6abd50.xml (deflated 36%) 2025-12-04T12:25:14.3118646Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-dfce7e92d72e48a2.xml (deflated 47%) 2025-12-04T12:25:14.3119295Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-627617d506ff1d2f.xml (deflated 36%) 2025-12-04T12:25:14.3119929Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-64530dfd24199eb7.xml (deflated 36%) 2025-12-04T12:25:14.3120574Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-0ddc33c5ddc10dde.xml (deflated 36%) 2025-12-04T12:25:14.3121696Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d0632db0896072cf.xml (deflated 43%) 2025-12-04T12:25:14.3122447Z adding: 
test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-edeb0bbc0394ec67.xml (deflated 35%) 2025-12-04T12:25:14.3123138Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e515d47fe2e6fb9c.xml (deflated 36%) 2025-12-04T12:25:14.3123828Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7c4f0278f004bb5c.xml (deflated 36%) 2025-12-04T12:25:14.3124508Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c0d3bae257da8444.xml (deflated 35%) 2025-12-04T12:25:14.3125189Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7025af433f00efbb.xml (deflated 37%) 2025-12-04T12:25:14.3125885Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-49fd198402d5c655.xml (deflated 36%) 2025-12-04T12:25:14.3126567Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5277c0b0a803851c.xml (deflated 36%) 2025-12-04T12:25:14.3127259Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-3d4c61b2ce73c677.xml (deflated 43%) 2025-12-04T12:25:14.3127938Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cb0710cc3c031aa2.xml (deflated 43%) 2025-12-04T12:25:14.3128623Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-e4cf4d2497acecc4.xml (deflated 38%) 2025-12-04T12:25:14.3129317Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-b0b71a9d976366a8.xml (deflated 36%) 2025-12-04T12:25:14.3129998Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8c2b944477a517c5.xml (deflated 36%) 2025-12-04T12:25:14.3130685Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2c7a620380978373.xml (deflated 35%) 2025-12-04T12:25:14.3131371Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8aaa461eddd2a0f5.xml (deflated 43%) 2025-12-04T12:25:14.3132063Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-d5c5af8107d86770.xml (deflated 36%) 2025-12-04T12:25:14.3132788Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-629d0d3ddf4c3e06.xml (deflated 36%) 2025-12-04T12:25:14.3133565Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-7350065f0535f01a.xml (deflated 43%) 2025-12-04T12:25:14.3134290Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-877f842d3f2815af.xml (deflated 43%) 2025-12-04T12:25:14.3134894Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c391387e4c62daf7.xml (deflated 44%) 2025-12-04T12:25:14.3135507Z adding: 
test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cea6ac435fa81670.xml (deflated 36%) 2025-12-04T12:25:14.3136111Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-69f0ceb782ba322d.xml (deflated 36%) 2025-12-04T12:25:14.3136949Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-354a8796ee4ffd32.xml (deflated 43%) 2025-12-04T12:25:14.3137739Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-52a60b9c4e3ec8c5.xml (deflated 42%) 2025-12-04T12:25:14.3138421Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-576d152cd04ca1c5.xml (deflated 47%) 2025-12-04T12:25:14.3139100Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5733f17598591d18.xml (deflated 47%) 2025-12-04T12:25:14.3139779Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8d06b92a9ae7d27c.xml (deflated 56%) 2025-12-04T12:25:14.3140468Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ebef8e69977ebea2.xml (deflated 36%) 2025-12-04T12:25:14.3141150Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ea6c158c65373811.xml (deflated 35%) 2025-12-04T12:25:14.3141832Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-f2ff679811871b4a.xml (deflated 44%) 2025-12-04T12:25:14.3142525Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-cc9e37194800f0d1.xml (deflated 36%) 2025-12-04T12:25:14.3143198Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-5145615a66bd578b.xml (deflated 36%) 2025-12-04T12:25:14.3143889Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-33b7f705a30ded9f.xml (deflated 36%) 2025-12-04T12:25:14.3144567Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ca496a8780de69f3.xml (deflated 36%) 2025-12-04T12:25:14.3145253Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-8bec3baffba656ff.xml (deflated 44%) 2025-12-04T12:25:14.3145947Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-c836ef383c971ad8.xml (deflated 45%) 2025-12-04T12:25:14.3146625Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-deb32df1c36c795c.xml (deflated 36%) 2025-12-04T12:25:14.3147316Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6dabff71918e7b99.xml (deflated 42%) 2025-12-04T12:25:14.3147995Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ca39e437f793eab2.xml (deflated 41%) 2025-12-04T12:25:14.3148715Z adding: 
test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-6d93f79d5e733c01.xml (deflated 42%) 2025-12-04T12:25:14.3149453Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-2079ea64f821f40e.xml (deflated 36%) 2025-12-04T12:25:14.3150052Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-eb15a6e33c260556.xml (deflated 35%) 2025-12-04T12:25:14.3150667Z adding: test/test-reports/dist-gloo-init-file/distributed.test_distributed_spawn/distributed.test_distributed_spawn-ae1eb5639088ccd8.xml (deflated 36%) 2025-12-04T12:25:14.3169914Z ##[group]Run # Remove any previous usage logs if they exist 2025-12-04T12:25:14.3170107Z # Remove any previous usage logs if they exist 2025-12-04T12:25:14.3170220Z rm -f logs-*.zip 2025-12-04T12:25:14.3170405Z zip "logs-${FILE_SUFFIX}.zip" 'usage_log.txt' || true 2025-12-04T12:25:14.3170651Z zip -r "logs-${FILE_SUFFIX}.zip" test/test-reports -i '*.log' || true 2025-12-04T12:25:14.3176297Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T12:25:14.3176485Z env: 2025-12-04T12:25:14.3176602Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:25:14.3176695Z HAS_NVIDIA_GPU: true 2025-12-04T12:25:14.3177045Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:25:14.3177379Z DOCKER_CONTAINER_ID: 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15 2025-12-04T12:25:14.3177696Z FILE_SUFFIX: test-distributed-3-3-lf.linux.g4dn.12xlarge.nvidia.gpu_57116084904 2025-12-04T12:25:14.3177806Z ##[endgroup] 2025-12-04T12:25:14.3230733Z adding: usage_log.txt (deflated 58%) 2025-12-04T12:25:14.3321020Z adding: test/test-reports/distributed.test_c10d_functional_native_1.1_5ceb4f282067967e_.log (deflated 85%) 2025-12-04T12:25:14.3324561Z adding: test/test-reports/distributed.fsdp.test_fsdp_overlap_1.1_6a5a97322901a03e_.log (deflated 90%) 2025-12-04T12:25:14.3336197Z adding: test/test-reports/distributed.fsdp.test_fsdp_pure_fp16_1.1_2de43ef0fea2c555_.log (deflated 96%) 2025-12-04T12:25:14.3337910Z adding: test/test-reports/distributed.tensor.debug.test_debug_mode_1.1_8a4ec9b51bad1d98_.log (deflated 81%) 2025-12-04T12:25:14.3378949Z adding: test/test-reports/distributed.fsdp.test_fsdp_exec_order_1.1_a2a67ccbd845e856_.log (deflated 97%) 2025-12-04T12:25:14.3425166Z adding: test/test-reports/distributed.fsdp.test_hsdp_dtensor_state_dict_1.1_8591eb8b13b136e6_.log (deflated 97%) 2025-12-04T12:25:14.3453609Z adding: test/test-reports/distributed.fsdp.test_fsdp_clip_grad_norm_1.1_4959fae61140b3a8_.log (deflated 96%) 2025-12-04T12:25:14.3631848Z adding: test/test-reports/distributed.fsdp.test_fsdp_core_2.2_6137898c6891d430_.log (deflated 96%) 2025-12-04T12:25:14.3632793Z adding: test/test-reports/distributed.algorithms.test_join_1.1_8f0ad2e1263a10f0_.log (deflated 84%) 2025-12-04T12:25:14.3635474Z adding: test/test-reports/distributed.pipelining.test_schedule_multiproc_1.1_3173a38c7a75b752_.log (deflated 90%) 2025-12-04T12:25:14.3637255Z adding: test/test-reports/distributed.test_compute_comm_reordering_1.1_7c582fe21d8b6d0b_.log (deflated 86%) 2025-12-04T12:25:14.3637681Z adding: test/test-reports/distributed.test_cupy_as_tensor_1.1_01ccc395c80cccfc_.log (deflated 53%) 2025-12-04T12:25:14.3638235Z adding: test/test-reports/distributed.fsdp.test_fsdp_fx_1.1_5233411b5b9ade93_.log (deflated 53%) 2025-12-04T12:25:14.3638928Z adding: 
test/test-reports/distributed._tools.test_sac_ilp_1.1_aac1d3e83d5577ad_.log (deflated 61%) 2025-12-04T12:25:14.3639645Z adding: test/test-reports/distributed.checkpoint.test_hf_storage_1.1_ec1da04f72df0c46_.log (deflated 67%) 2025-12-04T12:25:14.3640484Z adding: test/test-reports/distributed.pipelining.test_microbatch_1.1_e0b58af1802f4b06_.log (deflated 68%) 2025-12-04T12:25:14.3641104Z adding: test/test-reports/distributed.tensor.test_placement_types_1.1_c7b4602e70c3b07a_.log (deflated 68%) 2025-12-04T12:25:14.3641806Z adding: test/test-reports/distributed.tensor.test_dtensor_dispatch_overhead_1.1_85c49e7d8275b78b_.log (deflated 63%) 2025-12-04T12:25:14.3642190Z adding: test/test-reports/distributed.rpc.test_faulty_agent_1.1_9f30efe05bf109e0_.log (stored 0%) 2025-12-04T12:25:14.3642800Z adding: test/test-reports/distributed.checkpoint._experimental.test_checkpoint_reader_1.1_68c37a9fa1601552_.log (deflated 74%) 2025-12-04T12:25:14.3643945Z adding: test/test-reports/distributed.checkpoint.test_format_utils_1.1_04ae55b8cdf477fd_.log (deflated 80%) 2025-12-04T12:25:14.3648868Z adding: test/test-reports/distributed.test_aten_comm_compute_reordering_1.2_69f8c7d62333ccaf_.log (deflated 93%) 2025-12-04T12:25:14.3653091Z adding: test/test-reports/distributed.tensor.test_redistribute_2.2_51e2d05d075503bf_.log (deflated 91%) 2025-12-04T12:25:14.3655274Z adding: test/test-reports/distributed.tensor.parallel.test_tp_style_1.1_54e71dcd4ed048eb_.log (deflated 86%) 2025-12-04T12:25:14.3656679Z adding: test/test-reports/distributed.tensor.test_api_1.1_f4574b86db79cb55_.log (deflated 85%) 2025-12-04T12:25:14.3659155Z adding: test/test-reports/distributed.checkpoint.test_fsspec_1.1_8eaa241efddb416a_.log (deflated 86%) 2025-12-04T12:25:14.3659980Z adding: test/test-reports/distributed.tensor.experimental.test_tp_transform_1.1_d11081dcea691eaf_.log (deflated 84%) 2025-12-04T12:25:14.3660662Z adding: test/test-reports/distributed.checkpoint.test_traverse_1.1_eea2c84c34471245_.log (deflated 71%) 2025-12-04T12:25:14.3663434Z adding: test/test-reports/distributed.tensor.test_random_ops_1.1_b2ded413b82ba64f_.log (deflated 88%) 2025-12-04T12:25:14.3665334Z adding: test/test-reports/distributed._shard.sharded_tensor.ops.test_embedding_1.1_94d647ccb113bbd0_.log (deflated 91%) 2025-12-04T12:25:14.3665840Z adding: test/test-reports/distributed._composable.fsdp.test_fully_shard_logging_1.1_334cd8181d21220c_.log (deflated 53%) 2025-12-04T12:25:14.3666362Z adding: test/test-reports/distributed.launcher.test_api_1.1_4a83e51b1f3b8245_.log (deflated 58%) 2025-12-04T12:25:14.3667175Z adding: test/test-reports/distributed.elastic.multiprocessing.test_api_1.1_4bf04d2a67164589_.log (deflated 72%) 2025-12-04T12:25:14.3667942Z adding: test/test-reports/distributed.fsdp.test_shard_utils_1.1_4e12f3568c69a797_.log (deflated 67%) 2025-12-04T12:25:14.3672323Z adding: test/test-reports/distributed.checkpoint.test_fsdp_optim_state_1.1_d25d2159eaa83e63_.log (deflated 96%) 2025-12-04T12:25:14.3680237Z adding: test/test-reports/distributed.checkpoint.e2e.test_e2e_save_and_load_1.1_4cbd59f9e8ee7ec0_.log (deflated 93%) 2025-12-04T12:25:14.3682505Z adding: test/test-reports/distributed.checkpoint.test_dtensor_resharding_1.1_a0990bee4dfbe749_.log (deflated 91%) 2025-12-04T12:25:14.3683269Z adding: test/test-reports/distributed.fsdp.test_fsdp_memory_1.1_ac8e61e17ebeaaa5_.log (deflated 75%) 2025-12-04T12:25:14.3684541Z adding: test/test-reports/distributed.tensor.test_pointwise_ops_1.1_fc7ea695ae4d24dd_.log (deflated 77%) 
2025-12-04T12:25:14.3685085Z adding: test/test-reports/distributed.checkpoint.test_compatibility_1.1_995845a47bb8bc7e_.log (deflated 65%) 2025-12-04T12:25:14.3685673Z adding: test/test-reports/distributed._tools.test_mem_tracker_1.1_c5962f3ebcf85955_.log (deflated 61%) 2025-12-04T12:25:14.3686612Z adding: test/test-reports/distributed.elastic.test_control_plane_1.1_74d942263f51456c_.log (deflated 77%) 2025-12-04T12:25:14.3687350Z adding: test/test-reports/distributed.test_fake_pg_1.1_ecf9a296b2457f78_.log (deflated 75%) 2025-12-04T12:25:14.3690334Z adding: test/test-reports/distributed.checkpoint.test_fsdp_model_state_1.1_0d5362771b48c12a_.log (deflated 94%) 2025-12-04T12:25:14.3691882Z adding: test/test-reports/distributed.test_functional_api_1.1_d60bb00edf6e8a81_.log (deflated 84%) 2025-12-04T12:25:14.3692681Z adding: test/test-reports/distributed._composable.fsdp.test_fully_shard_clip_grad_norm__1.1_76ba1390d272d622_.log (deflated 69%) 2025-12-04T12:25:14.3693350Z adding: test/test-reports/distributed.tensor.debug.test_comm_mode_1.1_40ca723c6c817b86_.log (deflated 62%) 2025-12-04T12:25:14.3696451Z adding: test/test-reports/distributed.test_dist2_1.1_cc2e2f70acaf1086_.log (deflated 88%) 2025-12-04T12:25:14.3697492Z adding: test/test-reports/distributed._composable.fsdp.test_fully_shard_grad_scaler_1.1_5aa2313403ba4568_.log (deflated 61%) 2025-12-04T12:25:14.3700595Z adding: test/test-reports/distributed.launcher.test_run_1.1_b22d13de769d84ff_.log (deflated 89%) 2025-12-04T12:25:14.3701575Z adding: test/test-reports/distributed.fsdp.test_fsdp_backward_prefetch_1.1_29df4062c54c1e1a_.log (deflated 66%) 2025-12-04T12:25:14.3704027Z adding: test/test-reports/distributed.checkpoint.test_checkpoint_1.1_d7eb3fb6652ade87_.log (deflated 91%) 2025-12-04T12:25:14.3704563Z adding: test/test-reports/distributed._pycute.test_coalesce_1.1_b9854b582e22535e_.log (deflated 53%) 2025-12-04T12:25:14.3705062Z adding: test/test-reports/distributed._pycute.test_complement_1.1_ccd05958479ced51_.log (deflated 54%) 2025-12-04T12:25:14.3705754Z adding: test/test-reports/distributed._pycute.test_composition_1.1_6a9f660c56ddbb95_.log (deflated 54%) 2025-12-04T12:25:14.3706444Z adding: test/test-reports/distributed._pycute.test_int_tuple_1.1_1b6829b59a3a12af_.log (deflated 75%) 2025-12-04T12:25:14.3706978Z adding: test/test-reports/distributed._pycute.test_left_inverse_1.1_e810fe2e4745b377_.log (deflated 54%) 2025-12-04T12:25:14.3707644Z adding: test/test-reports/distributed._pycute.test_right_inverse_1.1_c9aa035dc9548e77_.log (deflated 54%) 2025-12-04T12:25:14.3709455Z adding: test/test-reports/distributed._composable.test_replicate_1.1_ede2d02b7e8a4250_.log (deflated 89%) 2025-12-04T12:25:14.3712487Z adding: test/test-reports/distributed.checkpoint.test_hsdp_checkpoint_1.1_38b6379e9fe79671_.log (deflated 94%) 2025-12-04T12:25:14.3715061Z adding: test/test-reports/distributed.tensor.parallel.test_parallelize_api_1.1_a79c3b02a80366e9_.log (deflated 88%) 2025-12-04T12:25:14.3738711Z adding: test/test-reports/distributed.fsdp.test_fsdp_state_dict_1.2_f864b6fe160d675b_.log (deflated 97%) 2025-12-04T12:25:14.3739180Z adding: test/test-reports/distributed._pycute.test_typing_1.1_70d9a252095d6a68_.log (deflated 53%) 2025-12-04T12:25:14.3739598Z adding: test/test-reports/distributed.test_distributed_spawn_1.9_8732ec05eb19aa05_.log (deflated 12%) 2025-12-04T12:25:14.3740024Z adding: test/test-reports/distributed.test_distributed_spawn_1.9_28ca104a37c9a833_.log (deflated 12%) 2025-12-04T12:25:14.3740448Z adding: 
test/test-reports/distributed.test_distributed_spawn_1.9_4a0940f8014b8eef_.log (deflated 83%) 2025-12-04T12:25:14.3741143Z adding: test/test-reports/distributed.test_distributed_spawn_1.9_dc17769dd5c2239f_.log (deflated 83%) 2025-12-04T12:25:14.3748478Z adding: test/test-reports/distributed.test_distributed_spawn_1.9_3cbdf0379e4c6767_.log (deflated 93%) 2025-12-04T12:25:14.3755513Z adding: test/test-reports/distributed.test_distributed_spawn_1.9_25c7f8918b3d0b51_.log (deflated 93%) 2025-12-04T12:25:14.3763570Z adding: test/test-reports/distributed.test_distributed_spawn_1.9_6f55519eb0301937_.log (deflated 94%) 2025-12-04T12:25:14.3771621Z adding: test/test-reports/distributed.test_distributed_spawn_1.9_c42c9aaca0d3f434_.log (deflated 94%) 2025-12-04T12:25:14.3772202Z adding: test/test-reports/distributed.test_distributed_spawn_4.9_cfb55a01555794b3_.log (deflated 12%) 2025-12-04T12:25:14.3772599Z adding: test/test-reports/distributed.test_distributed_spawn_4.9_5d1f467e5bbdaff2_.log (deflated 12%) 2025-12-04T12:25:14.3773009Z adding: test/test-reports/distributed.test_distributed_spawn_4.9_b5a10ee12046d5b9_.log (deflated 82%) 2025-12-04T12:25:14.3773582Z adding: test/test-reports/distributed.test_distributed_spawn_4.9_de48cc4d8d8e3c13_.log (deflated 82%) 2025-12-04T12:25:14.3781231Z adding: test/test-reports/distributed.test_distributed_spawn_4.9_5fb338ab863a3c8f_.log (deflated 93%) 2025-12-04T12:25:14.3788652Z adding: test/test-reports/distributed.test_distributed_spawn_4.9_024341bf790fe69a_.log (deflated 93%) 2025-12-04T12:25:14.3798199Z adding: test/test-reports/distributed.test_distributed_spawn_4.9_807ef3b254ee9578_.log (deflated 94%) 2025-12-04T12:25:14.3807699Z adding: test/test-reports/distributed.test_distributed_spawn_4.9_a98bc48b8a2bbb0a_.log (deflated 94%) 2025-12-04T12:25:14.3808269Z adding: test/test-reports/distributed.test_distributed_spawn_7.9_e6318e4f5e3f044b_.log (deflated 12%) 2025-12-04T12:25:14.3808674Z adding: test/test-reports/distributed.test_distributed_spawn_7.9_7d14db48d459fad6_.log (deflated 12%) 2025-12-04T12:25:14.3809072Z adding: test/test-reports/distributed.test_distributed_spawn_7.9_867e6ca715844bef_.log (deflated 82%) 2025-12-04T12:25:14.3809595Z adding: test/test-reports/distributed.test_distributed_spawn_7.9_e3e9b753abf00510_.log (deflated 82%) 2025-12-04T12:25:14.3816968Z adding: test/test-reports/distributed.test_distributed_spawn_7.9_57c28f64236fb5f7_.log (deflated 93%) 2025-12-04T12:25:14.3824825Z adding: test/test-reports/distributed.test_distributed_spawn_7.9_e15417bf2d6aa02d_.log (deflated 93%) 2025-12-04T12:25:14.3832937Z adding: test/test-reports/distributed.test_distributed_spawn_7.9_7faf7d03bb4df9a2_.log (deflated 94%) 2025-12-04T12:25:14.3840598Z adding: test/test-reports/distributed.test_distributed_spawn_7.9_99251297b874e698_.log (deflated 94%) 2025-12-04T12:25:14.3841284Z adding: test/test-reports/distributed.test_serialization_1.1_13a719996bf7ed77_.log (deflated 73%) 2025-12-04T12:25:14.3843103Z adding: test/test-reports/distributed.fsdp.test_fsdp_ignored_modules_1.1_10f1fa8ebe15ff14_.log (deflated 84%) 2025-12-04T12:25:14.3918011Z adding: test/test-reports/distributed._composable.fsdp.test_fully_shard_comm_1.1_365cd7de0daee87d_.log (deflated 95%) 2025-12-04T12:25:14.3921327Z adding: test/test-reports/distributed.fsdp.test_fsdp_sharded_grad_scaler_1.1_be49dd131ba0d1a6_.log (deflated 95%) 2025-12-04T12:25:14.3922769Z adding: test/test-reports/distributed._shard.sharding_plan.test_sharding_plan_1.1_abd5760a3cc4b6ac_.log (deflated 88%) 
2025-12-04T12:25:14.3924582Z adding: test/test-reports/distributed._shard.sharded_optim.test_sharded_optim_1.1_eb895e054ba35bc4_.log (deflated 91%) 2025-12-04T12:25:14.3925891Z adding: test/test-reports/distributed._composable.fsdp.test_fully_shard_state_dict_1.1_b527545a7e0cfc76_.log (deflated 84%) 2025-12-04T12:25:14.3930029Z adding: test/test-reports/distributed.tensor.test_utils_1.1_adf864a1b1c1212f_.log (deflated 93%) 2025-12-04T12:25:14.3930880Z adding: test/test-reports/distributed._composable.fsdp.test_fully_shard_memory_1.1_49e4cc8ab7bdec96_.log (deflated 64%) 2025-12-04T12:25:14.3961961Z adding: test/test-reports/distributed.checkpoint.test_state_dict_1.1_211422b52eb9ecc9_.log (deflated 98%) 2025-12-04T12:25:14.3962490Z adding: test/test-reports/distributed.checkpoint.test_state_dict_utils_1.1_53a76f3501a79ced_.log (deflated 85%) 2025-12-04T12:25:14.3963586Z adding: test/test-reports/distributed._shard.sharded_tensor.test_sharded_tensor_reshard_1.1_41e70f878ccc4095_.log (deflated 86%) 2025-12-04T12:25:14.3965415Z adding: test/test-reports/distributed.test_c10d_spawn_nccl_1.1_1bf221cec02d55ca_.log (deflated 91%) 2025-12-04T12:25:14.3966329Z adding: test/test-reports/distributed.test_c10d_spawn_ucc_1.1_5521268884e60126_.log (deflated 90%) 2025-12-04T12:25:14.4000546Z adding: test/test-reports/distributed.test_c10d_gloo_1.2_d5d0e2b1d744a982_.log (deflated 96%) 2025-12-04T12:25:14.4018922Z adding: test/test-reports/distributed._shard.sharded_tensor.test_sharded_tensor_1.1_24bd8bcdd0ba69c1_.log (deflated 96%) 2025-12-04T12:25:14.4037273Z adding: test/test-reports/distributed.test_c10d_nccl_3.3_41c01794b25a1cc6_.log (deflated 92%) 2025-12-04T12:25:14.4062171Z ##[group]Run # Remove any previous debugging artifacts if they exist 2025-12-04T12:25:14.4062405Z # Remove any previous debugging artifacts if they exist 2025-12-04T12:25:14.4062617Z rm -f debug-*.zip 2025-12-04T12:25:14.4062751Z if [ -d 'test/debug' ]; then 2025-12-04T12:25:14.4062918Z  zip -r "debug-${FILE_SUFFIX}.zip" test/debug 2025-12-04T12:25:14.4063017Z fi 2025-12-04T12:25:14.4068820Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T12:25:14.4068924Z env: 2025-12-04T12:25:14.4069039Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:25:14.4069263Z HAS_NVIDIA_GPU: true 2025-12-04T12:25:14.4069437Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:25:14.4069734Z DOCKER_CONTAINER_ID: 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15 2025-12-04T12:25:14.4070028Z FILE_SUFFIX: test-distributed-3-3-lf.linux.g4dn.12xlarge.nvidia.gpu_57116084904 2025-12-04T12:25:14.4070113Z ##[endgroup] 2025-12-04T12:25:14.4150057Z ##[group]Run seemethere/upload-artifact-s3@v5 2025-12-04T12:25:14.4150142Z with: 2025-12-04T12:25:14.4150243Z s3-bucket: gha-artifacts 2025-12-04T12:25:14.4150405Z s3-prefix: pytorch/pytorch/19922768520/1/artifact 2025-12-04T12:25:14.4150500Z retention-days: 14 2025-12-04T12:25:14.4150612Z if-no-files-found: warn 2025-12-04T12:25:14.4150712Z path: test-jsons-*.zip 2025-12-04T12:25:14.4150912Z name: artifact 2025-12-04T12:25:14.4151061Z region: us-east-1 2025-12-04T12:25:14.4151143Z env: 2025-12-04T12:25:14.4151240Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:25:14.4151345Z HAS_NVIDIA_GPU: true 2025-12-04T12:25:14.4151510Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:25:14.4151807Z DOCKER_CONTAINER_ID: 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15 2025-12-04T12:25:14.4151907Z ##[endgroup] 2025-12-04T12:25:14.7856696Z NOTE: s3-prefix 
specified, ignoring name parameter 2025-12-04T12:25:14.7857754Z With the provided path, there will be 1 file uploaded 2025-12-04T12:25:14.7858389Z Uploading to s3 prefix: pytorch/pytorch/19922768520/1/artifact 2025-12-04T12:25:14.7899472Z Starting upload of test-jsons-test-distributed-3-3-lf.linux.g4dn.12xlarge.nvidia.gpu_57116084904.zip 2025-12-04T12:25:14.9376797Z Finished upload of test-jsons-test-distributed-3-3-lf.linux.g4dn.12xlarge.nvidia.gpu_57116084904.zip 2025-12-04T12:25:14.9545958Z ##[group]Run seemethere/upload-artifact-s3@v5 2025-12-04T12:25:14.9546339Z with: 2025-12-04T12:25:14.9546591Z s3-bucket: gha-artifacts 2025-12-04T12:25:14.9546964Z s3-prefix: pytorch/pytorch/19922768520/1/artifact 2025-12-04T12:25:14.9547364Z retention-days: 14 2025-12-04T12:25:14.9547661Z if-no-files-found: error 2025-12-04T12:25:14.9547986Z path: test-reports-*.zip 2025-12-04T12:25:14.9548289Z name: artifact 2025-12-04T12:25:14.9548542Z region: us-east-1 2025-12-04T12:25:14.9548910Z env: 2025-12-04T12:25:14.9549261Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:25:14.9549518Z HAS_NVIDIA_GPU: true 2025-12-04T12:25:14.9549850Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:25:14.9550426Z DOCKER_CONTAINER_ID: 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15 2025-12-04T12:25:14.9550936Z ##[endgroup] 2025-12-04T12:25:15.2982840Z NOTE: s3-prefix specified, ignoring name parameter 2025-12-04T12:25:15.2983399Z With the provided path, there will be 1 file uploaded 2025-12-04T12:25:15.2983964Z Uploading to s3 prefix: pytorch/pytorch/19922768520/1/artifact 2025-12-04T12:25:15.3025921Z Starting upload of test-reports-test-distributed-3-3-lf.linux.g4dn.12xlarge.nvidia.gpu_57116084904.zip 2025-12-04T12:25:15.4518021Z Finished upload of test-reports-test-distributed-3-3-lf.linux.g4dn.12xlarge.nvidia.gpu_57116084904.zip 2025-12-04T12:25:15.4691523Z ##[group]Run seemethere/upload-artifact-s3@v5 2025-12-04T12:25:15.4691864Z with: 2025-12-04T12:25:15.4692102Z s3-bucket: gha-artifacts 2025-12-04T12:25:15.4692438Z s3-prefix: pytorch/pytorch/19922768520/1/artifact 2025-12-04T12:25:15.4692820Z retention-days: 14 2025-12-04T12:25:15.4693074Z if-no-files-found: ignore 2025-12-04T12:25:15.4693356Z path: logs-*.zip 2025-12-04T12:25:15.4693597Z name: artifact 2025-12-04T12:25:15.4693943Z region: us-east-1 2025-12-04T12:25:15.4694178Z env: 2025-12-04T12:25:15.4694398Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:25:15.4694665Z HAS_NVIDIA_GPU: true 2025-12-04T12:25:15.4695013Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:25:15.4695603Z DOCKER_CONTAINER_ID: 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15 2025-12-04T12:25:15.4696126Z ##[endgroup] 2025-12-04T12:25:15.8124846Z NOTE: s3-prefix specified, ignoring name parameter 2025-12-04T12:25:15.8125380Z With the provided path, there will be 1 file uploaded 2025-12-04T12:25:15.8125908Z Uploading to s3 prefix: pytorch/pytorch/19922768520/1/artifact 2025-12-04T12:25:15.8168536Z Starting upload of logs-test-distributed-3-3-lf.linux.g4dn.12xlarge.nvidia.gpu_57116084904.zip 2025-12-04T12:25:15.9717742Z Finished upload of logs-test-distributed-3-3-lf.linux.g4dn.12xlarge.nvidia.gpu_57116084904.zip 2025-12-04T12:25:15.9892073Z ##[group]Run seemethere/upload-artifact-s3@v5 2025-12-04T12:25:15.9892442Z with: 2025-12-04T12:25:15.9892691Z s3-bucket: gha-artifacts 2025-12-04T12:25:15.9893047Z s3-prefix: pytorch/pytorch/19922768520/1/artifact 2025-12-04T12:25:15.9893420Z retention-days: 14 2025-12-04T12:25:15.9893859Z 
if-no-files-found: ignore 2025-12-04T12:25:15.9894232Z path: debug-*.zip 2025-12-04T12:25:15.9894475Z name: artifact 2025-12-04T12:25:15.9894726Z region: us-east-1 2025-12-04T12:25:15.9894974Z env: 2025-12-04T12:25:15.9895203Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:25:15.9895482Z HAS_NVIDIA_GPU: true 2025-12-04T12:25:15.9895832Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:25:15.9896575Z DOCKER_CONTAINER_ID: 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15 2025-12-04T12:25:15.9897302Z ##[endgroup] 2025-12-04T12:25:16.3255482Z No files were found with the provided path: debug-*.zip. No artifacts will be uploaded. 2025-12-04T12:25:16.3438184Z ##[group]Run # shellcheck disable=SC2156 2025-12-04T12:25:16.3438754Z # shellcheck disable=SC2156 2025-12-04T12:25:16.3439498Z find . -iname "core.[1-9]*" -exec docker exec "${DOCKER_CONTAINER_ID}" sh -c "gdb python {} -ex 'bt' -ex 'q'" \; 2025-12-04T12:25:16.3445917Z shell: /usr/bin/bash -e {0} 2025-12-04T12:25:16.3446264Z env: 2025-12-04T12:25:16.3446564Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:25:16.3447002Z HAS_NVIDIA_GPU: true 2025-12-04T12:25:16.3447386Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:25:16.3448050Z DOCKER_CONTAINER_ID: 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15 2025-12-04T12:25:16.3448725Z ##[endgroup] 2025-12-04T12:25:16.6549878Z ##[group]Run seemethere/upload-artifact-s3@baba72d0712b404f646cebe0730933554ebce96a 2025-12-04T12:25:16.6550376Z with: 2025-12-04T12:25:16.6550742Z name: coredumps-distributed-3-3-lf.linux.g4dn.12xlarge.nvidia.gpu 2025-12-04T12:25:16.6551201Z retention-days: 14 2025-12-04T12:25:16.6551456Z if-no-files-found: ignore 2025-12-04T12:25:16.6551737Z path: ./**/core.[1-9]* 2025-12-04T12:25:16.6552020Z s3-bucket: gha-artifacts 2025-12-04T12:25:16.6552279Z region: us-east-1 2025-12-04T12:25:16.6552514Z env: 2025-12-04T12:25:16.6552729Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:25:16.6553002Z HAS_NVIDIA_GPU: true 2025-12-04T12:25:16.6553321Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:25:16.6553904Z DOCKER_CONTAINER_ID: 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15 2025-12-04T12:25:16.6554426Z ##[endgroup] 2025-12-04T12:25:24.2262062Z No files were found with the provided path: ./**/core.[1-9]*. No artifacts will be uploaded. 
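Note on the core-dump step above: before the coredumps-* artifact upload, the job scans the workspace for kernel core files and, for each match, runs gdb inside the test container to print a backtrace. A minimal standalone sketch of that step is below; it assumes gdb and a python binary are present inside the container and reuses the DOCKER_CONTAINER_ID value from the job environment, as in the run block above.

#!/usr/bin/env bash
# Sketch of the core-dump backtrace step (assumes gdb + python inside the container).
set -u

# Container id comes from the job environment (DOCKER_CONTAINER_ID in the step env).
CONTAINER="${DOCKER_CONTAINER_ID}"

# Core files are written as core.<pid>; for each one, ask gdb inside the container
# for a backtrace ('bt') and then quit ('q'). GNU find substitutes {} with the path.
# shellcheck disable=SC2156
find . -iname "core.[1-9]*" -exec \
  docker exec "${CONTAINER}" sh -c "gdb python {} -ex 'bt' -ex 'q'" \;

In this run the pattern matched nothing, so the subsequent coredumps-distributed-3-3 upload reported that no files were found and no artifacts were uploaded.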
2025-12-04T12:25:24.2494250Z Prepare all required actions 2025-12-04T12:25:24.2494652Z Getting action download info 2025-12-04T12:25:24.4261238Z Download action repository 'actions/setup-python@v6' (SHA:83679a892e2d95755f2dac6acb0bfd1e9ac5d548) 2025-12-04T12:25:24.8893517Z ##[group]Run ./.github/actions/upload-utilization-stats 2025-12-04T12:25:24.8893943Z with: 2025-12-04T12:25:24.8894272Z job_id: 57116084904 2025-12-04T12:25:24.8894919Z job_name: linux-jammy-cuda12.8-py3.10-gcc11 / test (distributed, 3, 3, lf.linux.g4dn.12xlarge.nvidia.gpu, mem_leak_check) 2025-12-04T12:25:24.8895635Z workflow_name: trunk 2025-12-04T12:25:24.8895928Z workflow_run_id: 19922768520 2025-12-04T12:25:24.8896234Z workflow_attempt: 1 2025-12-04T12:25:24.8896627Z env: 2025-12-04T12:25:24.8897037Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:25:24.8897330Z HAS_NVIDIA_GPU: true 2025-12-04T12:25:24.8897786Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:25:24.8898490Z DOCKER_CONTAINER_ID: 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15 2025-12-04T12:25:24.8899057Z ##[endgroup] 2025-12-04T12:25:24.8942466Z ##[group]Run actions/setup-python@v6 2025-12-04T12:25:24.8942820Z with: 2025-12-04T12:25:24.8943060Z python-version: 3.10 2025-12-04T12:25:24.8943358Z check-latest: false 2025-12-04T12:25:24.8943763Z token: *** 2025-12-04T12:25:24.8944037Z update-environment: true 2025-12-04T12:25:24.8944359Z allow-prereleases: false 2025-12-04T12:25:24.8944678Z freethreaded: false 2025-12-04T12:25:24.8944955Z env: 2025-12-04T12:25:24.8945186Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:25:24.8945489Z HAS_NVIDIA_GPU: true 2025-12-04T12:25:24.8945939Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:25:24.8966609Z DOCKER_CONTAINER_ID: 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15 2025-12-04T12:25:24.8967198Z ##[endgroup] 2025-12-04T12:25:25.0513055Z ##[group]Installed versions 2025-12-04T12:25:25.0522723Z Version 3.10 was not found in the local cache 2025-12-04T12:25:25.0711520Z (node:434397) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead. 2025-12-04T12:25:25.0712447Z (Use `node --trace-deprecation ...` to show where the warning was created) 2025-12-04T12:25:25.4181618Z ##[error]The version '3.10' with architecture 'x64' was not found for this operating system. 
The list of all available versions can be found here: https://raw.githubusercontent.com/actions/python-versions/main/versions-manifest.json 2025-12-04T12:25:25.4342083Z ##[group]Run pytorch/test-infra/.github/actions/teardown-linux@main 2025-12-04T12:25:25.4342590Z with: 2025-12-04T12:25:25.4342833Z env: 2025-12-04T12:25:25.4343076Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:25:25.4343374Z HAS_NVIDIA_GPU: true 2025-12-04T12:25:25.4343744Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:25:25.4344402Z DOCKER_CONTAINER_ID: 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15 2025-12-04T12:25:25.4344968Z ##[endgroup] 2025-12-04T12:25:25.4362013Z ##[group]Run set -eou pipefail 2025-12-04T12:25:25.4362348Z set -eou pipefail 2025-12-04T12:25:25.4362610Z  2025-12-04T12:25:25.4362994Z echo "Holding runner for 2 hours until all ssh sessions have logged out" 2025-12-04T12:25:25.4363481Z for _ in $(seq 1440); do 2025-12-04T12:25:25.4363830Z  # Break if no ssh session exists anymore 2025-12-04T12:25:25.4364189Z  if [ "$(who)" = "" ]; then 2025-12-04T12:25:25.4364543Z  break 2025-12-04T12:25:25.4364773Z  fi 2025-12-04T12:25:25.4365004Z  echo "." 2025-12-04T12:25:25.4365255Z  sleep 5 2025-12-04T12:25:25.4365484Z done 2025-12-04T12:25:25.4371135Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T12:25:25.4371528Z env: 2025-12-04T12:25:25.4371752Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:25:25.4372018Z HAS_NVIDIA_GPU: true 2025-12-04T12:25:25.4372347Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:25:25.4372931Z DOCKER_CONTAINER_ID: 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15 2025-12-04T12:25:25.4373439Z ##[endgroup] 2025-12-04T12:25:25.4399719Z Holding runner for 2 hours until all ssh sessions have logged out 2025-12-04T12:25:25.4480452Z ##[group]Run # ignore expansion of "docker ps -q" since it could be empty 2025-12-04T12:25:25.4481123Z # ignore expansion of "docker ps -q" since it could be empty 2025-12-04T12:25:25.4481591Z # shellcheck disable=SC2046 2025-12-04T12:25:25.4481942Z docker stop $(docker ps -q) || true 2025-12-04T12:25:25.4482310Z # Prune all of the docker images 2025-12-04T12:25:25.4482646Z docker system prune -af 2025-12-04T12:25:25.4488144Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T12:25:25.4488536Z env: 2025-12-04T12:25:25.4488763Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:25:25.4489037Z HAS_NVIDIA_GPU: true 2025-12-04T12:25:25.4489366Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:25:25.4489943Z DOCKER_CONTAINER_ID: 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15 2025-12-04T12:25:25.4490466Z ##[endgroup] 2025-12-04T12:25:36.4602247Z 9f53f9c599eb 2025-12-04T12:25:37.0897028Z Deleted Containers: 2025-12-04T12:25:37.0897582Z 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15 2025-12-04T12:25:37.0898010Z 2025-12-04T12:25:44.4573900Z Deleted Images: 2025-12-04T12:25:44.4574374Z untagged: public.ecr.aws/docker/library/python:3.13 2025-12-04T12:25:44.4575225Z untagged: public.ecr.aws/docker/library/python@sha256:3f986299a7b8b44b0d8cf9bda2b22361ce5c3058ef5d7cb17fb7452506680ab0 2025-12-04T12:25:44.4576611Z deleted: sha256:44438aecfedf7b6086fce506dae0db5ba7fc0027f9b743f1a75a6b5cbc7de70a 2025-12-04T12:25:44.4577544Z deleted: sha256:6f09a1f5d8a107c2532fbd116e75116cb75fa77b1a7d72d3bdf1ac12de152acd 2025-12-04T12:25:44.4578310Z deleted: sha256:fe5f3ac0be086125eb1e3cd10cc33e8e426f4e079381f7ce5a987b626e99fa67 
2025-12-04T12:25:44.4579073Z deleted: sha256:79dd2061a22cf919cfc4f1f02704bfda09afadb017265e670ee54441d296c06c 2025-12-04T12:25:44.4579838Z deleted: sha256:9447ad402aafdbee17e999b0ec84ad89c2646dbebf054d469d4f8bee77f66212 2025-12-04T12:25:44.4580579Z deleted: sha256:7a4909f3c1975be52292f53107495ee1b41c17494918767ccedf1cf1688ae318 2025-12-04T12:25:44.4581308Z deleted: sha256:3474923d97f1f498237650a7d51bd4aea37d5e6b9d8a778777920584af5dd560 2025-12-04T12:25:44.4582268Z deleted: sha256:683afd1773444401a9cbd24842ee5d9154a11abb4fab63ddea5c03df788597ee 2025-12-04T12:25:44.4583459Z untagged: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T12:25:44.4585024Z untagged: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image@sha256:ba21003510dba4bdeed83df81a56fa468e0ee1b612a9445ae1f402a280804f97 2025-12-04T12:25:44.4586096Z deleted: sha256:add7313791033822205cdb3cf32096534b2cfaa4855bd48119b59000bfe00301 2025-12-04T12:25:44.4586849Z deleted: sha256:85a76b7bf29ad34eb76cce6f46af5d49a58b6272f80f983d5c769e82c7749301 2025-12-04T12:25:44.4587604Z deleted: sha256:0882f3ce59ff5ae30195ee4b059fc713e13eda107a3a7814a4616ac9058a30a4 2025-12-04T12:25:44.4588473Z deleted: sha256:64ba5b9344c11a3e4729136076830b90ac4cf1554046edb1bd4f0784b66ebd9b 2025-12-04T12:25:44.4589295Z deleted: sha256:88213c59cf461a65ab9b6cb07b4195dc9d41b5241c152daa002c7b3112e09124 2025-12-04T12:25:44.4590014Z deleted: sha256:4c0f83afa802ffbc05ebaf1aa50e48a2447c7c295549a6dded80ac63437906ca 2025-12-04T12:25:44.4590721Z deleted: sha256:6f7ec74460e8fb070c8209949095ea3be5f4e2fd69c9f750cd39ac4093f5e64b 2025-12-04T12:25:44.4591415Z deleted: sha256:d6928b0d1021b31942fdcb64e5eb4a34682de66e959dd424ed6ed02c29cd706d 2025-12-04T12:25:44.4592118Z deleted: sha256:4e9fbcb1705a6351bb34dd320558752614308636b94fd9ae6f26063e3deadc0a 2025-12-04T12:25:44.4592811Z deleted: sha256:43aabd0201f48712f21758071352dea029b4de37be08b2e2197706856a9ecbf2 2025-12-04T12:25:44.4593495Z deleted: sha256:940a98dec78303f0548beb1033242a45e9097607ef3e55c8b949b69b73d1b95e 2025-12-04T12:25:44.4594193Z deleted: sha256:d2849fa0e0411cf66e4408831d70e38838afb55b11a80c1c4d8aa0ae7dc9ca40 2025-12-04T12:25:44.4594886Z deleted: sha256:14f40d23c20c7e562623f89deb376520296758bc39dd3c77284049b84ebd8a31 2025-12-04T12:25:44.4595768Z deleted: sha256:a8ccba61f90ca097cb391d0f4fbed0d9f821d06b00e28f7332e9e2dcfcbac4ca 2025-12-04T12:25:44.4596566Z deleted: sha256:91b2060d290547d3b517d4a11d994bbe23f4560b5546cb91918ca1828dde6be1 2025-12-04T12:25:44.4597287Z deleted: sha256:b42a184755715dcfead7fad655a127433541d316d9628f5f730ff17ad5f8071c 2025-12-04T12:25:44.4598024Z deleted: sha256:aa5b4f3c9169061dc3c6da0e677e8a86f11ecb0a3f9fb4861ab3d8c04379775c 2025-12-04T12:25:44.4598742Z deleted: sha256:b4dcf450081a48d77fea0a21b8d810a69c03608a595e754fe7d365058d0579b7 2025-12-04T12:25:44.4599472Z deleted: sha256:4f7fe12d3d4f5bf890c7ada4ce16f17a105472aa6509a778f917dcce2f28174b 2025-12-04T12:25:44.4600207Z deleted: sha256:2d1d5a74182594f9a8553df00fdcfc809dba407bcd6700d667f862cbe9d555ce 2025-12-04T12:25:44.4600944Z deleted: sha256:d901e2f5d449aeed16b727bdcc11fc0e0f6c30c8fc5c39ac7eeac8a74d9d176c 2025-12-04T12:25:44.4601655Z deleted: sha256:a04df2603bd12372c6632469a9a81ebc4a8d677452c250672b9692884fa6a452 2025-12-04T12:25:44.4602371Z deleted: sha256:f438a6b52273a552dc3820a55c74c53a62a0eae9f2a7d21b37125add7d71639f 2025-12-04T12:25:44.4603092Z deleted: sha256:d4b09517e9518d709ac98b0ae6f8446ec9ac51688253607b1fca67aa2c87b3f4 
2025-12-04T12:25:44.4603919Z deleted: sha256:c1fa38335237f5e7263e39d3d3de98215bcfbbb12b826955c02e149bf68efd13 2025-12-04T12:25:44.4604606Z deleted: sha256:c898d20a30de901fca74d7611663b17ab48e1726a11e031e40548ed16ee81877 2025-12-04T12:25:44.4605344Z deleted: sha256:3baceec7096518fcc10696feba551639d698b3145c2fc09cac927bb60c0fd751 2025-12-04T12:25:44.4606049Z deleted: sha256:5245aaaa3d5c3a19f76b9a6c920bd82d1a0ff5289f87c8c109652089709d9b3b 2025-12-04T12:25:44.4606738Z deleted: sha256:f05cc789b95246938c377f474c41187965b89ceac0250e7d5124bec32153f447 2025-12-04T12:25:44.4607438Z deleted: sha256:07ec4fc008de4e7a2c794ec7094cc72e0d287c04c8b2156163aee0bae147fe2d 2025-12-04T12:25:44.4608151Z deleted: sha256:c6302601ad5fde573c1f8c900250478fca7fdc6907d8fd4fae651b94b4d9264d 2025-12-04T12:25:44.4608858Z deleted: sha256:cc5e955ee1dc54931f02606c5ea87aae14f03b5d764492be611480ab041f2882 2025-12-04T12:25:44.4609550Z deleted: sha256:f21c03518996d98452338f4e80bcfd9b139a1dab155f4830be0d3f623035269f 2025-12-04T12:25:44.4610351Z deleted: sha256:519ca6f1279f7886f25f0005527cfa627deebbc5b7d7cdbfa7ef962bcfc4c26d 2025-12-04T12:25:44.4611048Z deleted: sha256:0ef990495216807d0175b192045be3f617e72331bc373b3434807f41bf69168d 2025-12-04T12:25:44.4611911Z deleted: sha256:7093edf7319e1f0e01654c3224e32c8dede5b948d106e0b9b03cbf0bb1091e33 2025-12-04T12:25:44.4612633Z deleted: sha256:c478161e058e2f4041555c3e880b95ee1ee047938dc58549a3a88135740996ae 2025-12-04T12:25:44.4613356Z deleted: sha256:9bb853b0d938cd7c36a80ce8ee40653f2c0ff92719209b11beb03acc8855ce3e 2025-12-04T12:25:44.4614088Z deleted: sha256:fdf2ace71a78ce6910ef9c4b073c195531da47022443b606bb92dcd6499b6afc 2025-12-04T12:25:44.4614901Z deleted: sha256:576c2b3770d871937d3cfb7014328bcb4bd1aed0c28bc438764b3bfdac4c1ac2 2025-12-04T12:25:44.4615650Z deleted: sha256:878e92b9cb82de09ac14a9d5f3f7bc2411a799b6f54d0d64b78c2bb4d1fdc0fc 2025-12-04T12:25:44.4616462Z deleted: sha256:85c8c3b98b65a6695f988a10cc66c981d73a3ef03eda15b8e14d227b50b56300 2025-12-04T12:25:44.4617397Z deleted: sha256:ce2ab3ba07794f9ee95d6ea7de6dcd3d2aed96561f9a79192dd56ca5bf29313a 2025-12-04T12:25:44.4618138Z deleted: sha256:37a6e12976ca957286977e696e63012ab9821214b0483fe1a48d29dcb280508a 2025-12-04T12:25:44.4618880Z deleted: sha256:cd1d5d3dd7038144ca6fe961c0d4c8e705625ae0c36190ba8b3e9602abedad19 2025-12-04T12:25:44.4619627Z deleted: sha256:0e707276e0be2e0008b86d594fadc0d16444d66c4fb7227c56f144cbb3c2affd 2025-12-04T12:25:44.4620365Z deleted: sha256:22d4aad6a2ada91b341c1225a0f314042b8aeabef7568c5c019709b058bf070b 2025-12-04T12:25:44.4621344Z deleted: sha256:ee4adacf4e0933131d0275eddad406b3c8147e6cf07a292b99f1aff4b5355f33 2025-12-04T12:25:44.4622102Z deleted: sha256:43da0b9e7c0e18403dcb834e53628dc7c970ccb2dbd091878c0d7c0170dbc97f 2025-12-04T12:25:44.4622860Z deleted: sha256:00571684bdcd75beda15eb7d4e79b5458bc914350f9bb4d87fcdc97ad15e0da1 2025-12-04T12:25:44.4623596Z deleted: sha256:41615f09950259f1d75e82ef35b6fc53b18fe71ebff143744cfd51009d04349e 2025-12-04T12:25:44.4624425Z deleted: sha256:75ab34d2eed3c7915467a506ab6dab2711918fbabe94add2fb5c62780221ab0c 2025-12-04T12:25:44.4625188Z deleted: sha256:0a39ef2bebf44c1c3893d1e5fb42dad48b8fac7ca673141267ee967f85455e89 2025-12-04T12:25:44.4625938Z deleted: sha256:9b7d024e48ba1f9824a54597621b1b062cbc4aa41a77d81ca538d6b5c24a612c 2025-12-04T12:25:44.4626687Z deleted: sha256:392257172de6434c271bd93394218a91e9aa86d7c18abc2f2759317b9d5fb6de 2025-12-04T12:25:44.4627414Z deleted: sha256:6c3232860b930866a463a356124fc392c7e5f04895695229257e8c3e8a02711d 2025-12-04T12:25:44.4628152Z 
deleted: sha256:63dd55b807215e2fa6c715419ac0c5072d02dddc848dbf74bb7e77b906b5eaed 2025-12-04T12:25:44.4628886Z deleted: sha256:07a8738c1b4584db72ed9aa60f5274321eb0ba16263450da3a75df8326ebc25f 2025-12-04T12:25:44.4629623Z deleted: sha256:053fe2965b01281d12040ec1893e0d1aa77362a49ea9a1067402272c69dad9f5 2025-12-04T12:25:44.4630364Z deleted: sha256:7857fb5eb181c4e80262ecab60bdd3c266cf3d1409ceb76c05882609b416a8d3 2025-12-04T12:25:44.4631104Z deleted: sha256:752528477fc99089de3bd2c6da7b30cf34f2e901fe06d8fcfe685b411461e883 2025-12-04T12:25:44.4631858Z deleted: sha256:cce0210e2f4b042601813df03aa294a86b0c668fcfc75f4c63f6fa12b2952e15 2025-12-04T12:25:44.4632711Z deleted: sha256:f2bb405a26705ecd12d21380d26d9355d01db3a2175080fbdb468f2b5a25a76c 2025-12-04T12:25:44.4633619Z deleted: sha256:ad430120d4ffbaf97cd8d6de6ea8eefa4a8f80ec45f0b176c6b26bff0970fd33 2025-12-04T12:25:44.4634287Z deleted: sha256:225a4910baea7cc540ed43eeac75046293800ab0b8e0192b51e991c8cb50bcf3 2025-12-04T12:25:44.4634996Z deleted: sha256:a259945b0c3507f049fbac10fb3d3ffe43d45e83c91b80ae8cd1dafb855ad83c 2025-12-04T12:25:44.4635653Z deleted: sha256:862a98881b1d5adad5c21d01602773b894794097de80964ef8f47bcaadb43255 2025-12-04T12:25:44.4636310Z deleted: sha256:1cf6d3c8b6c2694b79a2d08719594903811c330a36a4c7a8a7153a350b53d292 2025-12-04T12:25:44.4636978Z deleted: sha256:232a1ae8b0fee817ff7838bb5986a2f38377d3b1dbbf5217b576af0f953b0844 2025-12-04T12:25:44.4637650Z deleted: sha256:c72c5705dabd6314423dd7d4fb260a20d5d9886b2ebce60d19e9d78c4a2335c2 2025-12-04T12:25:44.4638384Z deleted: sha256:296734cf81fd92c913884d058908598424ffe072676e38de289bbab83768c7bd 2025-12-04T12:25:44.4639044Z deleted: sha256:7c76040481b889847a1804021aeff07547eaa4ee706d6137db218d497a8fd9c1 2025-12-04T12:25:44.4639717Z deleted: sha256:d5e293f5b354e8cbcc6de893ea72cc632b02d8fdfbb08ec3127c4e9662f3ebff 2025-12-04T12:25:44.4640379Z deleted: sha256:f35a64e429c88e249645090f21fbe7dae108d98e0ab4ea13184f24b3fd66c315 2025-12-04T12:25:44.4641048Z deleted: sha256:ce6ae8d595c8e69115c51b1ce4f9a9158795d7b863b1cb53f21c39a87974d41b 2025-12-04T12:25:44.4641722Z deleted: sha256:8941abaee59400fb9b3a60765fea4a1fc2a6a447467a6d983e84c7f72494a450 2025-12-04T12:25:44.4642401Z deleted: sha256:ef53c29a9a2c2bc80ffdb9bfaf92842436b5755ec1ce828b9d11e5e27d656ea1 2025-12-04T12:25:44.4643069Z deleted: sha256:7a347fb0acb43f1c814f8c8ff21185e8b5cf64d7bc5988cea060f77d906e08b5 2025-12-04T12:25:44.4643751Z deleted: sha256:cc855dc9be79496e15175569dced2d13477e50b077a5fd3945f9bf50018880c1 2025-12-04T12:25:44.4644426Z deleted: sha256:f7a9946ada3d4786658bc0b643808bb32a9a45e4e90e30dc43ee19e2dbe24024 2025-12-04T12:25:44.4645096Z deleted: sha256:c22a9215f62812c1d2e32827f5221ff556c5b6702aadbdab6b87b8293f19635e 2025-12-04T12:25:44.4645747Z deleted: sha256:959a56746620012e37c1def1a83c5afb1e7c0adc59b021a28beb53c24df98032 2025-12-04T12:25:44.4646419Z deleted: sha256:31a0fff0695bf6100c17954be72eab2095b466d559c75c3faf2a17d8c41e6ebe 2025-12-04T12:25:44.4647088Z deleted: sha256:c15e2b5241b9e55af1b2593e544391b4b44d0505e6528e8f12425136e93b424c 2025-12-04T12:25:44.4647733Z deleted: sha256:73974f74b436f39a2fdb6461b1e3f7c3e41c73325776fa71d16b942a5b4a365b 2025-12-04T12:25:44.4648137Z 2025-12-04T12:25:44.4648261Z Total reclaimed space: 36.15GB 2025-12-04T12:25:44.4684415Z ##[group]Run set +e 2025-12-04T12:25:44.4684805Z set +e 2025-12-04T12:25:44.4685054Z set -x 2025-12-04T12:25:44.4685365Z  2025-12-04T12:25:44.4685604Z nvidia-smi 2025-12-04T12:25:44.4686180Z # NB: Surprisingly, nvidia-smi command returns successfully with return code 0 even 
in 2025-12-04T12:25:44.4686978Z # the case where the driver has already crashed as it still can get the driver version 2025-12-04T12:25:44.4687753Z # and some basic information like the bus ID. However, the rest of the information 2025-12-04T12:25:44.4688451Z # would be missing (ERR!), for example: 2025-12-04T12:25:44.4688898Z # 2025-12-04T12:25:44.4689219Z # +-----------------------------------------------------------------------------+ 2025-12-04T12:25:44.4689782Z # | NVIDIA-SMI 525.89.02 Driver Version: 525.89.02 CUDA Version: 12.0 | 2025-12-04T12:25:44.4690364Z # |-------------------------------+----------------------+----------------------+ 2025-12-04T12:25:44.4690922Z # | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | 2025-12-04T12:25:44.4691531Z # | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | 2025-12-04T12:25:44.4692037Z # | | | MIG M. | 2025-12-04T12:25:44.4692417Z # |===============================+======================+======================| 2025-12-04T12:25:44.4692902Z # | 0 ERR! Off | 00000000:00:1E.0 Off | ERR! | 2025-12-04T12:25:44.4693413Z # |ERR! ERR! ERR! ERR! / ERR! | 4184MiB / 23028MiB | ERR! Default | 2025-12-04T12:25:44.4693875Z # | | | ERR! | 2025-12-04T12:25:44.4694312Z # +-------------------------------+----------------------+----------------------+ 2025-12-04T12:25:44.4694714Z # 2025-12-04T12:25:44.4695030Z # +-----------------------------------------------------------------------------+ 2025-12-04T12:25:44.4695514Z # | Processes: | 2025-12-04T12:25:44.4696004Z # | GPU GI CI PID Type Process name GPU Memory | 2025-12-04T12:25:44.4696594Z # | ID ID Usage | 2025-12-04T12:25:44.4697194Z # |=============================================================================| 2025-12-04T12:25:44.4697700Z # +-----------------------------------------------------------------------------+ 2025-12-04T12:25:44.4698136Z # 2025-12-04T12:25:44.4698594Z # This should be reported as a failure instead as it will guarantee to fail when 2025-12-04T12:25:44.4699201Z # Docker tries to run with --gpus all 2025-12-04T12:25:44.4699565Z # 2025-12-04T12:25:44.4699994Z # So, the correct check here is to query one of the missing piece of info like 2025-12-04T12:25:44.4700618Z # GPU name, so that the command can fail accordingly 2025-12-04T12:25:44.4701195Z nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0 2025-12-04T12:25:44.4701680Z NVIDIA_SMI_STATUS=$? 2025-12-04T12:25:44.4701989Z  2025-12-04T12:25:44.4702500Z # These are acceptable return code from nvidia-smi as copied from setup-nvidia GitHub action 2025-12-04T12:25:44.4703257Z if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then 2025-12-04T12:25:44.4703949Z  echo "NVIDIA driver installation has failed, shutting down the runner..." 2025-12-04T12:25:44.4704541Z  .github/scripts/stop_runner_service.sh 2025-12-04T12:25:44.4704925Z fi 2025-12-04T12:25:44.4705153Z  2025-12-04T12:25:44.4705824Z # For runner with multiple GPUs, we also want to confirm that the number of GPUs are the 2025-12-04T12:25:44.4706571Z # power of 2, i.e. 1, 2, 4, or 8. This is to avoid flaky test issue when one GPU fails 2025-12-04T12:25:44.4707242Z # https://github.com/pytorch/test-infra/issues/4000 2025-12-04T12:25:44.4707736Z GPU_COUNT=$(nvidia-smi --list-gpus | wc -l) 2025-12-04T12:25:44.4708153Z NVIDIA_SMI_STATUS=$? 
2025-12-04T12:25:44.4708463Z  2025-12-04T12:25:44.4709073Z # These are acceptable return code from nvidia-smi as copied from setup-nvidia GitHub action 2025-12-04T12:25:44.4709751Z if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then 2025-12-04T12:25:44.4710365Z  echo "NVIDIA driver installation has failed, shutting down the runner..." 2025-12-04T12:25:44.4710896Z  .github/scripts/stop_runner_service.sh 2025-12-04T12:25:44.4711224Z fi 2025-12-04T12:25:44.4711445Z  2025-12-04T12:25:44.4711703Z # Check the GPU count to be a power of 2 2025-12-04T12:25:44.4712276Z if [ "$GPU_COUNT" -le 8 ] && [ "$GPU_COUNT" -ne 1 ] && [ "$GPU_COUNT" -ne 2 ] && [ "$GPU_COUNT" -ne 4 ] && [ "$GPU_COUNT" -ne 8 ]; then 2025-12-04T12:25:44.4713064Z  echo "NVIDIA driver detects $GPU_COUNT GPUs. The runner has a broken GPU, shutting it down..." 2025-12-04T12:25:44.4713662Z  .github/scripts/stop_runner_service.sh 2025-12-04T12:25:44.4714035Z fi 2025-12-04T12:25:44.4723799Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T12:25:44.4724248Z env: 2025-12-04T12:25:44.4724503Z GIT_DEFAULT_BRANCH: main 2025-12-04T12:25:44.4724807Z HAS_NVIDIA_GPU: true 2025-12-04T12:25:44.4725179Z GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all 2025-12-04T12:25:44.4725834Z DOCKER_CONTAINER_ID: 9f53f9c599eb7471ecf6fa9ab293671ed106354cd60a224ee690c62820b37f15 2025-12-04T12:25:44.4726411Z ##[endgroup] 2025-12-04T12:25:44.4754605Z + nvidia-smi 2025-12-04T12:25:44.5218063Z Thu Dec 4 12:25:44 2025 2025-12-04T12:25:44.5218557Z +-----------------------------------------------------------------------------------------+ 2025-12-04T12:25:44.5219198Z | NVIDIA-SMI 580.82.07 Driver Version: 580.82.07 CUDA Version: 13.0 | 2025-12-04T12:25:44.5219825Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T12:25:44.5220458Z | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | 2025-12-04T12:25:44.5221363Z | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | 2025-12-04T12:25:44.5221915Z | | | MIG M. 
| 2025-12-04T12:25:44.5222323Z |=========================================+========================+======================| 2025-12-04T12:25:44.5878833Z | 0 Tesla T4 On | 00000000:00:1B.0 Off | 0 | 2025-12-04T12:25:44.5879441Z | N/A 27C P8 13W / 70W | 0MiB / 15360MiB | 0% Default | 2025-12-04T12:25:44.5879924Z | | | N/A | 2025-12-04T12:25:44.5880425Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T12:25:44.5880966Z | 1 Tesla T4 On | 00000000:00:1C.0 Off | 0 | 2025-12-04T12:25:44.5881470Z | N/A 26C P8 9W / 70W | 0MiB / 15360MiB | 0% Default | 2025-12-04T12:25:44.5881928Z | | | N/A | 2025-12-04T12:25:44.5882415Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T12:25:44.5882949Z | 2 Tesla T4 On | 00000000:00:1D.0 Off | 0 | 2025-12-04T12:25:44.5883682Z | N/A 25C P8 13W / 70W | 0MiB / 15360MiB | 0% Default | 2025-12-04T12:25:44.5884158Z | | | N/A | 2025-12-04T12:25:44.5884713Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T12:25:44.5885230Z | 3 Tesla T4 On | 00000000:00:1E.0 Off | 0 | 2025-12-04T12:25:44.5885746Z | N/A 26C P8 13W / 70W | 0MiB / 15360MiB | 0% Default | 2025-12-04T12:25:44.5886220Z | | | N/A | 2025-12-04T12:25:44.5886705Z +-----------------------------------------+------------------------+----------------------+ 2025-12-04T12:25:44.5891414Z 2025-12-04T12:25:44.5891737Z +-----------------------------------------------------------------------------------------+ 2025-12-04T12:25:44.5892503Z | Processes: | 2025-12-04T12:25:44.5893056Z | GPU GI CI PID Type Process name GPU Memory | 2025-12-04T12:25:44.5893629Z | ID ID Usage | 2025-12-04T12:25:44.5894044Z |=========================================================================================| 2025-12-04T12:25:44.5914946Z | No running processes found | 2025-12-04T12:25:44.5915737Z +-----------------------------------------------------------------------------------------+ 2025-12-04T12:25:45.2630653Z + nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0 2025-12-04T12:25:45.2812292Z Tesla T4 2025-12-04T12:25:45.3086896Z + NVIDIA_SMI_STATUS=0 2025-12-04T12:25:45.3087309Z + '[' 0 -ne 0 ']' 2025-12-04T12:25:45.3092368Z ++ nvidia-smi --list-gpus 2025-12-04T12:25:45.3093948Z ++ wc -l 2025-12-04T12:25:45.3557742Z + GPU_COUNT=4 2025-12-04T12:25:45.3558041Z + NVIDIA_SMI_STATUS=0 2025-12-04T12:25:45.3558509Z + '[' 0 -ne 0 ']' 2025-12-04T12:25:45.3558863Z + '[' 4 -le 8 ']' 2025-12-04T12:25:45.3559114Z + '[' 4 -ne 1 ']' 2025-12-04T12:25:45.3559345Z + '[' 4 -ne 2 ']' 2025-12-04T12:25:45.3559599Z + '[' 4 -ne 4 ']' 2025-12-04T12:25:45.3644017Z Post job cleanup. 2025-12-04T12:25:45.3725670Z Post job cleanup. 2025-12-04T12:25:45.3773374Z Post job cleanup. 
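Note on the GPU health check above: the teardown step does not trust nvidia-smi's exit code alone, because the tool can still return 0 after a driver crash while printing ERR! for most fields. It therefore queries a field that would go missing (the GPU name) and also verifies that the visible GPU count is a power of two (1, 2, 4, or 8), so a runner with one dead GPU is retired rather than producing flaky tests. A condensed sketch of that check, under the same assumptions (nvidia-smi on PATH; the repo's .github/scripts/stop_runner_service.sh used to retire the runner), is below.

#!/usr/bin/env bash
# Condensed sketch of the teardown GPU health check.
set +e

# Query a field that is missing after a driver crash; exit code 14 is also
# accepted, matching the setup-nvidia action the comment refers to.
nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0
status=$?
if [ "$status" -ne 0 ] && [ "$status" -ne 14 ]; then
  echo "NVIDIA driver installation has failed, shutting down the runner..."
  .github/scripts/stop_runner_service.sh
fi

# A healthy runner exposes 1, 2, 4, or 8 GPUs; anything else (e.g. 3 of 4 after
# one GPU failure) marks the runner as broken.
gpu_count=$(nvidia-smi --list-gpus | wc -l)
case "$gpu_count" in
  1|2|4|8) ;;  # ok
  *)
    echo "NVIDIA driver detects $gpu_count GPUs. The runner has a broken GPU, shutting it down..."
    .github/scripts/stop_runner_service.sh
    ;;
esac

In this run the query returned "Tesla T4" and four GPUs were listed, so both checks passed and post-job cleanup continued.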
2025-12-04T12:25:45.4785224Z [command]/usr/bin/git version 2025-12-04T12:25:45.4825455Z git version 2.50.1 2025-12-04T12:25:45.4863003Z Copying '/home/ec2-user/.gitconfig' to '/home/ec2-user/actions-runner/_work/_temp/53556366-c1b9-4fa8-82d6-046dda343be8/.gitconfig' 2025-12-04T12:25:45.4872140Z Temporarily overriding HOME='/home/ec2-user/actions-runner/_work/_temp/53556366-c1b9-4fa8-82d6-046dda343be8' before making global git config changes 2025-12-04T12:25:45.4873208Z Adding repository directory to the temporary git global config as a safe directory 2025-12-04T12:25:45.4877477Z [command]/usr/bin/git config --global --add safe.directory /home/ec2-user/actions-runner/_work/pytorch/pytorch 2025-12-04T12:25:45.4927167Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2025-12-04T12:25:45.4959307Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2025-12-04T12:25:45.5291118Z Entering 'android/libs/fbjni' 2025-12-04T12:25:45.5347783Z Entering 'third_party/FP16' 2025-12-04T12:25:45.5408368Z Entering 'third_party/FXdiv' 2025-12-04T12:25:45.5465841Z Entering 'third_party/NNPACK' 2025-12-04T12:25:45.5524039Z Entering 'third_party/NVTX' 2025-12-04T12:25:45.5583575Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T12:25:45.5641907Z Entering 'third_party/XNNPACK' 2025-12-04T12:25:45.5718283Z Entering 'third_party/aiter' 2025-12-04T12:25:45.5777971Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T12:25:45.5846017Z Entering 'third_party/benchmark' 2025-12-04T12:25:45.5905950Z Entering 'third_party/composable_kernel' 2025-12-04T12:25:45.5976806Z Entering 'third_party/cpp-httplib' 2025-12-04T12:25:45.6038380Z Entering 'third_party/cpuinfo' 2025-12-04T12:25:45.6097051Z Entering 'third_party/cudnn_frontend' 2025-12-04T12:25:45.6157866Z Entering 'third_party/cutlass' 2025-12-04T12:25:45.6229603Z Entering 'third_party/fbgemm' 2025-12-04T12:25:45.6291358Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T12:25:45.6347193Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T12:25:45.6415720Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T12:25:45.6478895Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T12:25:45.6546208Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T12:25:45.6602444Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T12:25:45.6667321Z Entering 'third_party/fbgemm/external/json' 2025-12-04T12:25:45.6730048Z Entering 'third_party/flash-attention' 2025-12-04T12:25:45.6792693Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T12:25:45.6855382Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T12:25:45.6927997Z Entering 'third_party/flatbuffers' 2025-12-04T12:25:45.6992991Z Entering 'third_party/fmt' 2025-12-04T12:25:45.7053075Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T12:25:45.7113316Z Entering 'third_party/gloo' 2025-12-04T12:25:45.7173059Z Entering 'third_party/googletest' 2025-12-04T12:25:45.7231156Z Entering 'third_party/ideep' 2025-12-04T12:25:45.7292334Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T12:25:45.7359927Z Entering 'third_party/ittapi' 2025-12-04T12:25:45.7418095Z Entering 'third_party/kineto' 2025-12-04T12:25:45.7482862Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T12:25:45.7538560Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T12:25:45.7599249Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T12:25:45.7656981Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T12:25:45.7717701Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T12:25:45.7778640Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T12:25:45.7840938Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T12:25:45.7902245Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T12:25:45.7961635Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T12:25:45.8018643Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T12:25:45.8076662Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T12:25:45.8132282Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T12:25:45.8195453Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T12:25:45.8263333Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T12:25:45.8319985Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T12:25:45.8383644Z Entering 'third_party/kleidiai' 2025-12-04T12:25:45.8444646Z Entering 'third_party/mimalloc' 2025-12-04T12:25:45.8501091Z Entering 'third_party/nlohmann' 2025-12-04T12:25:45.8563307Z Entering 'third_party/onnx' 2025-12-04T12:25:45.8644063Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T12:25:45.8701635Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T12:25:45.8763271Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T12:25:45.8818510Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T12:25:45.8876378Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T12:25:45.8936041Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T12:25:45.8995519Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T12:25:45.9051973Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T12:25:45.9108009Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T12:25:45.9172715Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T12:25:45.9236376Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T12:25:45.9300444Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T12:25:45.9378599Z Entering 'third_party/pocketfft' 2025-12-04T12:25:45.9438514Z Entering 'third_party/protobuf' 2025-12-04T12:25:45.9499095Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T12:25:45.9556764Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T12:25:45.9614970Z Entering 'third_party/psimd' 2025-12-04T12:25:45.9675284Z Entering 'third_party/pthreadpool' 2025-12-04T12:25:45.9739370Z Entering 'third_party/pybind11' 2025-12-04T12:25:45.9797792Z Entering 'third_party/python-peachpy' 2025-12-04T12:25:45.9857812Z Entering 'third_party/sleef' 2025-12-04T12:25:45.9916294Z Entering 'third_party/tensorpipe' 2025-12-04T12:25:45.9976200Z 
Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T12:25:46.0039257Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T12:25:46.0095811Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T12:25:46.0155782Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T12:25:46.0211953Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T12:25:46.0295344Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2025-12-04T12:25:46.0319302Z http.https://github.com/.extraheader 2025-12-04T12:25:46.0328351Z [command]/usr/bin/git config --local --unset-all http.https://github.com/.extraheader 2025-12-04T12:25:46.0360241Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :" 2025-12-04T12:25:46.0679000Z Entering 'android/libs/fbjni' 2025-12-04T12:25:46.0718659Z http.https://github.com/.extraheader 2025-12-04T12:25:46.0760182Z Entering 'third_party/FP16' 2025-12-04T12:25:46.0799891Z http.https://github.com/.extraheader 2025-12-04T12:25:46.0837131Z Entering 'third_party/FXdiv' 2025-12-04T12:25:46.0877434Z http.https://github.com/.extraheader 2025-12-04T12:25:46.0921800Z Entering 'third_party/NNPACK' 2025-12-04T12:25:46.0962166Z http.https://github.com/.extraheader 2025-12-04T12:25:46.0996823Z Entering 'third_party/NVTX' 2025-12-04T12:25:46.1038126Z http.https://github.com/.extraheader 2025-12-04T12:25:46.1075859Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T12:25:46.1115740Z http.https://github.com/.extraheader 2025-12-04T12:25:46.1153449Z Entering 'third_party/XNNPACK' 2025-12-04T12:25:46.1193001Z http.https://github.com/.extraheader 2025-12-04T12:25:46.1247252Z Entering 'third_party/aiter' 2025-12-04T12:25:46.1288197Z http.https://github.com/.extraheader 2025-12-04T12:25:46.1323062Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T12:25:46.1362872Z http.https://github.com/.extraheader 2025-12-04T12:25:46.1412158Z Entering 'third_party/benchmark' 2025-12-04T12:25:46.1452902Z http.https://github.com/.extraheader 2025-12-04T12:25:46.1490898Z Entering 'third_party/composable_kernel' 2025-12-04T12:25:46.1531090Z http.https://github.com/.extraheader 2025-12-04T12:25:46.1576693Z Entering 'third_party/cpp-httplib' 2025-12-04T12:25:46.1617507Z http.https://github.com/.extraheader 2025-12-04T12:25:46.1653763Z Entering 'third_party/cpuinfo' 2025-12-04T12:25:46.1695460Z http.https://github.com/.extraheader 2025-12-04T12:25:46.1735249Z Entering 'third_party/cudnn_frontend' 2025-12-04T12:25:46.1774962Z http.https://github.com/.extraheader 2025-12-04T12:25:46.1815757Z Entering 'third_party/cutlass' 2025-12-04T12:25:46.1857980Z http.https://github.com/.extraheader 2025-12-04T12:25:46.1908744Z Entering 'third_party/fbgemm' 2025-12-04T12:25:46.1948782Z http.https://github.com/.extraheader 2025-12-04T12:25:46.1988875Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T12:25:46.2031032Z http.https://github.com/.extraheader 2025-12-04T12:25:46.2065391Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T12:25:46.2102519Z http.https://github.com/.extraheader 2025-12-04T12:25:46.2148258Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T12:25:46.2187783Z http.https://github.com/.extraheader 2025-12-04T12:25:46.2223433Z Entering 'third_party/fbgemm/external/cutlass' 
2025-12-04T12:25:46.2261259Z http.https://github.com/.extraheader 2025-12-04T12:25:46.2307073Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T12:25:46.2345638Z http.https://github.com/.extraheader 2025-12-04T12:25:46.2390740Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T12:25:46.2427595Z http.https://github.com/.extraheader 2025-12-04T12:25:46.2468172Z Entering 'third_party/fbgemm/external/json' 2025-12-04T12:25:46.2507988Z http.https://github.com/.extraheader 2025-12-04T12:25:46.2544883Z Entering 'third_party/flash-attention' 2025-12-04T12:25:46.2583076Z http.https://github.com/.extraheader 2025-12-04T12:25:46.2619791Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T12:25:46.2657993Z http.https://github.com/.extraheader 2025-12-04T12:25:46.2698494Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T12:25:46.2737965Z http.https://github.com/.extraheader 2025-12-04T12:25:46.2782643Z Entering 'third_party/flatbuffers' 2025-12-04T12:25:46.2821676Z http.https://github.com/.extraheader 2025-12-04T12:25:46.2861298Z Entering 'third_party/fmt' 2025-12-04T12:25:46.2900032Z http.https://github.com/.extraheader 2025-12-04T12:25:46.2937961Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T12:25:46.2978050Z http.https://github.com/.extraheader 2025-12-04T12:25:46.3014781Z Entering 'third_party/gloo' 2025-12-04T12:25:46.3053813Z http.https://github.com/.extraheader 2025-12-04T12:25:46.3093160Z Entering 'third_party/googletest' 2025-12-04T12:25:46.3131833Z http.https://github.com/.extraheader 2025-12-04T12:25:46.3168968Z Entering 'third_party/ideep' 2025-12-04T12:25:46.3209040Z http.https://github.com/.extraheader 2025-12-04T12:25:46.3243807Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T12:25:46.3281364Z http.https://github.com/.extraheader 2025-12-04T12:25:46.3327994Z Entering 'third_party/ittapi' 2025-12-04T12:25:46.3368609Z http.https://github.com/.extraheader 2025-12-04T12:25:46.3404306Z Entering 'third_party/kineto' 2025-12-04T12:25:46.3444009Z http.https://github.com/.extraheader 2025-12-04T12:25:46.3479894Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T12:25:46.3517956Z http.https://github.com/.extraheader 2025-12-04T12:25:46.3556286Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T12:25:46.3595064Z http.https://github.com/.extraheader 2025-12-04T12:25:46.3636286Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T12:25:46.3673613Z http.https://github.com/.extraheader 2025-12-04T12:25:46.3708649Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T12:25:46.3748799Z http.https://github.com/.extraheader 2025-12-04T12:25:46.3794124Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T12:25:46.3831395Z http.https://github.com/.extraheader 2025-12-04T12:25:46.3867101Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T12:25:46.3906436Z http.https://github.com/.extraheader 2025-12-04T12:25:46.3944059Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T12:25:46.3982259Z http.https://github.com/.extraheader 2025-12-04T12:25:46.4018658Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T12:25:46.4057888Z http.https://github.com/.extraheader 2025-12-04T12:25:46.4096669Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T12:25:46.4137403Z http.https://github.com/.extraheader 2025-12-04T12:25:46.4179920Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T12:25:46.4217940Z http.https://github.com/.extraheader 2025-12-04T12:25:46.4255148Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T12:25:46.4293315Z http.https://github.com/.extraheader 2025-12-04T12:25:46.4329311Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T12:25:46.4369393Z http.https://github.com/.extraheader 2025-12-04T12:25:46.4410129Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T12:25:46.4449859Z http.https://github.com/.extraheader 2025-12-04T12:25:46.4491244Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T12:25:46.4538204Z http.https://github.com/.extraheader 2025-12-04T12:25:46.4575323Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T12:25:46.4613110Z http.https://github.com/.extraheader 2025-12-04T12:25:46.4650396Z Entering 'third_party/kleidiai' 2025-12-04T12:25:46.4690501Z http.https://github.com/.extraheader 2025-12-04T12:25:46.4725786Z Entering 'third_party/mimalloc' 2025-12-04T12:25:46.4767027Z http.https://github.com/.extraheader 2025-12-04T12:25:46.4803682Z Entering 'third_party/nlohmann' 2025-12-04T12:25:46.4844824Z http.https://github.com/.extraheader 2025-12-04T12:25:46.4882635Z Entering 'third_party/onnx' 2025-12-04T12:25:46.4922743Z http.https://github.com/.extraheader 2025-12-04T12:25:46.4976966Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T12:25:46.5015459Z http.https://github.com/.extraheader 2025-12-04T12:25:46.5056503Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T12:25:46.5096807Z http.https://github.com/.extraheader 2025-12-04T12:25:46.5138549Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T12:25:46.5177302Z http.https://github.com/.extraheader 2025-12-04T12:25:46.5213312Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T12:25:46.5251798Z http.https://github.com/.extraheader 2025-12-04T12:25:46.5291358Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T12:25:46.5328557Z http.https://github.com/.extraheader 2025-12-04T12:25:46.5364818Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T12:25:46.5402962Z http.https://github.com/.extraheader 2025-12-04T12:25:46.5440518Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T12:25:46.5480333Z http.https://github.com/.extraheader 2025-12-04T12:25:46.5516609Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T12:25:46.5555662Z http.https://github.com/.extraheader 2025-12-04T12:25:46.5600462Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T12:25:46.5640421Z http.https://github.com/.extraheader 2025-12-04T12:25:46.5674132Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T12:25:46.5712142Z http.https://github.com/.extraheader 2025-12-04T12:25:46.5751363Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T12:25:46.5787869Z http.https://github.com/.extraheader 2025-12-04T12:25:46.5836541Z Entering 
'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T12:25:46.5873772Z http.https://github.com/.extraheader 2025-12-04T12:25:46.5938298Z Entering 'third_party/pocketfft' 2025-12-04T12:25:46.5977817Z http.https://github.com/.extraheader 2025-12-04T12:25:46.6018554Z Entering 'third_party/protobuf' 2025-12-04T12:25:46.6058139Z http.https://github.com/.extraheader 2025-12-04T12:25:46.6102235Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T12:25:46.6140731Z http.https://github.com/.extraheader 2025-12-04T12:25:46.6179316Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T12:25:46.6217879Z http.https://github.com/.extraheader 2025-12-04T12:25:46.6258632Z Entering 'third_party/psimd' 2025-12-04T12:25:46.6298013Z http.https://github.com/.extraheader 2025-12-04T12:25:46.6340256Z Entering 'third_party/pthreadpool' 2025-12-04T12:25:46.6380611Z http.https://github.com/.extraheader 2025-12-04T12:25:46.6419730Z Entering 'third_party/pybind11' 2025-12-04T12:25:46.6458400Z http.https://github.com/.extraheader 2025-12-04T12:25:46.6496064Z Entering 'third_party/python-peachpy' 2025-12-04T12:25:46.6535602Z http.https://github.com/.extraheader 2025-12-04T12:25:46.6572142Z Entering 'third_party/sleef' 2025-12-04T12:25:46.6611576Z http.https://github.com/.extraheader 2025-12-04T12:25:46.6647162Z Entering 'third_party/tensorpipe' 2025-12-04T12:25:46.6687518Z http.https://github.com/.extraheader 2025-12-04T12:25:46.6722406Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T12:25:46.6761550Z http.https://github.com/.extraheader 2025-12-04T12:25:46.6796756Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T12:25:46.6836363Z http.https://github.com/.extraheader 2025-12-04T12:25:46.6873700Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T12:25:46.6912798Z http.https://github.com/.extraheader 2025-12-04T12:25:46.6947292Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T12:25:46.6987208Z http.https://github.com/.extraheader 2025-12-04T12:25:46.7019810Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T12:25:46.7058110Z http.https://github.com/.extraheader 2025-12-04T12:25:46.7124858Z [command]/usr/bin/git config --local --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:46.7167938Z [command]/usr/bin/git submodule foreach --recursive git config --local --show-origin --name-only --get-regexp remote.origin.url 2025-12-04T12:25:46.7487456Z Entering 'android/libs/fbjni' 2025-12-04T12:25:46.7521309Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config remote.origin.url 2025-12-04T12:25:46.7538396Z Entering 'third_party/FP16' 2025-12-04T12:25:46.7565459Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config remote.origin.url 2025-12-04T12:25:46.7581029Z Entering 'third_party/FXdiv' 2025-12-04T12:25:46.7609855Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config remote.origin.url 2025-12-04T12:25:46.7625385Z Entering 'third_party/NNPACK' 2025-12-04T12:25:46.7653397Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config remote.origin.url 2025-12-04T12:25:46.7672067Z Entering 'third_party/NVTX' 2025-12-04T12:25:46.7698667Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config remote.origin.url 2025-12-04T12:25:46.7716797Z Entering 
'third_party/VulkanMemoryAllocator' 2025-12-04T12:25:46.7745577Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config remote.origin.url 2025-12-04T12:25:46.7766929Z Entering 'third_party/XNNPACK' 2025-12-04T12:25:46.7794111Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config remote.origin.url 2025-12-04T12:25:46.7830168Z Entering 'third_party/aiter' 2025-12-04T12:25:46.7855865Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config remote.origin.url 2025-12-04T12:25:46.7876084Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T12:25:46.7899440Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config remote.origin.url 2025-12-04T12:25:46.7926235Z Entering 'third_party/benchmark' 2025-12-04T12:25:46.7954525Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config remote.origin.url 2025-12-04T12:25:46.7973089Z Entering 'third_party/composable_kernel' 2025-12-04T12:25:46.8000327Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config remote.origin.url 2025-12-04T12:25:46.8027072Z Entering 'third_party/cpp-httplib' 2025-12-04T12:25:46.8054390Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config remote.origin.url 2025-12-04T12:25:46.8072932Z Entering 'third_party/cpuinfo' 2025-12-04T12:25:46.8098604Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config remote.origin.url 2025-12-04T12:25:46.8118107Z Entering 'third_party/cudnn_frontend' 2025-12-04T12:25:46.8144528Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config remote.origin.url 2025-12-04T12:25:46.8163618Z Entering 'third_party/cutlass' 2025-12-04T12:25:46.8191643Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config remote.origin.url 2025-12-04T12:25:46.8218913Z Entering 'third_party/fbgemm' 2025-12-04T12:25:46.8246697Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config remote.origin.url 2025-12-04T12:25:46.8265375Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T12:25:46.8293128Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config remote.origin.url 2025-12-04T12:25:46.8312037Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T12:25:46.8338442Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config remote.origin.url 2025-12-04T12:25:46.8365375Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T12:25:46.8391771Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config remote.origin.url 2025-12-04T12:25:46.8410027Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T12:25:46.8437510Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config remote.origin.url 2025-12-04T12:25:46.8464099Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T12:25:46.8491019Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config remote.origin.url 2025-12-04T12:25:46.8506844Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T12:25:46.8534683Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config remote.origin.url 2025-12-04T12:25:46.8553744Z Entering 'third_party/fbgemm/external/json' 2025-12-04T12:25:46.8578358Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config remote.origin.url 2025-12-04T12:25:46.8600480Z Entering 'third_party/flash-attention' 2025-12-04T12:25:46.8626320Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config remote.origin.url 2025-12-04T12:25:46.8645706Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T12:25:46.8672502Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config remote.origin.url 2025-12-04T12:25:46.8695867Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T12:25:46.8722473Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config remote.origin.url 2025-12-04T12:25:46.8751200Z Entering 'third_party/flatbuffers' 2025-12-04T12:25:46.8777904Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config remote.origin.url 2025-12-04T12:25:46.8799434Z Entering 'third_party/fmt' 2025-12-04T12:25:46.8825263Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config remote.origin.url 2025-12-04T12:25:46.8844934Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T12:25:46.8872075Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config remote.origin.url 2025-12-04T12:25:46.8890141Z Entering 'third_party/gloo' 2025-12-04T12:25:46.8917919Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config remote.origin.url 2025-12-04T12:25:46.8937949Z Entering 'third_party/googletest' 2025-12-04T12:25:46.8964648Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config remote.origin.url 2025-12-04T12:25:46.8981053Z Entering 'third_party/ideep' 2025-12-04T12:25:46.9008715Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config remote.origin.url 2025-12-04T12:25:46.9024040Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T12:25:46.9050836Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config remote.origin.url 2025-12-04T12:25:46.9079078Z Entering 'third_party/ittapi' 2025-12-04T12:25:46.9103692Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config remote.origin.url 2025-12-04T12:25:46.9122304Z Entering 'third_party/kineto' 2025-12-04T12:25:46.9152293Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config remote.origin.url 2025-12-04T12:25:46.9170422Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T12:25:46.9197404Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config remote.origin.url 2025-12-04T12:25:46.9214829Z 
Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T12:25:46.9242063Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config remote.origin.url 2025-12-04T12:25:46.9259372Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T12:25:46.9286450Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config remote.origin.url 2025-12-04T12:25:46.9301941Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T12:25:46.9328165Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config remote.origin.url 2025-12-04T12:25:46.9346258Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T12:25:46.9373577Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config remote.origin.url 2025-12-04T12:25:46.9390504Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T12:25:46.9415503Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config remote.origin.url 2025-12-04T12:25:46.9436389Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T12:25:46.9460401Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config remote.origin.url 2025-12-04T12:25:46.9479421Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T12:25:46.9503084Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config remote.origin.url 2025-12-04T12:25:46.9522144Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T12:25:46.9547223Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config remote.origin.url 2025-12-04T12:25:46.9567160Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T12:25:46.9594427Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config remote.origin.url 2025-12-04T12:25:46.9611993Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T12:25:46.9639263Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T12:25:46.9656569Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T12:25:46.9683041Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T12:25:46.9700887Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T12:25:46.9726840Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T12:25:46.9746823Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T12:25:46.9773752Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config remote.origin.url 2025-12-04T12:25:46.9791985Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T12:25:46.9818020Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config remote.origin.url 2025-12-04T12:25:46.9837946Z Entering 'third_party/kleidiai' 2025-12-04T12:25:46.9863078Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config remote.origin.url 2025-12-04T12:25:46.9882604Z Entering 'third_party/mimalloc' 2025-12-04T12:25:46.9907250Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config remote.origin.url 2025-12-04T12:25:46.9925953Z Entering 'third_party/nlohmann' 2025-12-04T12:25:46.9954763Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config remote.origin.url 2025-12-04T12:25:46.9975047Z Entering 'third_party/onnx' 2025-12-04T12:25:47.0002085Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config remote.origin.url 2025-12-04T12:25:47.0040796Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T12:25:47.0064904Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url 2025-12-04T12:25:47.0086869Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T12:25:47.0115617Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config remote.origin.url 2025-12-04T12:25:47.0136644Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T12:25:47.0162364Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config remote.origin.url 2025-12-04T12:25:47.0178569Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T12:25:47.0204145Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config remote.origin.url 2025-12-04T12:25:47.0219468Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T12:25:47.0246796Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config remote.origin.url 2025-12-04T12:25:47.0261976Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T12:25:47.0290053Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config remote.origin.url 2025-12-04T12:25:47.0305607Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T12:25:47.0331902Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config 
remote.origin.url 2025-12-04T12:25:47.0347498Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T12:25:47.0375094Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config remote.origin.url 2025-12-04T12:25:47.0394851Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T12:25:47.0418772Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T12:25:47.0436399Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T12:25:47.0459812Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T12:25:47.0480651Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T12:25:47.0505510Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T12:25:47.0524483Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T12:25:47.0552899Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config remote.origin.url 2025-12-04T12:25:47.0592455Z Entering 'third_party/pocketfft' 2025-12-04T12:25:47.0617938Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config remote.origin.url 2025-12-04T12:25:47.0636565Z Entering 'third_party/protobuf' 2025-12-04T12:25:47.0661309Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config remote.origin.url 2025-12-04T12:25:47.0683673Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T12:25:47.0707259Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config remote.origin.url 2025-12-04T12:25:47.0724168Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T12:25:47.0753050Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config remote.origin.url 2025-12-04T12:25:47.0773214Z Entering 'third_party/psimd' 2025-12-04T12:25:47.0800573Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config remote.origin.url 2025-12-04T12:25:47.0818491Z Entering 'third_party/pthreadpool' 2025-12-04T12:25:47.0847257Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config remote.origin.url 2025-12-04T12:25:47.0862846Z Entering 'third_party/pybind11' 2025-12-04T12:25:47.0890660Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config remote.origin.url 2025-12-04T12:25:47.0906985Z Entering 'third_party/python-peachpy' 2025-12-04T12:25:47.0937829Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config remote.origin.url 2025-12-04T12:25:47.0955370Z Entering 'third_party/sleef' 2025-12-04T12:25:47.0979514Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config remote.origin.url 2025-12-04T12:25:47.0998829Z Entering 'third_party/tensorpipe' 
2025-12-04T12:25:47.1024525Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config remote.origin.url 2025-12-04T12:25:47.1043251Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T12:25:47.1067413Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config remote.origin.url 2025-12-04T12:25:47.1085597Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T12:25:47.1109262Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config remote.origin.url 2025-12-04T12:25:47.1127080Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T12:25:47.1155256Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config remote.origin.url 2025-12-04T12:25:47.1173209Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T12:25:47.1199181Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config remote.origin.url 2025-12-04T12:25:47.1215215Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T12:25:47.1242904Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url 2025-12-04T12:25:47.1280470Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1309634Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1337888Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1363379Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1391293Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1417579Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1445094Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1474851Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1500683Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1525933Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir: 
2025-12-04T12:25:47.1551890Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1578450Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1603127Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1629173Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1654414Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1679770Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1705049Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1730253Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1757080Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1781584Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1806719Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1832369Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1860135Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1882500Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1907572Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1932428Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1957795Z 
[command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.1981957Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2007626Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2032946Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2059515Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2083545Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2109457Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2136966Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2162358Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2187961Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2212926Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2240730Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2266237Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2291796Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2318289Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2344229Z [command]/usr/bin/git config --file 
/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2369439Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2396109Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2421365Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2455695Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2482888Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2509371Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2539623Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2564499Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2591080Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2616262Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2642394Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2667304Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2692301Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2718170Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config --name-only 
--get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2743383Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2768156Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2794304Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2819142Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2844307Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2869362Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2894537Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2920352Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2946654Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2971690Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.2999500Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.3026037Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.3051242Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.3079566Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.3104864Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config 
--name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.3130116Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.3155840Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.3180472Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.3205075Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.3230675Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.3257573Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.3282860Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.3307859Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.3333530Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.3362157Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T12:25:47.3477199Z A job completed hook has been configured by the self-hosted runner administrator 2025-12-04T12:25:47.3492719Z ##[group]Run '/home/ec2-user/runner-scripts/after_job.sh' 2025-12-04T12:25:47.3498589Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T12:25:47.3499043Z ##[endgroup] 2025-12-04T12:25:47.3585103Z [!ALERT!] Swap in detected! [!ALERT!] 2025-12-04T12:25:58.4578724Z [!ALERT!] Swap out detected [!ALERT!] 2025-12-04T12:26:17.0286486Z Cleaning up orphan processes
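(Note: the "Post job cleanup" phases above are actions/checkout scrubbing the credentials it injected at checkout time: it re-registers the workspace as a safe.directory under a temporary HOME, unsets the http.https://github.com/.extraheader auth header in the superproject and, via git submodule foreach --recursive, in every submodule listed above, and then audits each submodule config for remote.origin.url and includeIf.gitdir entries. A condensed sketch of the equivalent commands follows; the workspace path is this runner's and is only illustrative.)

    # Condensed sketch of the credential scrub actions/checkout runs after the job
    # (illustrative; the real action also copies .gitconfig under a temporary HOME).
    REPO=/home/ec2-user/actions-runner/_work/pytorch/pytorch
    git config --global --add safe.directory "$REPO"
    cd "$REPO"

    # Drop any per-repo sshCommand override, then the injected auth header, in the superproject...
    git config --local --unset-all core.sshCommand || :
    git config --local --unset-all http.https://github.com/.extraheader || :

    # ...and recursively in every submodule (the long "Entering ..." listing above).
    git submodule foreach --recursive \
      sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' \
             && git config --local --unset-all 'http.https://github.com/.extraheader' || :"

    # Finally, each submodule config is checked for conditional includes that could
    # otherwise leak settings between jobs.
    git submodule foreach --recursive \
      'git config --local --name-only --get-regexp ^includeIf\.gitdir: || :'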
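The closing "[!ALERT!] Swap in detected!" / "Swap out detected" lines come from the self-hosted runner's job-completed hook (/home/ec2-user/runner-scripts/after_job.sh), which warns that the instance swapped during the job. The hook itself is not included in this log; the sketch below shows one way such a check could be written against the kernel's pswpin/pswpout counters in /proc/vmstat. The snapshot file, thresholds, and overall structure are assumptions for illustration, not the actual script.

    #!/usr/bin/env bash
    # Hypothetical sketch of a swap-activity alert in the spirit of after_job.sh.
    # Not the actual runner hook; the snapshot path is an assumption.

    SNAPSHOT=/tmp/vmstat.before_job   # hypothetical snapshot written by a before-job hook

    counter() {
        awk -v key="$1" '$1 == key { print $2 }' "$2"
    }

    for key in pswpin pswpout; do
        now=$(counter "$key" /proc/vmstat)
        before=$(counter "$key" "$SNAPSHOT" 2>/dev/null || echo 0)
        if [ "${now:-0}" -gt "${before:-0}" ]; then
            case "$key" in
                pswpin)  echo "[!ALERT!] Swap in detected! [!ALERT!]" ;;
                pswpout) echo "[!ALERT!] Swap out detected [!ALERT!]" ;;
            esac
        fi
    done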